Python dataframe inner join

pandas.DataFrame.join

Join columns of another DataFrame, either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Parameters

other DataFrame, Series, or a list containing any combination of them

Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.

on str, list of str, or array-like, optional

Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.


how {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘left’

How to handle the operation of the two objects.

  • left: use calling frame’s index (or column if on is specified).
  • right: use other’s index.
  • outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.
  • inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling frame’s index.
  • cross: creates the cartesian product from both frames, preserves the order of the left keys.
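The effect of each how option is easiest to see on two small frames whose indexes only partly overlap. The following is a minimal sketch; the frames left and right and their values are made up for illustration and do not come from the pandas documentation.

```python
import pandas as pd

# Hypothetical frames: the indexes overlap on 'b' and 'c' only.
left = pd.DataFrame({"A": [1, 2, 3]}, index=["c", "a", "b"])
right = pd.DataFrame({"B": [10, 20, 30]}, index=["b", "c", "d"])

# Record the index of the result for each index-based join type.
result_index = {
    how: list(left.join(right, how=how).index)
    for how in ("left", "right", "outer", "inner")
}
for how, idx in result_index.items():
    print(how, idx)

# A cross join ignores the indexes and pairs every row with every row.
cross = left.join(right, how="cross")
print("cross rows:", len(cross))
```

Comparing the printed indexes shows which labels each join type keeps; the cross join returns one row per pair of input rows.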

lsuffix str, default ‘’

Suffix to use from left frame’s overlapping columns.

rsuffix str, default ‘’

Suffix to use from right frame’s overlapping columns.

sort bool, default False

Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).

validate str, optional

If specified, checks if join is of specified type.

  • “one_to_one” or “1:1”: check if join keys are unique in both left and right datasets.
  • “one_to_many” or “1:m”: check if join keys are unique in left dataset.
  • “many_to_one” or “m:1”: check if join keys are unique in right dataset.
  • “many_to_many” or “m:m”: allowed, but does not result in checks.

New in version 1.5.0.
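As a rough illustration of validate, the sketch below joins a frame with repeated keys against one with unique keys. The frames and values are assumed for the example; the failure is caught as pd.errors.MergeError.

```python
import pandas as pd

# Hypothetical data: 'K1' repeats on the left, keys are unique on the right.
left = pd.DataFrame({"key": ["K0", "K1", "K1"], "A": [1, 2, 3]})
right = pd.DataFrame({"B": [10, 20]}, index=["K0", "K1"])

# Many-to-one only requires unique keys on the right, so this passes.
ok = left.join(right, on="key", validate="m:1")
print(ok)

# One-to-one also requires unique keys on the left, so this is rejected.
try:
    left.join(right, on="key", validate="1:1")
    rejected = False
except pd.errors.MergeError as exc:
    rejected = True
    print("validation failed:", exc)
```

Using validate this way turns a silent row duplication into an explicit error, which can be useful when the key is expected to be unique.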

Returns

DataFrame

A dataframe containing columns from both the caller and other.

See also

DataFrame.merge

For column(s)-on-column(s) operations.

Notes

Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.

Support for specifying index levels as the on parameter was added in version 0.23.0.
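Passing a list of DataFrame objects can be sketched as follows. The frames base, extra1 and extra2 are hypothetical; they share the same index and have non-overlapping column names, since lsuffix and rsuffix are unavailable in this mode.

```python
import pandas as pd

# Hypothetical frames sharing one index and using distinct column names.
base = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
extra1 = pd.DataFrame({"B": [3, 4]}, index=["x", "y"])
extra2 = pd.DataFrame({"C": [5, 6]}, index=["x", "y"])

# Join both extra frames onto base by index in a single call.
combined = base.join([extra1, extra2])
print(combined)
```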

Examples

>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
  key   A
0  K0  A0
1  K1  A1
2  K2  A2
3  K3  A3
4  K4  A4
5  K5  A5
>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
...                       'B': ['B0', 'B1', 'B2']})
>>> other
  key   B
0  K0  B0
1  K1  B1
2  K2  B2

Join DataFrames using their indexes.

>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN

If we want to join using the key columns, we need to set key to be the index in both df and other. The joined DataFrame will have key as its index.

>>> df.set_index('key').join(other.set_index('key'))
      A    B
key
K0   A0   B0
K1   A1   B1
K2   A2   B2
K3   A3  NaN
K4   A4  NaN
K5   A5  NaN

Another option to join using the key columns is to use the on parameter. DataFrame.join always uses other’s index but we can use any column in df. This method preserves the original DataFrame’s index in the result.

>>> df.join(other.set_index('key'), on='key')
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K2  A2   B2
3  K3  A3  NaN
4  K4  A4  NaN
5  K5  A5  NaN

Using non-unique key values shows how they are matched.

>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K1', 'K3', 'K0', 'K1'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
  key   A
0  K0  A0
1  K1  A1
2  K1  A2
3  K3  A3
4  K0  A4
5  K1  A5
>>> df.join(other.set_index('key'), on='key', validate='m:1')
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K1  A2   B1
3  K3  A3  NaN
4  K0  A4   B0
5  K1  A5   B1


Inner Join DataFrames in Python

The inner join operation is used in database management to join two or more tables. We can also perform inner join operations on two pandas dataframes as they contain tabular values. In this article, we will discuss how we can perform an inner join operation on two dataframes in python.

What is Inner Join Operation?

The inner join operation combines two tables by keeping only the rows whose key value appears in both of them. For instance, consider that we have a table that contains the personal details of students and another table that contains the grades of the students. If both tables have a common column, say ‘Name’, we can create another table in which each row holds the details of a student together with their grades. Students who appear in only one of the two tables are left out of the result.

To perform the inner join operation in python, we can use the pandas dataframes along with the join() method or the merge() method. Let us discuss them one by one.

The files used in the programs can be downloaded using the below links.

Inner Join Two DataFrames Using the merge() Method

We can use the merge() method to perform inner join operation on two dataframes in python. The merge() method, when invoked on a dataframe, takes another dataframe as its first input argument. Along with that, it takes the value ‘inner’ as an input argument for the ‘how’ parameter. It also takes the column name that is common between the two dataframes as the input argument for the ‘on’ parameter. After execution, it returns a dataframe which is the intersection of both the dataframes and contains columns from both the dataframes. You can observe this in the following example.

import pandas as pd
import numpy as np

names = pd.read_csv("name.csv")
grades = pd.read_csv("grade.csv")
resultdf = names.merge(grades, how="inner", on="Name")
print("The resultant dataframe is:")
print(resultdf)

The resultant dataframe is:
   Class_x  Roll_x    Name  Class_y  Roll_y Grade
0        1      11  Aditya        1      11     A
1        1      12   Chris        1      12    A+
2        2       1    Joel        2       1     B
3        2      22     Tom        2      22    B+
4        3      33    Tina        3      33    A-
5        3      34     Amy        3      34     A

You should keep in mind that the output dataframe will only contain those rows from both the tables in which the column given as input to the ‘on’ parameter is the same. All the other rows from both the dataframes will be omitted from the output dataframe.
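Because the CSV files are not reproduced here, the same behaviour can be tried with in-memory data. In this sketch the two dataframes are hypothetical stand-ins for name.csv and grade.csv, with made-up rows.

```python
import pandas as pd

# Hypothetical stand-ins for name.csv and grade.csv.
names = pd.DataFrame({"Name": ["Aditya", "Chris", "Joel"], "Roll": [11, 12, 1]})
grades = pd.DataFrame({"Name": ["Chris", "Joel", "Tina"], "Grade": ["A+", "B", "A-"]})

# Only names present in both dataframes survive the inner join.
result = names.merge(grades, how="inner", on="Name")
print(result)
```

Aditya appears only in names and Tina only in grades, so neither row is part of the result.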

If both dataframes have columns with the same name that are not used as join keys, pandas appends the suffixes _x and _y to those column names. The _x suffix marks the columns from the dataframe on which the merge() method is invoked. The _y suffix marks the columns from the dataframe that is passed as the input argument to the merge() method.
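The default suffixes can be replaced through the suffixes parameter of merge(). A small sketch, using hypothetical one-row dataframes that share a non-key column named Class:

```python
import pandas as pd

# Hypothetical data with an overlapping non-key column, 'Class'.
names = pd.DataFrame({"Name": ["Amy"], "Class": [3]})
grades = pd.DataFrame({"Name": ["Amy"], "Class": [3], "Grade": ["A"]})

# Default suffixes versus explicitly chosen ones.
default = names.merge(grades, on="Name")
custom = names.merge(grades, on="Name", suffixes=("_names", "_grades"))
print(list(default.columns))
print(list(custom.columns))
```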


Inner Join Two DataFrames Using the join() Method

Instead of using the merge() method, we can use the join() method to perform the inner join operation on the dataframes.

The join() method, when invoked on a dataframe, takes another dataframe as its first input argument. Along with that, it takes the value ‘inner’ as an input argument for the ‘how’ parameter. It also takes the column name that is common between the two dataframes as the input argument for the ‘on’ parameter. After execution, the join() method returns the output dataframe as shown below.

import pandas as pd
import numpy as np

names = pd.read_csv("name.csv")
grades = pd.read_csv("grade.csv")
grades = grades.set_index("Name")
resultdf = names.join(grades, how="inner", on="Name", lsuffix='_names', rsuffix='_grades')
print("The resultant dataframe is:")
print(resultdf)

The resultant dataframe is:
   Class_names  Roll_names    Name  Class_grades  Roll_grades Grade
0            1          11  Aditya             1           11     A
1            1          12   Chris             1           12    A+
3            2           1    Joel             2            1     B
4            2          22     Tom             2           22    B+
6            3          33    Tina             3           33    A-
7            3          34     Amy             3           34     A

While using the join() method, keep in mind that the column on which the join is performed must be the index of the dataframe that is passed as the input argument to join(). This is why the example calls set_index("Name") on grades before joining. If the dataframes have the same names for some columns, you need to specify suffixes for those column names using the lsuffix and rsuffix parameters. The values passed to these parameters help us identify which column comes from which dataframe.
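One visible difference between the two methods is the index of the result: join() with the on parameter keeps the row labels of the calling dataframe, while merge() on a column renumbers the rows. The sketch below uses hypothetical in-memory data in place of the CSV files.

```python
import pandas as pd

# Hypothetical data; 'Sam' has no grade and is dropped by the inner join.
names = pd.DataFrame({"Name": ["Aditya", "Sam", "Joel"], "Roll": [11, 13, 1]})
grades = pd.DataFrame({"Name": ["Aditya", "Joel"], "Grade": ["A", "B"]})

# join() needs the key as the index of the dataframe passed in.
joined = names.join(grades.set_index("Name"), on="Name", how="inner")
# merge() matches on the shared column directly.
merged = names.merge(grades, on="Name", how="inner")

print(list(joined.index))
print(list(merged.index))
```

The gap in the joined index marks the dropped row, which is the same effect seen in the row labels of the join() output above.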

Conclusion

In this article, we have discussed two ways to perform an inner join operation on two dataframes in Python: the merge() method and the join() method.


Join in Pandas: Merge data frames (inner, outer, right, left join) in pandas python

We can join or merge two data frames in pandas by using the merge() function. The different arguments to merge() allow you to perform an inner (natural) join, left join, right join, and full outer join. This section also covers other join and concatenation operations, such as joining on the row index and concatenating along the row or column index.

Join or Merge in Pandas – Syntax:


left_df – Dataframe1.
right_df – Dataframe2.
on – Columns (names) to join on. Must be found in both the left and right DataFrame objects.
how – type of join to perform: ‘left’, ‘right’, ‘outer’, ‘inner’. Default is inner join.

The data frames must share the column name(s) on which the merge happens. The merge() function in pandas is similar to a database join operation in SQL.

UNDERSTANDING THE DIFFERENT TYPES OF JOIN OR MERGE IN PANDAS:

  • Inner join or natural join: to keep only rows that match in both data frames, specify how=‘inner’.
  • Outer join or full outer join: to keep all rows from both data frames, specify how=‘outer’.
  • Left join or left outer join: to include all rows of the left data frame and only those from the right data frame that match, specify how=‘left’.
  • Right join or right outer join: to include all rows of the right data frame and only those from the left data frame that match, specify how=‘right’.
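The four join types can be compared side by side on a small example. In the sketch below the contents of df1 and df2 are made up (only the Customer_id key name follows the tutorial), and indicator=True adds a _merge column showing where each row came from.

```python
import pandas as pd

# Hypothetical data: Customer_id 2 and 3 appear in both frames.
df1 = pd.DataFrame({"Customer_id": [1, 2, 3], "Product": ["Oven", "Lamp", "Fan"]})
df2 = pd.DataFrame({"Customer_id": [2, 3, 4], "State": ["Texas", "Ohio", "Utah"]})

row_counts = {}
for how in ("inner", "outer", "left", "right"):
    # The _merge column reads 'both', 'left_only' or 'right_only'.
    merged = pd.merge(df1, df2, on="Customer_id", how=how, indicator=True)
    row_counts[how] = len(merged)
    print(how)
    print(merged)
```

Rows marked left_only or right_only are the ones that carry NaN in the columns of the other data frame.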


Let's try the different merge or join operations with an example:

Create dataframe:

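import pandas as pd
import numpy as np

# data frame 1
d1 = {...}  # dictionary literal missing from the source; it must include a 'Customer_id' column
df1 = pd.DataFrame(d1)

# data frame 2
d2 = {...}  # dictionary literal missing from the source; it must include a 'Customer_id' column
df2 = pd.DataFrame(d2)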


Inner join pandas:

Returns only the rows of the left table that have matching keys in the right table.

# inner join in python pandas
inner_join_df = pd.merge(df1, df2, on='Customer_id', how='inner')
inner_join_df


Outer join in pandas:

Returns all rows from both tables, joining records from the left table that have matching keys in the right table. Where a row has no match in the other table, NaN is returned for the missing columns.

# outer join in python pandas
outer_join_df = pd.merge(df1, df2, on='Customer_id', how='outer')
outer_join_df


Left outer Join or Left join pandas:

Returns all rows from the left table, and any rows with matching keys from the right table. Where there is no match in the right table, NaN is returned.

# left join in python pandas
left_join_df = pd.merge(df1, df2, on='Customer_id', how='left')
left_join_df


Right outer join or Right Join pandas:

Returns all rows from the right table, and any rows with matching keys from the left table. Where there is no match in the left table, NaN is returned.

# right join in python pandas
right_join_df = pd.merge(df1, df2, on='Customer_id', how='right')
right_join_df


OTHER TYPES OF JOINS & CONCATENATION IN PANDAS PYTHON

Join based on Index in pandas python (Row index):

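Merges both tables on their row index. With left_index=True and right_index=True and no how argument, merge() performs an inner join, so only index labels present in both tables are kept.

# join based on index python pandas
df_index = pd.merge(df1, df2, right_index=True, left_index=True)
df_index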
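To make the index-based behaviour concrete, here is a minimal sketch with hypothetical frames whose indexes overlap only on the labels 1 and 2. Passing how='outer' is shown as one way to keep the unmatched labels as well.

```python
import pandas as pd

# Hypothetical frames; the indexes share only the labels 1 and 2.
df1 = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"B": [4, 5, 6]}, index=[1, 2, 3])

# Default how is 'inner': only shared index labels remain.
df_index = pd.merge(df1, df2, left_index=True, right_index=True)
# An outer merge on the index keeps every label from both frames.
df_index_outer = pd.merge(df1, df2, left_index=True, right_index=True, how="outer")

print(df_index)
print(df_index_outer)
```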


Concatenate or join on Index in pandas python and keep the same index:

Concatenates two tables row-wise and keeps the old index.

# Concatenate and keep the old index python pandas
df_row = pd.concat([df1, df2])
df_row


Concatenate or join on Index in pandas python and change the index:

Concatenates two tables row-wise and replaces the old index with a new one running from 0.

# Concatenate and change the index python pandas
df_row_reindex = pd.concat([df1, df2], ignore_index=True)
df_row_reindex
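The difference between keeping and resetting the index shows up directly in the row labels. A small sketch with hypothetical two-row frames:

```python
import pandas as pd

# Hypothetical frames, each with the default index 0, 1.
df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"A": [3, 4]})

# Without ignore_index the original labels are repeated.
kept = pd.concat([df1, df2])
# With ignore_index=True the rows are relabelled from 0.
reset = pd.concat([df1, df2], ignore_index=True)

print(list(kept.index))
print(list(reset.index))
```

Duplicate labels in the first result mean that a lookup such as kept.loc[0] returns two rows, which is often the reason to pass ignore_index=True.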


Concatenate or join based on column index in pandas python:

Concatenates both tables side by side, aligning rows on the index. axis=1 indicates that the concatenation is done along the column axis.

# concatenate along the column axis python pandas
df_col = pd.concat([df1, df2], axis=1)
df_col
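Column-wise concatenation aligns rows by index label and, by default, keeps the union of the labels. The sketch below uses hypothetical frames with partly overlapping indexes to show where NaN appears.

```python
import pandas as pd

# Hypothetical frames; only the index label 1 is shared.
df1 = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({"B": [3, 4]}, index=[1, 2])

# Rows are aligned on the index; missing cells become NaN.
df_col = pd.concat([df1, df2], axis=1)
print(df_col)
```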


