Introduction to Concatenation in Python
What is concatenation in Python? In pandas, the core data structure is the DataFrame. When you want to combine multiple data frames by the values in their columns or by their rows, you use merging and concatenation. pandas provides the functions pd.merge() and pd.concat() for these operations.
Merging
The pandas.merge function combines data frames using their common columns (keys) or their indexes.
There are three different bases for merging
1. On the basis of the same columns in different data frames.
Here we pass a column name that exists in both data frames, and the rows are matched on the values in that common column.
Syntax of merge:-
pd.merge(first_dataframe, second_dataframe, how='method', on='column name')
There are four different ways of merging.
- Left merge
- Right Merge
- Inner Merge
- Outer Merge
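As a quick way to see how the four methods differ, the sketch below runs the same merge once per how value and records the number of rows returned. The names left, right and row_counts are illustrative assumptions; the data matches the data frames used in the examples that follow.

```python
import numpy as np
import pandas as pd

# Same data as the merge examples in this section
left = pd.DataFrame({"key": ["x", "z", "y", "z", "x", "x"], "Dataset1": np.arange(6)})
right = pd.DataFrame({"key": ["q", "y", "z"], "Dataset2": [1, 2, 3]})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="key", how=how)
    # Record how many rows each method keeps
    row_counts[how] = len(merged)
    print(how, "->", len(merged), "rows, keys:", sorted(merged["key"].unique()))
```

The inner merge keeps the fewest rows and the outer merge the most, because the outer merge keeps unmatched keys from both sides.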
Inner Merge
An inner merge keeps only the rows whose key value occurs in both data frames.
Example:- Create two different data frames and merge them on their common column.
import numpy as np
import pandas as pd
from pandas import DataFrame
dframe_1 = DataFrame({'key':['x','z','y','z','x','x'],'Dataset1':np.arange(6)})
print('First DataFrame : \n',dframe_1)
dframe_2 = DataFrame({'key':['q','y','z'],'Dataset2':[1,2,3]})
print('\nSecond DataFrame : \n',dframe_2)
merge = pd.merge(dframe_1,dframe_2,on='key',how='inner')
# on selects the common column to match on; how='inner' selects the inner merge
print("\nMerging of two data frame first and second according to same column('Key') \n",merge)
Output:-
First DataFrame :
key Dataset1
0 x 0
1 z 1
2 y 2
3 z 3
4 x 4
5 x 5
Second DataFrame :
key Dataset2
0 q 1
1 y 2
2 z 3
Merging of two data frame first and second according to same column('Key')
key Dataset1 Dataset2
0 z 1 3
1 z 3 3
2 y 2 2
The inner merge matches rows on the common column and returns only the key values present in both data frames (here y and z).
Left Merge
A left merge keeps every row of the left data frame and adds the matching values from the right data frame. Left rows without a match get NaN.
Code:-
dframe_1 = DataFrame({'key':['x','z','y','z','x','x'],'Dataset1':np.arange(6)})
print('First DataFrame : \n',dframe_1)
dframe_2 = DataFrame({'key':['q','y','z'],'Dataset2':[1,2,3]})
print('\nSecond DataFrame : \n',dframe_2)
merge = pd.merge(dframe_1,dframe_2,on='key',how='left')
# on selects the common column to match on; how='left' selects the left merge
print("\nMerging of two dataframe first and second according to same column('Key') and method is left merge :\n",merge)
Output:-
First DataFrame :
key Dataset1
0 x 0
1 z 1
2 y 2
3 z 3
4 x 4
5 x 5
Second DataFrame :
key Dataset2
0 q 1
1 y 2
2 z 3
Merging of two dataframe first and second according to same column('Key') and method is left merge :
key Dataset1 Dataset2
0 x 0 NaN
1 z 1 3.0
2 y 2 2.0
3 z 3 3.0
4 x 4 NaN
5 x 5 NaN
Here we apply the left merge, which collects all values from the left data frame (first data frame) and the matching values from the right data frame (second data frame). The value q is not present in the output because it does not occur in the left data frame.
Right merge
A right merge keeps every row of the right data frame and adds the matching values from the left data frame. Right rows without a match get NaN.
Example:-
dframe_1 = DataFrame({'key':['x','z','y','z','x','x'],'Dataset1':np.arange(6)})
print('First DataFrame : \n',dframe_1)
dframe_2 = DataFrame({'key':['q','y','z'],'Dataset2':[1,2,3]})
print('\nSecond DataFrame : \n',dframe_2)
merge = pd.merge(dframe_1,dframe_2,on='key',how='right')
# on selects the common column to match on; how='right' selects the right merge
print("\nMerging of two data frame first and second according to same column('Key') and method is right merge :\n",merge)
Output:-
First DataFrame :
key Dataset1
0 x 0
1 z 1
2 y 2
3 z 3
4 x 4
5 x 5
Second DataFrame :
key Dataset2
0 q 1
1 y 2
2 z 3
Merging of two data frame first and second according to same column('Key') and method is right merge :
key Dataset1 Dataset2
0 z 1.0 3
1 z 3.0 3
2 y 2.0 2
3 q NaN 1
Here we apply the right merge, which collects all values from the right data frame (second data frame) and the matching values from the left data frame (first data frame). The value x is not present in the output because it does not occur in the right data frame.
Outer Merge
An outer merge collects all values from both the left and the right data frame.
Example:-
dframe_1 = DataFrame({'key':['x','z','y','z','x','x'],'Dataset1':np.arange(6)})
print('First DataFrame : \n',dframe_1)
dframe_2 = DataFrame({'key':['q','y','z'],'Dataset2':[1,2,3]})
print('\nSecond DataFrame : \n',dframe_2)
merge = pd.merge(dframe_1,dframe_2,on='key',how='outer')
# on selects the common column to match on; how='outer' selects the outer merge
print("\nMerging of two dataframe first and second according to same column('Key') and method is outer merge :\n"
,merge)
Output:-
First DataFrame :
key Dataset1
0 x 0
1 z 1
2 y 2
3 z 3
4 x 4
5 x 5
Second DataFrame :
key Dataset2
0 q 1
1 y 2
2 z 3
Merging of two dataframe first and second according to same column('Key') and method is outer merge :
key Dataset1 Dataset2
0 x 0.0 NaN
1 x 4.0 NaN
2 x 5.0 NaN
3 z 1.0 3.0
4 z 3.0 3.0
5 y 2.0 2.0
6 q NaN 1.0
Here every key value from both data frames is present in the merge output. Values missing on one side are filled with NaN.
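To check which side each row of an outer merge came from, pd.merge accepts indicator=True, which adds a _merge column holding left_only, right_only or both. The sketch below is a minimal illustration; the names left, right and origin are assumptions and not part of the original example.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": ["x", "z", "y", "z", "x", "x"], "Dataset1": np.arange(6)})
right = pd.DataFrame({"key": ["q", "y", "z"], "Dataset2": [1, 2, 3]})

# indicator=True adds a "_merge" column describing the source of each row
merged = pd.merge(left, right, on="key", how="outer", indicator=True)

# Map each key to the side(s) it was found on
origin = dict(zip(merged["key"], merged["_merge"].astype(str)))
print(origin)
```

Here x occurs only on the left, q only on the right, and y and z on both sides.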
2. On the basis of the column value and index of the data frame
In this part, we match the values in a column of one data frame against the index of the other data frame.
Syntax:- pd.merge(left, right, how='method', left_on='column name', right_on='column name', left_index=True, right_index=True)
For each data frame, pass either the column name (left_on / right_on) or the index flag (left_index / right_index), not both.
Following are the parameters:-
left = Name of the first data frame object.
right = Name of the second data frame object.
how = Method of merging ( left / inner / right / outer )
left_on = To select a column of the left data frame, here we pass the column name
right_on = To select a column of the right data frame, here we pass the column name
left_index = To select the left data frame index, here we pass True
right_index = To select the right data frame index, here we pass True
The following data frames are used for merging.
Code:-
df_left = DataFrame({'key': list('xxyyz'),'data': np.arange(5)})
print('First DataFrame : \n',df_left)
df_right = DataFrame({'group_data':[10,20]}, index = ['x','x'])
print('\nSecond DataFrame : \n',df_right)
Output:-
First DataFrame :
key data
0 x 0
1 x 1
2 y 2
3 y 3
4 z 4
Second DataFrame :
group_data
x 10
x 20
Example 1:- Select left data frame column and right data frame index and method is inner merge.
Note:- Use above given data frame for merging
Code:-
merge = pd.merge(df_left,df_right,left_on='key',right_index=True,how='inner')
# left_on = For selecting the columns of left dataframe
# right_index= For selecting the index of right dataframe
# how = method of merging
print("\nMerging of two data frame first and second according to values of column and index of data frame method is inner merge : \n\n", merge)
Output:-
Merging of two data frame first and second according to values of column and index of data frame method is inner merge :
key data group_data
0 x 0 10
0 x 0 20
1 x 1 10
1 x 1 20
The merge output contains only the x rows because the inner method keeps the values common to the key column and the right index, and x is the only such value. Each of the two x rows on the left matches both x rows on the right, which gives four rows.
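The number of rows in this output can be worked out in advance: for every shared key, multiply the number of occurrences on the left by the number on the right. The sketch below is one way to check this, under the assumption that the helper names left_counts, right_counts, shared and expected_rows are purely illustrative.

```python
import numpy as np
import pandas as pd

df_left = pd.DataFrame({"key": list("xxyyz"), "data": np.arange(5)})
df_right = pd.DataFrame({"group_data": [10, 20]}, index=["x", "x"])

# How often each key occurs in the left column and in the right index
left_counts = df_left["key"].value_counts()
right_counts = df_right.index.value_counts()

# Keys present on both sides
shared = left_counts.index.intersection(right_counts.index)

# Each left occurrence pairs with each right occurrence of the same key
expected_rows = sum(int(left_counts[k]) * int(right_counts[k]) for k in shared)

merged = pd.merge(df_left, df_right, left_on="key", right_index=True, how="inner")
print(expected_rows, len(merged))
```

With duplicate keys on both sides the result grows quickly, so it is worth checking key counts before merging large data frames.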
Example 2:- Select the left data frame column and right data frame index and the method is the outer merge.
Note:- Use above given data frame for merging
Code:-
merge = pd.merge(df_left,df_right,left_on='key',right_index=True,how='outer')
# left_on = For selecting the columns of left dataframe
# right_index= For selecting the index of right dataframe
# how = method of merging
print("\nMerging of two data frame first and second according to values of column and index of data frame method is outer merge : \n\n"
, merge)
Output:-
Merging of two data frame first and second according to values of column and index of data frame method is outer merge :
key data group_data
0 x 0 10.0
0 x 0 20.0
1 x 1 10.0
1 x 1 20.0
2 y 2 NaN
3 y 3 NaN
4 z 4 NaN
Here we apply the outer merge, which shows all values from both the column and the index. The keys y and z have no match in the right index, so group_data is NaN for them.
Example 3:- Select the left data frame column and right data frame index and the method is the left merge.
Note:- Use above given data frame for merging
Code:-
merge = pd.merge(df_left,df_right,left_on='key',right_index=True,how='left')
# left_on = For selecting the columns of left dataframe
# right_index= For selecting the index of right dataframe
# how = method of merging
print("\nMerging of two data frame first and second according to values of column and index of data frame method is left merge : \n\n", merge)
Output:-
Merging of two data frame first and second according to values of column and index of data frame method is left merge :
key data group_data
0 x 0 10.0
0 x 0 20.0
1 x 1 10.0
1 x 1 20.0
2 y 2 NaN
3 y 3 NaN
4 z 4 NaN
Here we apply the left method, which keeps every row of the left data frame and adds the matching values from the right data frame index.
Example 4:- Select the left data frame column and right data frame index and the method is the right merge.
Note:- Use above given data frame for merging
Code:-
merge = pd.merge(df_left,df_right,left_on='key',right_index=True,how='right')
# left_on = For selecting the columns of left dataframe
# right_index= For selecting the index of right dataframe
# how = method of merging
print("\nMerging of two data frame first and second according to values of column and index of data frame method is right merge : \n\n", merge)
Output:-
Merging of two data frame first and second according to values of column and index of data frame method is right merge :
key data group_data
0 x 0 10
1 x 1 10
0 x 0 20
1 x 1 20
Here we apply the right merge, which takes all index values of the right data frame and the matching values from the left data frame column.
3. Merging on the basis of the index of two different data frames.
Here we select the index of both data frames and merge on the index values they have in common.
The given data frames are as follows
Code:-
df_left = DataFrame({'group_data':[10,20]}, index = ['x','x'])
print('First DataFrame : \n',df_left)
df_right = DataFrame({'group_data_1':[10,20,30]}, index = ['x','x','y'])
print('\nSecond DataFrame : \n',df_right)
Output:-
First DataFrame :
group_data
x 10
x 20
Second DataFrame :
group_data_1
x 10
x 20
y 30
Example 1:- Apply outer merge on the basis of the index of both data frames.
Note:- Use above given data frame for merging
Code:-
merge = pd.merge(df_left,df_right,left_index=True,right_index=True,how='outer')
# left_index = For selecting the index of left dataframe
# right_index= For selecting the index of right dataframe
# how = method of merging
print("\nMerging of two data frame first and second according to values of index and index of data frame method is outer merge : \n\n", merge)
Output:-
Merging of two dataframe first and second according to values of index and index of data frame method is outer merge :
group_data group_data_1
x 10.0 10
x 10.0 20
x 20.0 10
x 20.0 20
y NaN 30
Example 2:- Apply right merge on the basis of the index of both data frames.
Note:- Use above given data frame for merging
Code:-
merge = pd.merge(df_left,df_right,left_index=True,right_index=True,how='right')
# left_index = For selecting the index of left dataframe
# right_index= For selecting the index of right dataframe
# how = method of merging
print("\nMerging of two data frame first and second according to values of index and index of data frame method is right merge : \n\n", merge)
Output:-
Merging of two dataframe first and second according to values of index and index of data frame method is right merge :
group_data group_data_1
x 10.0 10
x 20.0 10
x 10.0 20
x 20.0 20
y NaN 30
Example 3:- Apply left merge on the basis of the index of both data frames.
Note:- Use above given data frame for merging
Code:-
merge = pd.merge(df_left,df_right,left_index=True,right_index=True,how='left')
# left_index = For selecting the index of left dataframe
# right_index= For selecting the index of right dataframe
# how = method of merging
print("\nMerging of two data frame first and second according to values of index and index of data frame method is left merge : \n\n", merge)
Output:-
Merging of two data frame first and second according to values of index and index of data frame method is left merge :
group_data group_data_1
x 10 10
x 10 20
x 20 10
x 20 20
Join in Data Frame
If you want to combine data frames side by side on their index, use the .join function. You can also choose the method with the how parameter (left / right / inner / outer); the default is left.
Example 1:- Join the first data frame to another data frame using the .join function.
Code:-
df_1 = DataFrame({'key': list('xxyyz'),'data': np.arange(5)})
print('First DataFrame : \n',df_1)
df_2 = DataFrame({'para': list('ABCDEF'),'values': np.arange(10,16)})
print('Second DataFrame : \n',df_2)
result = df_1.join(df_2)
print('\n Join data frame :- \n\n',result)
Output:-
First DataFrame :
key data
0 x 0
1 x 1
2 y 2
3 y 3
4 z 4
Second DataFrame :
para values
0 A 10
1 B 11
2 C 12
3 D 13
4 E 14
5 F 15
Join data frame:-
key data para values
0 x 0 A 10
1 x 1 B 11
2 y 2 C 12
3 y 3 D 13
4 z 4 E 14
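Here join aligns the two data frames on their index. The default method is a left join, so only the index labels 0 to 4 of the first data frame are kept and row 5 of the second data frame is dropped.
Example 2:- Apply the outer method in the join function.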
Code:-
df_1 = DataFrame({'key': list('xxyyz'),'data': np.arange(5)})
print('First DataFrame : \n',df_1)
df_2 = DataFrame({'para': list('ABCDEF'),'values': np.arange(10,16)})
print('Second DataFrame : \n',df_2)
result1 = df_1.join(df_2,how='outer')
print('\n Join using Outer method :- \n',result1)
Output:-
First DataFrame :
key data
0 x 0
1 x 1
2 y 2
3 y 3
4 z 4
Second DataFrame :
para values
0 A 10
1 B 11
2 C 12
3 D 13
4 E 14
5 F 15
Join using Outer method :-
key data para values
0 x 0.0 A 10
1 x 1.0 B 11
2 y 2.0 C 12
3 y 3.0 D 13
4 z 4.0 E 14
5 NaN NaN F 15
NaN means that the first data frame has no row for that index label (here label 5).
Example 3:- Join on a column of the first data frame: use the on keyword to match that column's values against the index of the second data frame.
Code:-
df_1 = DataFrame({'key': list('xxyyz'),'data': np.arange(5)})
print('First DataFrame : \n',df_1)
df_2 = DataFrame({'para': list('ABCDEF'),'values': np.arange(10,16)},index=list('xxyyzw'))
print('Second DataFrame : \n',df_2)
Result = df_1.join(df_2,on='key')
print('\n join on the basis of key column value of data :\n\n ',Result)
Output:-
First DataFrame :
key data
0 x 0
1 x 1
2 y 2
3 y 3
4 z 4
Second DataFrame :
para values
x A 10
x B 11
y C 12
y D 13
z E 14
w F 15
join on the basis of key column value of data :
key data para values
0 x 0 A 10
0 x 0 B 11
1 x 1 A 10
1 x 1 B 11
2 y 2 C 12
2 y 2 D 13
3 y 3 C 12
3 y 3 D 13
4 z 4 E 14
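The join above behaves like a left merge of the key column against the index of the second data frame. The sketch below compares the two calls on the same data; the variable names joined and merged are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({"key": list("xxyyz"), "data": np.arange(5)})
df_2 = pd.DataFrame({"para": list("ABCDEF"), "values": np.arange(10, 16)},
                    index=list("xxyyzw"))

# join with on= matches a left column against the right index (left join by default)
joined = df_1.join(df_2, on="key")

# The same operation written with pd.merge
merged = pd.merge(df_1, df_2, left_on="key", right_index=True, how="left")

print(joined.equals(merged))
```

The index label w of the second data frame never appears in the key column, so it is absent from both results.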
Concatenation in Python
Concatenation in Python (pandas) is used to join data frames along rows or along columns.
Syntax to concat two data frames:-
pd.concat([first_dataframe, second_dataframe], axis = 0/1)
Parameters:-
axis = 0 -> stack the data frames row-wise (one below the other)
axis = 1 -> place the data frames side by side as columns
Example 1:- Concat two data frames row-wise.
Code:-
df_1 = DataFrame({'key': list('xxyyz'),'data': np.arange(5)})
print('First DataFrame : \n',df_1)
df_2 = DataFrame({'key': list('ABCDE'),'data': np.arange(10,15)})
print('Second DataFrame : \n',df_2)
concat_data = pd.concat([df_1,df_2],axis=0)
print("\nConcat two data frame according to row-wise : \n\n",concat_data)
Output:-
First DataFrame :
key data
0 x 0
1 x 1
2 y 2
3 y 3
4 z 4
Second DataFrame :
key data
0 A 10
1 B 11
2 C 12
3 D 13
4 E 14
Concat two data frame according to row-wise :
key data
0 x 0
1 x 1
2 y 2
3 y 3
4 z 4
0 A 10
1 B 11
2 C 12
3 D 13
4 E 14
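The row-wise result keeps the original index labels, so 0 to 4 appear twice. If a fresh index is preferred, pd.concat accepts ignore_index=True. The sketch below is a minimal illustration; the variable name stacked is an assumption.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({"key": list("xxyyz"), "data": np.arange(5)})
df_2 = pd.DataFrame({"key": list("ABCDE"), "data": np.arange(10, 15)})

# ignore_index=True discards the old labels and numbers the rows 0..n-1
stacked = pd.concat([df_1, df_2], axis=0, ignore_index=True)
print(stacked)
```

With a unique index, stacked.loc[5] now refers to a single row instead of raising ambiguity between two rows with the same label.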
Example 2:- Concat two data frames column-wise.
Note:- Column-wise concatenation aligns the rows on the index. The data frames do not need the same number of rows; index labels missing from one data frame are filled with NaN.
Code:-
df_1 = DataFrame({'key': list('xxyyz'),'data': np.arange(5)})
print('First DataFrame : \n',df_1)
df_2 = DataFrame({'para': list('ABCDE'),'values': np.arange(10,15)})
print('Second DataFrame : \n',df_2)
concat_data = pd.concat([df_1,df_2],axis=1)
print("\n Concat two data frame according to column-wise : \n\n",concat_data)
Output:-
First DataFrame :
key data
0 x 0
1 x 1
2 y 2
3 y 3
4 z 4
Second DataFrame :
para values
0 A 10
1 B 11
2 C 12
3 D 13
4 E 14
Concat two data frame according to column-wise :
key data para values
0 x 0 A 10
1 x 1 B 11
2 y 2 C 12
3 y 3 D 13
4 z 4 E 14
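Column-wise concatenation aligns rows by index label, so it also works when the data frames have different lengths. The sketch below uses a shorter second data frame, which is an assumption made for illustration and not part of the example above.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({"key": list("xxyyz"), "data": np.arange(5)})
# Hypothetical shorter frame: only index labels 0, 1 and 2 exist
df_2 = pd.DataFrame({"para": list("ABC"), "values": np.arange(10, 13)})

# Rows are matched on index labels; labels 3 and 4 have no partner in df_2
wide = pd.concat([df_1, df_2], axis=1)
print(wide)
```

The rows with labels 3 and 4 get NaN in the para and values columns, and values becomes a float column because an integer column cannot hold NaN.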
Conclusion
In this blog, you saw how to merge and join multiple data frames on their columns and on their index, the different types of merging, and how to concatenate two data frames in Python.