
Pandas - merge two DataFrames with Identical Column Names and combine information of two DataFrames in one cell

I have two DataFrames with identical column names and identical IDs in the first column. The first DataFrame holds int values and the second holds str values.

Here's an example of what they look like:

ID    Cat1    Cat2    Cat3  
1     1        1       0 
2     0        2       1 
3     0        0       5


ID    Cat1    Cat2    Cat3 
1     text    text    text 
2     text    text    text
3     text    text    text

I want to merge them into one DataFrame, combining the values from both DataFrames in the same cells. The result would look like this:

ID    Cat1      Cat2         Cat3  
1    1, text   1, text     0, text 
2    0, text   2, text     1, text  
3    0, text   0, text     5, text

I tried to use pandas.combine, but it didn't work properly.

Is it possible to solve this task?

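Filter out the columns to be merged, convert them from int to str, and add ', ' followed by the matching columns of the second frame. Finally, concat the result back onto df.ID along the columns axis. Here df is the frame with the int values and df1 the one with the str values.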

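import pandas as pd

# Convert the int columns to str, append the separator, then append the text columns
Merged_Dfs = (df.filter(like='Cat').astype(str)
             .add(', ')
             .add(df1.filter(like='Cat').astype(str)))

# Put the ID column back in front of the merged columns
pd.concat([df.ID,
           Merged_Dfs
           ],axis=1)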

    ID  Cat1    Cat2    Cat3
0   1   1, text 1, text 0, text
1   2   0, text 2, text 1, text
2   3   0, text 0, text 5, text

Alternatively, you can use DataFrame.insert to put df.ID back into Merged_Dfs as the first column. Note that insert modifies Merged_Dfs in place:

Merged_Dfs.insert(0,'ID',df.ID)

print(Merged_Dfs)
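
As a self-contained sketch of this approach, the snippet below rebuilds the two example tables from the question and runs the same steps end to end. The variable names merged and result are mine, and the snippet assumes both frames share the same index and row order, since .add aligns on index and column labels rather than on ID.

```python
import pandas as pd

# Example frames from the question: df holds ints, df1 holds strings.
df = pd.DataFrame({"ID": [1, 2, 3],
                   "Cat1": [1, 0, 0],
                   "Cat2": [1, 2, 0],
                   "Cat3": [0, 1, 5]})
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "Cat1": ["text"] * 3,
                    "Cat2": ["text"] * 3,
                    "Cat3": ["text"] * 3})

# Element-wise string concatenation: "<int>" + ", " + "<text>".
merged = (df.filter(like="Cat").astype(str)
            .add(", ")
            .add(df1.filter(like="Cat").astype(str)))

# Re-attach the ID column in front of the merged columns.
result = pd.concat([df["ID"], merged], axis=1)
print(result)
```

If the rows of the two frames might be ordered differently, aligning on ID first (for example by setting it as the index) is the safer route.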

You can use DataFrame.combine to join the two DataFrames, with pd.Series.str.cat joining the elements of each pair of columns:

df1.set_index('ID').astype(str).combine(df2.set_index('ID'), lambda x,y: x.str.cat(y, sep=', '))

This requires setting ID as the index and converting the numeric values to strings first.

Output:

       Cat1     Cat2     Cat3
ID                           
1   1, text  1, text  0, text
2   0, text  2, text  1, text
3   0, text  0, text  5, text
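
Because combine aligns the two frames on their index, setting ID as the index should also cope with frames whose rows are in a different order. A minimal sketch, assuming hypothetical placeholder texts (a1, b2, and so on) in place of the question's text values so that the pairing is visible, with reset_index added to turn ID back into a regular column:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "Cat1": [1, 0, 0],
                    "Cat2": [1, 2, 0],
                    "Cat3": [0, 1, 5]})
# Hypothetical text values; rows deliberately listed in a different order.
df2 = pd.DataFrame({"ID": [3, 1, 2],
                    "Cat1": ["c1", "a1", "b1"],
                    "Cat2": ["c2", "a2", "b2"],
                    "Cat3": ["c3", "a3", "b3"]})

combined = (
    df1.set_index("ID").astype(str)
       # combine passes matching columns (as Series) to the function
       .combine(df2.set_index("ID"), lambda x, y: x.str.cat(y, sep=", "))
       .reset_index()
)
print(combined)
```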

You can use pandas.DataFrame.combine to merge the two DataFrames. However, you need to pass a suitable function as the func parameter.


merge = lambda x,y: [x,y]
df1.combine(df2, func = lambda s1,s2: s1.combine(s2, func = merge))

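Note that the arguments passed to this function are pandas.Series objects, so pandas.Series.combine is called on them to combine the values element by element. With the merge function above, each cell of the result holds a two-element list such as [1, 'text'] rather than a single string.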
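
If the goal is the comma-separated string from the question rather than a list, the element-wise function can format the pair instead. This is a sketch under my own assumptions: ID is moved to the index so that it is not combined with itself, and the helper name join_pair is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "Cat1": [1, 0, 0],
                    "Cat2": [1, 2, 0],
                    "Cat3": [0, 1, 5]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "Cat1": ["text"] * 3,
                    "Cat2": ["text"] * 3,
                    "Cat3": ["text"] * 3})

def join_pair(x, y):
    """Format one int/str pair as a single 'x, y' string."""
    return f"{x}, {y}"

result = (
    df1.set_index("ID")
       # Outer combine works column by column, inner combine element by element.
       .combine(df2.set_index("ID"),
                lambda s1, s2: s1.combine(s2, join_pair))
       .reset_index()
)
print(result)
```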


 