
Pandas: Merge 2 dataframes based on a column's values; for multiple rows containing the same column value, append those to different columns

I have two dataframes, dataframe1 and dataframe2. They share the same data in one particular column; let's call this column 'share1' in dataframe1 and 'share2' in dataframe2.

The issue is that in some cases a value (let's call it 'c34z') appears in only one row of 'share1' in dataframe1, but in multiple rows of the 'share2' column in dataframe2.

In the new merged dataframe, I would like each additional matching row from dataframe2 to be placed in a new column.

So the number of added columns in the new dataframe will be the maximum number of duplicates of any single value in 'share2'. For rows whose value appears only once in 'share2', the rest of the added columns will be blank.
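The number of extra columns can be checked up front by counting how often each value repeats in 'share2'. A minimal sketch, assuming a small hypothetical df2 (the 'detail' column and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: 'c34z' appears twice in share2, 'a1b2' once
df2 = pd.DataFrame({'share2': ['c34z', 'c34z', 'a1b2'],
                    'detail': ['first', 'second', 'only']})

# The largest group decides how many new columns the merged frame needs
n_new_cols = df2['share2'].value_counts().max()
print(n_new_cols)  # 2
```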

You can use cumcount to create the additional key, then pivot df2:

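# key = position of each row among the rows sharing its share2 value (0, 1, 2, ...); pivot puts each position in its own column
newdf2 = df2.assign(key=df2.groupby('share2').cumcount(), v=df2.share2).pivot_table(index='share2', columns='key', values='v', aggfunc='first')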
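To see what the cumcount key looks like, here is a small sketch on hypothetical data: rows are numbered from 0 within each group, in order of appearance, so a value that appears three times gets keys 0, 1 and 2.

```python
import pandas as pd

# Hypothetical share2 column with 'c34z' repeated three times
df2 = pd.DataFrame({'share2': ['c34z', 'c34z', 'a1b2', 'c34z']})

# cumcount numbers the rows within each group 0, 1, 2, ...
key = df2.groupby('share2').cumcount()
print(key.tolist())  # [0, 1, 0, 2]
```

These numbers become the column labels after the pivot, so the widest group determines how many columns are created.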

After this, use reindex (or .loc, which only works if every share1 value exists in share2) to align newdf2 with df1, then concat it to df1:

newdf2 = newdf2.reindex(df1.share1)

newdf2.index = df1.index
yourdf = pd.concat([df1, newdf2], axis=1)
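Putting the steps together, here is a self-contained sketch on hypothetical data. As a variant of the reindex/concat step, it joins with `DataFrame.merge(..., left_on='share1', right_index=True, how='left')`, which keeps every row of df1 and leaves NaN where a value has fewer duplicates. It also spreads a hypothetical 'detail' column from df2 into the new columns instead of repeating 'share2' itself; that column, the 'x' column and all values are assumptions.

```python
import pandas as pd

# Hypothetical inputs; only the share1/share2 column names come from the question
df1 = pd.DataFrame({'share1': ['c34z', 'a1b2', 'zz99'], 'x': [1, 2, 3]})
df2 = pd.DataFrame({'share2': ['c34z', 'c34z', 'a1b2'],
                    'detail': ['first', 'second', 'only']})

# Number each duplicate within its share2 group: 0, 1, 2, ...
df2 = df2.assign(key=df2.groupby('share2').cumcount())

# One row per share2 value, one column per duplicate position
wide = df2.pivot(index='share2', columns='key', values='detail')
wide.columns = [f'detail_{c}' for c in wide.columns]

# Left join keeps all df1 rows; positions with no match become NaN
result = df1.merge(wide, left_on='share1', right_index=True, how='left')
print(result)
```

Plain `pivot` is enough here because each (share2, key) pair is unique by construction, so no aggregation function is needed.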

Alternatively, with sample data where the shared column is called 'key' in both dataframes. Loading the data:

import pandas as pd
df1 = {'key': ['c34z', 'c34z_2'], 'value': ['x', 'y']}
df2 = {'key': ['c34z', 'c34z_2', 'c34z_2'], 'value': ['c34z_value', 'c34z_2_value', 'c34z_2_value']}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)

Convert df2 to wide format by grouping on 'key' and unstacking each row's position within its group:

df2_pivot = df2.groupby('key')['value'].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()
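The same wide shape can be built without a per-group lambda by combining cumcount with `pivot`. A sketch using the sample data above; the helper column name 'pos' is an arbitrary choice, picked so it does not clash with the existing 'key' column.

```python
import pandas as pd

df2 = pd.DataFrame({'key': ['c34z', 'c34z_2', 'c34z_2'],
                    'value': ['c34z_value', 'c34z_2_value', 'c34z_2_value']})

# Position of each row within its 'key' group (0, 1, ...)
pos = df2.groupby('key').cumcount()

# Rows: one per key; columns: 0, 1, ... per duplicate position
df2_pivot = (df2.assign(pos=pos)
                .pivot(index='key', columns='pos', values='value')
                .reset_index())
print(df2_pivot)
```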

Merge df1 and df2_pivot on 'key':

df_merged = pd.merge(df1, df2_pivot, on='key')
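Note that `pd.merge` defaults to `how='inner'`, so a df1 row whose key never appears in df2 is dropped. The question asks for blanks instead, which `how='left'` gives. A sketch with a hypothetical extra df1 row 'no_match'; renaming the integer columns to 'value_0', 'value_1' is just a readability choice.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['c34z', 'c34z_2', 'no_match'], 'value': ['x', 'y', 'z']})
df2 = pd.DataFrame({'key': ['c34z', 'c34z_2', 'c34z_2'],
                    'value': ['c34z_value', 'c34z_2_value', 'c34z_2_value']})

df2_pivot = (df2.groupby('key')['value']
                .apply(lambda s: s.reset_index(drop=True))
                .unstack()
                .reset_index())
# Give the positional columns (0, 1, ...) readable string names
df2_pivot.columns = ['key'] + [f'value_{c}' for c in df2_pivot.columns[1:]]

inner = pd.merge(df1, df2_pivot, on='key')             # default how='inner'
left = pd.merge(df1, df2_pivot, on='key', how='left')  # keeps 'no_match' with NaN
print(len(inner), len(left))  # 2 3
```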
