Pandas: Merge 2 dataframes based on column values; for multiple rows containing the same column value, append those to different columns
I have two dataframes, dataframe1 and dataframe2. They share the same data in one column each; let's call these columns 'share1' and 'share2' for dataframe1 and dataframe2 respectively.
The issue is that there are cases where dataframe1 has only one row with a particular value in 'share1' (let's call it 'c34z'), but dataframe2 has multiple rows with the value 'c34z' in the 'share2' column.
What I would like is for the merged dataframe to place each of those additional values in a new column.
So the number of added columns in the new dataframe will be the maximum number of duplicates of any single value in 'share2'. For rows whose value appears only once in 'share2', the remaining added columns will be blank.
You can use cumcount to create an additional key, then pivot df2:
newdf2 = df2.assign(key=df2.groupby('share2').cumcount(), v=df2.share2).pivot_table(index='share2', columns='key', values='v', aggfunc='first')
After this, I use .loc or reindex to align the pivoted df2 with df1, then concat it to df1:
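# reindex the pivoted frame (not the original df2) so its rows follow df1's order
newdf2 = newdf2.reindex(df1.share1)
newdf2.index = df1.index
yourdf = pd.concat([df1, newdf2], axis=1)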
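To see the cumcount, pivot, reindex and concat steps end to end, here is a minimal sketch on made-up data. The column names 'share1' and 'share2' come from the question; the 'name' and 'val' columns, the sample values, and the 'val_' column prefix are assumptions added only for illustration.

```python
import pandas as pd

# Hypothetical sample data: 'name' and 'val' are illustrative columns,
# not part of the original question.
df1 = pd.DataFrame({'share1': ['c34z', 'a11b', 'zz99'], 'name': ['x', 'y', 'z']})
df2 = pd.DataFrame({'share2': ['c34z', 'c34z', 'a11b'], 'val': ['p', 'q', 'r']})

# Number the duplicates inside each share2 group: 0, 1, 2, ...
key = df2.groupby('share2').cumcount()

# One row per share2 value, one column per duplicate position.
wide = df2.assign(key=key).pivot_table(
    index='share2', columns='key', values='val', aggfunc='first'
)
wide.columns = [f'val_{k}' for k in wide.columns]

# Align the pivoted rows with df1's row order, then join side by side.
wide = wide.reindex(df1['share1'])
wide.index = df1.index
result = pd.concat([df1, wide], axis=1)
print(result)
```

In this sketch, values of 'share1' that never occur in 'share2' (here 'zz99') end up with NaN in every added column, which corresponds to the blank cells asked for in the question.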
Loading data:
import pandas as pd
df1 = {'key': ['c34z', 'c34z_2'], 'value': ['x', 'y']}
df2 = {'key': ['c34z', 'c34z_2', 'c34z_2'], 'value': ['c34z_value', 'c34z_2_value', 'c34z_2_value']}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
Convert df2 by grouping and pivoting:
df2_pivot = df2.groupby('key')['value'].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()
Merge df1 and df2_pivot:
df_merged = pd.merge(df1, df2_pivot, on='key')
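One thing to keep in mind is that pd.merge defaults to an inner join, so rows of df1 whose key never appears in df2 are dropped. The sketch below is a variant rather than the answer's exact code: it swaps the groupby/apply step for cumcount with set_index and unstack, and passes how='left' to keep unmatched rows. The extra 'no_match' row and the 'value_' column prefix are assumptions for illustration.

```python
import pandas as pd

# Same shape of data as above, plus a hypothetical 'no_match' row in df1.
df1 = pd.DataFrame({'key': ['c34z', 'c34z_2', 'no_match'],
                    'value': ['x', 'y', 'z']})
df2 = pd.DataFrame({'key': ['c34z', 'c34z_2', 'c34z_2'],
                    'value': ['c34z_value', 'c34z_2_value', 'c34z_2_value']})

# Position of each row inside its key group (0 for the first, 1 for the second, ...)
pos = df2.groupby('key').cumcount()

# Spread each group's values across columns and give the columns string names.
df2_pivot = (
    df2.set_index(['key', pos])['value']
       .unstack()
       .add_prefix('value_')
       .reset_index()
)

# how='left' keeps df1 rows that have no partner in df2.
df_merged = df1.merge(df2_pivot, on='key', how='left')
print(df_merged)
```

Renaming the pivoted columns to strings also avoids mixing integer and string column labels in the merged frame.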