Pandas: merge two dataframes based on a column value; for multiple rows with the same column value, append them to different columns

Question

I have two dataframes, dataframe1 and dataframe2. They share the same data in one particular column; let's call this column 'share1' in dataframe1 and 'share2' in dataframe2.

The issue is that there are cases where dataframe1 has only one row in 'share1' with a particular value (let's call it 'c34z'), while dataframe2 has multiple rows with the value 'c34z' in the 'share2' column.

In the new merged dataframe, whenever there are additional matching values, I would like to place each of them in a new column.

So the number of added columns in the new dataframe will be the maximum number of duplicates of any particular value in 'share2'. For rows whose value appears only once in 'share2', the rest of the added columns will be blank.
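To make the column count concrete, here is a small sketch with hypothetical sample data (the 'info' and 'extra' columns and the 'a11b' key are invented for illustration). The number of added columns is simply the largest occurrence count of any value in 'share2':

```python
import pandas as pd

# Hypothetical sample data: 'c34z' appears once in dataframe1 but twice in dataframe2.
dataframe1 = pd.DataFrame({"share1": ["c34z", "a11b"], "info": ["x", "y"]})
dataframe2 = pd.DataFrame({"share2": ["c34z", "c34z", "a11b"], "extra": ["p", "q", "r"]})

# One added column per occurrence of the most-repeated key in 'share2'.
n_new_cols = dataframe2["share2"].value_counts().max()
print(n_new_cols)  # 2
```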

Answer 1

You can use cumcount to create an additional key, then pivot df2:

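import pandas as pd

# Number repeated share2 values 0, 1, 2, ... and pivot so each repeat gets its own column
newdf2 = df2.assign(key=df2.groupby('share2').cumcount(), v=df2.share2).pivot_table(index='share2', columns='key', values='v', aggfunc='first')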
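The extra key comes from cumcount, which numbers the rows inside each group in their original order. A minimal illustration on made-up share2 values ('a11b' is hypothetical):

```python
import pandas as pd

df2 = pd.DataFrame({"share2": ["c34z", "c34z", "a11b", "c34z"]})

# cumcount numbers the rows within each group 0, 1, 2, ... in their original order.
key = df2.groupby("share2").cumcount()
print(key.tolist())  # [0, 1, 0, 2]
```

Because each (share2, key) pair is unique, the pivot places the first 'c34z' match in column 0, the second in column 1, and so on.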

After this, I use .loc or reindex to align newdf2 with df1, and then concat the two:

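# Align newdf2's rows to df1's share1 order (keys missing from df2 become all-NaN rows)
newdf2 = newdf2.reindex(df1.share1)

newdf2.index = df1.index
yourdf = pd.concat([df1, newdf2], axis=1)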
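A self-contained sketch of Answer 1 end to end, under stated assumptions: the sample frames, the keys 'a11b' and 'zz99', and the 'extra' column (standing in for whichever df2 column should be spread out) are hypothetical. It uses pivot rather than pivot_table with aggfunc='first'; since cumcount makes every (share2, key) pair unique, no aggregation is needed.

```python
import pandas as pd

# Hypothetical sample data; 'extra' stands in for the df2 values to spread out.
df1 = pd.DataFrame({"share1": ["c34z", "a11b", "zz99"], "info": ["x", "y", "z"]})
df2 = pd.DataFrame({"share2": ["c34z", "c34z", "a11b"], "extra": ["p", "q", "r"]})

# 1. Number repeated keys 0, 1, ... and pivot so each repeat gets its own column.
newdf2 = (
    df2.assign(key=df2.groupby("share2").cumcount())
       .pivot(index="share2", columns="key", values="extra")
)

# 2. Align to df1's row order; 'zz99' has no match in df2, so its row is all NaN.
newdf2 = newdf2.reindex(df1["share1"])
newdf2.index = df1.index

# 3. Put the spread-out columns next to df1's columns.
result = pd.concat([df1, newdf2], axis=1)
print(result)
```

The blank cells the question asks for show up as NaN; call result.fillna("") afterwards if literal empty strings are preferred.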

Answer 2

Loading data:

import pandas as pd
df1 = {'key': ['c34z', 'c34z_2'], 'value': ['x', 'y']}
df2 = {'key': ['c34z', 'c34z_2', 'c34z_2'], 'value': ['c34z_value', 'c34z_2_value', 'c34z_2_value']}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)

Convert df2 by grouping and pivoting:

df2_pivot = df2.groupby('key')['value'].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()
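To see what this one-liner does, the sketch below splits it into its two stages using the same data as above. reset_index(drop=True) renumbers each key's matches from 0, so the applied result is a Series indexed by (key, position); unstack() then turns the position level into columns, leaving NaN where a key has fewer matches.

```python
import pandas as pd

df2 = pd.DataFrame({
    "key": ["c34z", "c34z_2", "c34z_2"],
    "value": ["c34z_value", "c34z_2_value", "c34z_2_value"],
})

# Stage 1: renumber matches within each key; index becomes (key, position).
stacked = df2.groupby("key")["value"].apply(lambda s: s.reset_index(drop=True))
print(stacked)

# Stage 2: positions become columns 0, 1, ...; reset_index turns 'key' back into a column.
df2_pivot = stacked.unstack().reset_index()
print(df2_pivot)
```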

Merge df1 and df2_pivot:

df_merged = pd.merge(df1, df2_pivot, on='key')
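One detail worth checking: pd.merge defaults to how='inner', so a df1 row whose key has no match in df2 is dropped rather than kept with blank cells. A sketch using how='left' instead, assuming a hypothetical extra df1 row 'c99' and a hypothetical 'match_' prefix to give the new columns readable names:

```python
import pandas as pd

# Answer 2's data, plus a hypothetical df1 row ('c99') with no match in df2.
df1 = pd.DataFrame({"key": ["c34z", "c34z_2", "c99"], "value": ["x", "y", "z"]})
df2 = pd.DataFrame({
    "key": ["c34z", "c34z_2", "c34z_2"],
    "value": ["c34z_value", "c34z_2_value", "c34z_2_value"],
})

df2_pivot = (
    df2.groupby("key")["value"]
       .apply(lambda s: s.reset_index(drop=True))
       .unstack()
       .add_prefix("match_")  # columns 0, 1 -> match_0, match_1 ('key' is still the index here)
       .reset_index()
)

# how="left" keeps every df1 row; keys absent from df2 get NaN in the match_* columns.
df_merged = pd.merge(df1, df2_pivot, on="key", how="left")
print(df_merged)
```

With the default inner join, the 'c99' row would not appear in df_merged at all.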

2 answers

Answer 1: score 1, accepted, 2019-04-21 01:30:10

Answer 2: score 1, 2019-04-21 01:33:59
