
Pandas: Merge 2 dataframes based on a column's values; for multiple rows containing the same column value, append those to different columns

I have two dataframes, dataframe1 and dataframe2. They share the same data in one particular column; let's call this column 'share1' in dataframe1 and 'share2' in dataframe2.

The issue is that there are cases where dataframe1 has only one row with a particular value in 'share1' (let's call it 'c34z'), but dataframe2 has multiple rows with the value 'c34z' in the 'share2' column.

What I would like is that, in the new merged dataframe, whenever there are additional values for the same key, each one is placed in a new column.

So the number of added columns in the new dataframe will be the maximum number of duplicates for a particular value in 'share2'. For rows whose value appears only once in 'share2', the rest of the added columns will be blank for that row.
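To make the requirement concrete, here is a small made-up example. Only 'share1', 'share2' and 'c34z' come from the description above; the columns 'a' and 'b' and every other value are assumptions for illustration. It only counts how many extra columns the merged result would need.

```python
import pandas as pd

# Hypothetical sample data; only 'share1', 'share2' and 'c34z' come from the question.
dataframe1 = pd.DataFrame({'share1': ['c34z', 'k9'], 'a': [1, 2]})
dataframe2 = pd.DataFrame({'share2': ['c34z', 'c34z', 'c34z', 'k9'],
                           'b': ['p', 'q', 'r', 's']})

# Number of rows per distinct value in 'share2'.
dup_counts = dataframe2['share2'].value_counts()

# The largest group decides how many columns have to be added.
n_new_cols = int(dup_counts.max())
print(n_new_cols)
```

With this data, 'c34z' would fill all of the added columns, while the 'k9' row would have a value in the first added column only and blanks in the rest.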

You can use cumcount to create an additional key, then pivot df2:

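# Number the repeats of each 'share2' value, then pivot so each repeat gets its own column
newdf2 = (df2.assign(key=df2.groupby('share2').cumcount(), v=df2.share2)
             .pivot_table(index='share2', columns='key', values='v', aggfunc='first'))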

After this, I use reindex (or .loc) to align the pivoted result with df1, then concat it to df1:

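# Align the pivoted frame (indexed by 'share2') with df1's rows, then concatenate side by side
newdf2 = newdf2.reindex(df1.share1)
newdf2.index = df1.index
yourdf = pd.concat([df1, newdf2], axis=1)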
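For reference, the steps above can be put together as a self-contained sketch. The sample data and the column names 'val' and 'other' are assumptions, and this variant spreads a separate value column of df2 across the new columns rather than 'share2' itself, which may or may not match your data.

```python
import pandas as pd

# Hypothetical sample data; 'val' and 'other' are assumed column names.
df1 = pd.DataFrame({'share1': ['c34z', 'k9', 'zz'], 'other': [1, 2, 3]})
df2 = pd.DataFrame({'share2': ['c34z', 'c34z', 'k9'],
                    'val': ['p', 'q', 'r']})

# Number each row within its 'share2' group: 0, 1, 2, ...
key = df2.groupby('share2').cumcount()

# One row per 'share2' value, one column per occurrence number.
newdf2 = (df2.assign(key=key)
             .pivot_table(index='share2', columns='key',
                          values='val', aggfunc='first'))

# Align the pivoted rows with df1's row order, then glue the frames side by side.
aligned = newdf2.reindex(df1['share1'])
aligned.index = df1.index
yourdf = pd.concat([df1, aligned], axis=1)
print(yourdf)
```

Rows of df1 whose key has fewer repeats in df2, or none at all (like 'zz' here), end up with NaN in the unused columns; a fillna('') at the end would turn those into blanks if that is preferred.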

Loading data:

import pandas as pd
df1 = {'key': ['c34z', 'c34z_2'], 'value': ['x', 'y']}
df2 = {'key': ['c34z', 'c34z_2', 'c34z_2'], 'value': ['c34z_value', 'c34z_2_value', 'c34z_2_value']}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)

Convert df2 by grouping and pivoting:

df2_pivot = df2.groupby('key')['value'].apply(lambda df: df.reset_index(drop=True)).unstack().reset_index()

Merge df1 and df2_pivot:

df_merged = pd.merge(df1, df2_pivot, on='key')
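Note that pd.merge defaults to an inner join, so rows of df1 with no match in df2 would be dropped. If those rows should be kept, with blanks in the added columns, a variant along these lines might help. The extra 'no_match' row and the 'df2_value_' column prefix are assumptions added for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['c34z', 'c34z_2', 'no_match'],  # 'no_match' is a made-up extra row
                    'value': ['x', 'y', 'z']})
df2 = pd.DataFrame({'key': ['c34z', 'c34z_2', 'c34z_2'],
                    'value': ['c34z_value', 'c34z_2_value', 'c34z_2_value']})

# Same grouping and pivoting step as above.
df2_pivot = (df2.groupby('key')['value']
                .apply(lambda s: s.reset_index(drop=True))
                .unstack()
                .reset_index())

# Give the numbered columns string names (the prefix is an arbitrary choice).
df2_pivot = df2_pivot.rename(
    columns=lambda c: c if c == 'key' else f'df2_value_{c}')

# A left join keeps every row of df1; missing values become blank strings.
df_merged = pd.merge(df1, df2_pivot, on='key', how='left').fillna('')
print(df_merged)
```

Renaming the numbered columns also avoids mixing integer and string column labels in the merged result.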


