Create a new dataframe column by matching another dataframe in a many-to-one relationship
I am pretty new to the pandas library and I am not used to dataframes yet. I am trying to add a column to dataframe1 by taking the value of a column in dataframe1, using that value as an index into dataframe2, and getting the corresponding value.
I have two dataframes:
import pandas as pd

df1 = pd.DataFrame({'customer': pd.Series([28, 28, 29, 30],
                                          index=['0', '1', '3', '4']),
                    'store': pd.Series([14, 14, 14, 22],
                                       index=['0', '1', '3', '4'])})
df2 = pd.DataFrame({'value': pd.Series([6, 7, 8],
                                       index=[0, 1, 2]),
                    'store': pd.Series([14, 14, 22],
                                       index=[0, 1, 2])})
# Sum 'value' per store (the column name is lowercase 'value')
df2.groupby(['store']).agg({'value': ['sum']})
My goal is to add to df1 a column containing the 'value' from df2 that corresponds to each row's 'store' value in df1.
Expected output:
df3 = {'customer': pd.Series([28., 28., 29., 30.], index=['0', '1', '3', '4']),
       'store': pd.Series([14, 14, 14, 22], index=['0', '1', '3', '4']),
       'value': pd.Series([6, 6, 6, 8], index=['0', '1', '3', '4'])}
I tried:
for index, row in df1.iterrows():
    df1['Values'] = df2.loc[row['store']]
But I get the TypeError: incompatible index of inserted column with frame index
I also tried:
for index, row in df1.iterrows():
    df1['Values'] = df2.loc[pd.Index(row['store'])]
But I get a TypeError:
Index(...) must be called with a collection of some kind, 'int' was passed
Thank you very much for your help; I am really struggling with this.
Let's change your groupby statement to create a pd.Series and use map:
s = df2.groupby(['store'])['value'].agg('sum')
df1['value'] = df1['store'].map(s)
df1
Output:
customer store value
0 28 14 13
1 28 14 13
3 29 14 13
4 30 22 8
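The groupby-then-map approach above can be sketched as a self-contained script. This is a minimal sketch using the question's data; the names `totals` and `firsts` are illustrative and not from the original answer. Note that the question's expected output shows 6 for store 14, which would correspond to taking the first value per store rather than the sum, so both variants are shown.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"customer": [28, 28, 29, 30], "store": [14, 14, 14, 22]},
    index=["0", "1", "3", "4"],
)
df2 = pd.DataFrame({"value": [6, 7, 8], "store": [14, 14, 22]})

# Collapse df2 to one entry per store so the lookup is many-to-one.
# `totals` is a Series indexed by store.
totals = df2.groupby("store")["value"].sum()

# map() looks up each df1['store'] entry in the index of `totals`.
# A store absent from df2 would come back as NaN.
df1["value"] = df1["store"].map(totals)

# Alternative (assumption about the asker's intent): keep the first
# value seen per store instead of the sum.
firsts = df2.groupby("store")["value"].first()
first_values = df1["store"].map(firsts)

print(df1)
print(first_values)
```

Because map() aligns on the values of df1['store'] rather than on the row labels, df1 keeps its original string index.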
After aggregating df2 down to unique store values, this worked for me:
df1['value'] = [int(df2[df2.store==s].value) for s in df1.store]
You simply need a left merge (here df2 is the aggregated groupby result, with store as its index):
df1.merge(df2.reset_index(), how='left', on=['store'])
Output:
customer store value
0 28 14 13
1 28 14 13
2 29 14 13
3 30 22 8
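For reference, the merge approach can be sketched end to end as follows. This is a sketch under the assumption that df2 is first aggregated to one row per store; the names `agg` and `merged` are illustrative, and `validate="many_to_one"` is an optional safeguard that is not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"customer": [28, 28, 29, 30], "store": [14, 14, 14, 22]},
    index=["0", "1", "3", "4"],
)
df2 = pd.DataFrame({"value": [6, 7, 8], "store": [14, 14, 22]})

# One row per store; as_index=False keeps 'store' as a regular column,
# so no reset_index() is needed before merging.
agg = df2.groupby("store", as_index=False)["value"].sum()

# Left merge keeps every df1 row. validate raises MergeError if `agg`
# ever contains duplicate stores, which would otherwise multiply rows.
merged = df1.merge(agg, how="left", on="store", validate="many_to_one")

print(merged)
```

Unlike map(), merging on a column discards df1's original index and returns a fresh 0..n-1 index, which is why the row labels in the output above differ from those in df1.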