Pandas: divide entries of a column by entries from another data frame
I have 2 dataframes, A and B. A contains weekly sales data for various stores and departments, indexed by a key Store_Dept_Date (e.g. 2_12_2010-04-03), while B contains the corresponding Consumer Price Index (CPI) for a given store and date, indexed as Store_Date (e.g. 2_2010-04-03).
> A.columns
> Out [ ] : Index([u'Store', u'Dept', u'Date', u'Weekly_Sales'], dtype='object')
> B.columns
> Out [ ] : Index([u'Store', u'Date', u'CPI'], dtype='object')
I want to normalize the weekly sales in A by dividing the Weekly_Sales value in each row of A by the corresponding CPI value in B.
Currently I am trying this:
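for ix, row in A.iterrows():
    # Build B's Store_Date key for this row, e.g. "2_2010-04-03"
    f_index = str(row['Store']) + "_" + row['Date']
    # .loc replaces the deprecated (and since removed) .ix indexer
    A.loc[ix, 'Weekly_Sales'] = row['Weekly_Sales'] / B.loc[f_index, 'CPI']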
A contains 421,570 rows, and my program takes forever to run. What's the correct and efficient way to do this?
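A minimal sketch of what the loop is doing and how the same lookup can be vectorized: the loop builds the Store_Date key one row at a time and performs a scalar lookup for each row. The whole key column can instead be built at once with string concatenation and looked up with `Series.map` against B's CPI column. The toy frames are made up for illustration, and the sketch assumes `Date` is stored as a string (as the key format 2_2010-04-03 suggests) and that B's Store_Date index is unique.

```python
import pandas as pd

# Hypothetical toy data mimicking the layout described in the question.
A = pd.DataFrame({
    "Store": [2, 2, 3],
    "Dept": [12, 7, 12],
    "Date": ["2010-04-03", "2010-04-03", "2010-04-10"],
    "Weekly_Sales": [2110.0, 422.0, 1000.0],
})
B = pd.DataFrame({
    "Store": [2, 3],
    "Date": ["2010-04-03", "2010-04-10"],
    "CPI": [211.0, 200.0],
})
# Index B by the Store_Date key, e.g. "2_2010-04-03".
B.index = B["Store"].astype(str) + "_" + B["Date"]

# Build every row's lookup key at once instead of inside a Python loop.
keys = A["Store"].astype(str) + "_" + A["Date"]

# map() looks each key up in B's CPI column via its index (the index must be
# unique); keys with no match become NaN instead of raising KeyError.
A["Weekly_Sales"] = A["Weekly_Sales"] / keys.map(B["CPI"])
print(A)
```

Like the original loop, this overwrites Weekly_Sales in place; writing to a new column instead would keep the raw sales available.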
The DataFrame merge method should be much faster than iterating with iterrows, even though it copies data: it joins the two frames in one vectorized operation instead of running Python code for each of the 421,570 rows. You can set the flag copy=False to minimize unnecessary copying.
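If B contains exactly one CPI value for every (Store, Date) combination that appears in A, you can do the following (note that merge performs an inner join by default, so any row of A with no match in B is dropped from C):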
C = A.merge(B, on=['Store', 'Date'], copy=False)
C['Normalized_Sales'] = C.Weekly_Sales / C.CPI
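A minimal, runnable sketch of the merge approach on made-up data. It differs from the two-line snippet in two assumed choices: how="left" instead of the default inner join, so sales rows with no CPI stay in the result as NaN rather than silently disappearing, and validate="many_to_one", which makes pandas raise a MergeError if B holds more than one CPI row for the same store and date.

```python
import pandas as pd

# Hypothetical toy data with the column layout shown in the question.
A = pd.DataFrame({
    "Store": [2, 2, 3, 4],
    "Dept": [12, 7, 12, 1],
    "Date": ["2010-04-03", "2010-04-03", "2010-04-10", "2010-04-10"],
    "Weekly_Sales": [2110.0, 422.0, 1000.0, 50.0],
})
B = pd.DataFrame({
    "Store": [2, 3],
    "Date": ["2010-04-03", "2010-04-10"],
    "CPI": [211.0, 200.0],
})

# how="left" keeps every sales row even if B has no CPI for it;
# validate="many_to_one" raises MergeError if B has duplicate (Store, Date) keys.
C = A.merge(B, on=["Store", "Date"], how="left", validate="many_to_one")
C["Normalized_Sales"] = C["Weekly_Sales"] / C["CPI"]

# Rows without a matching CPI end up with NaN and can be inspected here.
missing = C[C["CPI"].isna()]
print(C)
print(len(missing), "row(s) without CPI")
```

Here store 4 has no CPI entry in B, so its Normalized_Sales is NaN and it shows up in missing; with the default inner join that row would simply be absent from C.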