
How to sum certain values in a pandas DataFrame column in a specific date range

I have a large DataFrame, df, that looks something like this:

    UPC   Unit_Sales  Price   Price_Change  Date 
 0   22          15    1.99         NaN     2017-10-10
 1   22          7     2.19         True    2017-10-12
 2   22          6     2.19         NaN     2017-10-13
 3   22          7     1.99         True    2017-10-16
 4   22          4     1.99         NaN     2017-10-17
 5   35          15    3.99         NaN     2017-10-09
 6   35          17    3.99         NaN     2017-10-11
 7   35          5     4.29         True    2017-10-13
 8   35          8     4.29         NaN     2017-10-15
 9   35          2     4.29         NaN     2017-10-15

Basically, I am trying to record how the sales of a product (UPC) reacted over the 7 days following a price change. I want to create a new column, 'Reaction', which records the sum of the unit sales from the day of the price change through 7 days forward. Keep in mind that a UPC sometimes has multiple price changes (more than 2 in some cases), so I want a separate sum for each price change. So I want to see this:

    UPC   Unit_Sales  Price   Price_Change  Date        Reaction
 0   22          15    1.99         NaN     2017-10-10      NaN
 1   22          7     2.19         True    2017-10-12      13
 2   22          6     2.19         NaN     2017-10-13      NaN
 3   22          7     1.99         True    2017-10-16      11
 4   22          4     1.99         NaN     2017-10-19      NaN
 5   35          15    3.99         NaN     2017-10-09      NaN
 6   35          17    3.99         NaN     2017-10-11      NaN
 7   35          5     4.29         True    2017-10-13      15
 8   35          8     4.29         NaN     2017-10-15      NaN
 9   35          2     4.29         NaN     2017-10-18      NaN

What makes this difficult is how the dates are laid out in my data. Sometimes (as for UPC 35) the dates don't extend a full 7 days past the price change. In that case I would want it to default to the next nearest date, or to however many dates there are (if there are fewer than 7 days).

Here's what I've tried: I converted the date column to datetime, and I'm thinking of counting days with the .days attribute. This is roughly how I'm thinking of setting up the code (rough draft):

  x = df.loc[df['Price_Change'] == 'True']
  for x in df: 
       df['Reaction'] = sum(df.Unit_Sales[1day :8days])

Is there an easier way to do this, maybe without a for loop?
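For reference, the day-counting idea mentioned in the question might look like this once the dates are real datetimes. This is only a sketch on a few of the sample dates; `change_date`, `days_since` and `within_week` are names made up for illustration.

```python
import pandas as pd

# A few dates from UPC 22, parsed into datetime64 values.
dates = pd.to_datetime(pd.Series(["2017-10-12", "2017-10-13", "2017-10-16", "2017-10-17"]))

# Hypothetical reference point: the day the price changed.
change_date = pd.Timestamp("2017-10-12")

# Subtracting gives timedeltas; .dt.days turns them into whole-day counts.
days_since = (dates - change_date).dt.days

# True for rows that fall on the change day or within the following 7 days.
within_week = days_since.between(0, 7)

print(days_since.tolist())
print(within_week.tolist())
```

On a single `Timedelta` the same count is available as the `.days` attribute; on a whole column it goes through the `.dt` accessor as above.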

You just need ffill with groupby:

df.loc[df.Price_Change==True,'Reaction']=df.groupby('UPC').apply(lambda x : (x['Price_Change'].ffill()*x['Unit_Sales']).sum()).values
df
Out[807]: 
   UPC  Unit_Sales  Price Price_Change        Date  Reaction
0   22          15   1.99          NaN  2017-10-10       NaN
1   22           7   2.19         True  2017-10-12      24.0
2   22           6   2.19          NaN  2017-10-13       NaN
3   22           7   2.19          NaN  2017-10-16       NaN
4   22           4   2.19          NaN  2017-10-17       NaN
5   35          15   3.99          NaN  2017-10-09       NaN
6   35          17   3.99          NaN  2017-10-11       NaN
7   35           5   4.29         True  2017-10-13      15.0
8   35           8   4.29          NaN  2017-10-15       NaN
9   35           2   4.29          NaN  2017-10-15       NaN
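To see what the forward fill contributes, here is a small sketch on UPC 22 alone, using the values from the output above. It selects the flagged rows with a boolean mask rather than multiplying as the one-liner does, purely to make the flagged rows explicit; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# UPC 22 as it appears in the output above (one price change, on 2017-10-12).
price_change = pd.Series([np.nan, True, np.nan, np.nan, np.nan], dtype=object)
unit_sales = pd.Series([15, 7, 6, 7, 4])

# Forward fill carries the True flag to every later row; the first row stays NaN.
flag = price_change.ffill()

# Summing only the flagged rows gives 7 + 6 + 7 + 4.
total = unit_sales[flag.eq(True)].sum()
print(total)  # 24
```

Rows before the first price change never receive the flag, which is why the 15 units sold on 2017-10-10 are left out of the total.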

Update (a separate sum for each price change):

df['New']=df.groupby('UPC').apply(lambda x : x['Price_Change']==True).cumsum().values

v1=df.groupby(['UPC','New']).apply(lambda x : (x['Price_Change'].ffill()*x['Unit_Sales']).sum())

df=df.merge(v1.reset_index())

df[0]=df[0].mask(df['Price_Change']!=True)
df
Out[927]: 
   UPC  Unit_Sales  Price Price_Change        Date  New     0
0   22          15   1.99          NaN  2017-10-10    0   NaN
1   22           7   2.19         True  2017-10-12    1  13.0
2   22           6   2.19          NaN  2017-10-13    1   NaN
3   22           7   1.99         True  2017-10-16    2  11.0
4   22           4   1.99          NaN  2017-10-17    2   NaN
5   35          15   3.99          NaN  2017-10-09    2   NaN
6   35          17   3.99          NaN  2017-10-11    2   NaN
7   35           5   4.29         True  2017-10-13    3  15.0
8   35           8   4.29          NaN  2017-10-15    3   NaN
9   35           2   4.29          NaN  2017-10-15    3   NaN
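
Note that both snippets above add up every row from a price change until the next one (or until the end of that UPC), so the 7-day limit from the question is never actually applied. Below is a minimal sketch that also caps the window. It assumes the rows are sorted by Date within each UPC, that Price_Change holds a real boolean True (not the string 'True'), and that the window includes the 7th day. `add_reaction` and its `days` parameter are names made up here, not part of pandas.

```python
import numpy as np
import pandas as pd


def add_reaction(df, days=7):
    """Return a copy of df with a 'Reaction' column filled on price-change rows."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    is_change = out["Price_Change"].eq(True)

    # Number the runs of rows that start at each price change, per UPC.
    block = is_change.astype(int).groupby(out["UPC"]).cumsum()

    # Date of the most recent price change for each row (NaT before the first one).
    start = out["Date"].where(is_change).groupby(out["UPC"]).ffill()

    # Rows within `days` days of that price change; NaT compares as False.
    in_window = (out["Date"] - start) <= pd.Timedelta(days=days)

    # Zero out sales outside the window, then total each (UPC, block) run.
    window_sales = out["Unit_Sales"].where(in_window, 0)
    totals = window_sales.groupby([out["UPC"], block]).transform("sum")

    # Show the total only on the row where the price changed.
    out["Reaction"] = totals.where(is_change)
    return out


# The sample frame from the question.
df = pd.DataFrame({
    "UPC": [22] * 5 + [35] * 5,
    "Unit_Sales": [15, 7, 6, 7, 4, 15, 17, 5, 8, 2],
    "Price": [1.99, 2.19, 2.19, 1.99, 1.99, 3.99, 3.99, 4.29, 4.29, 4.29],
    "Price_Change": [np.nan, True, np.nan, True, np.nan,
                     np.nan, np.nan, True, np.nan, np.nan],
    "Date": ["2017-10-10", "2017-10-12", "2017-10-13", "2017-10-16", "2017-10-17",
             "2017-10-09", "2017-10-11", "2017-10-13", "2017-10-15", "2017-10-15"],
})

result = add_reaction(df)
print(result[["UPC", "Date", "Unit_Sales", "Reaction"]])
```

On the sample data this fills Reaction with 13, 11 and 15 on rows 1, 3 and 7, the same as the update above, because no row there is more than 7 days past its price change. The cutoff only makes a difference once a UPC has sales further out than that.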
