Pandas, create new column based on other columns across multiple rows

Let's say I have the following dataframe representing the dietary habits of my pet frog:

date       bugs_eaten_today
2019-01-31 0
2019-01-30 5
2019-01-29 6
2019-01-28 7
2019-01-27 2
...

Now I want to calculate a new column, bugs_eaten_past_20_days:

date       bugs_eaten_today bugs_eaten_past_20_days
2019-01-31 0                48
2019-01-30 5                38
2019-01-29 6                57
2019-01-28 7                63
2019-01-27 2                21
...

How would I go about doing this? (Note that the last 20 rows don't have a full 20 days of history, so they will just be NaN.)

You can do a rolling sum (shown here with a window of 3; use 20 for the real data):

In [11]: df.bugs_eaten_today.rolling(3, 1).sum()
Out[11]:
0     0.0
1     5.0
2    11.0
3    18.0
4    15.0
Name: bugs_eaten_today, dtype: float64

You have to do this in reverse, since the rows are in reverse chronological order:

In [12]: df[::-1].bugs_eaten_today.rolling(3, 1).sum()
Out[12]:
4     2.0
3     9.0
2    15.0
1    18.0
0    11.0
Name: bugs_eaten_today, dtype: float64

In [13]: df['bugs_eaten_past_20_days'] = df[::-1].bugs_eaten_today.rolling(3, 1).sum()
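
As a self-contained sketch of the steps above, the snippet below rebuilds the five sample rows from the question and assigns the reversed rolling sum back to the frame. The names window, bugs_eaten_past_n_days and strict are illustrative assumptions, not part of the original post. It also shows what happens when min_periods is left at its default for an integer window (the window size): rows without a full window of history become NaN, which may be closer to what the question asks for.

```python
import pandas as pd

# Sample data from the question, newest row first.
df = pd.DataFrame({
    "date": ["2019-01-31", "2019-01-30", "2019-01-29", "2019-01-28", "2019-01-27"],
    "bugs_eaten_today": [0, 5, 6, 7, 2],
})

window = 3  # illustrative; use 20 for the real data

# Reverse to chronological order, then roll. The result keeps the original
# index labels, so assignment aligns each sum with its original row.
df["bugs_eaten_past_n_days"] = (
    df["bugs_eaten_today"][::-1].rolling(window, min_periods=1).sum()
)

# Default min_periods for an integer window is the window size, so rows
# with fewer than `window` observations of history come out as NaN.
df["strict"] = df["bugs_eaten_today"][::-1].rolling(window).sum()

print(df)
```

The assignment works without reversing the result again because pandas aligns on index labels, not on position.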

It's probably more robust to use date as the index and roll over a time window such as 20D (20 days); this requires date to hold datetime values. The example uses 3D:

In [21]: df1 = df.set_index('date').sort_index()

In [22]: df1.bugs_eaten_today.rolling('3D', 1).sum()
Out[22]:
date
2019-01-27     2.0
2019-01-28     9.0
2019-01-29    15.0
2019-01-30    18.0
2019-01-31    11.0
Name: bugs_eaten_today, dtype: float64
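
The date-based variant can be sketched end to end as follows. This is a minimal example under the assumption that the date column starts out as strings, so it is parsed with pd.to_datetime first; the names rolled and bugs_eaten_past_3_days are illustrative only. An offset window such as "3D" is measured in time, not in rows, so a missing day does not silently widen the window the way a fixed row count would.

```python
import pandas as pd

# Sample data from the question, newest row first, dates as strings.
df = pd.DataFrame({
    "date": ["2019-01-31", "2019-01-30", "2019-01-29", "2019-01-28", "2019-01-27"],
    "bugs_eaten_today": [0, 5, 6, 7, 2],
})

# Offset windows need a datetime-like index, so parse the strings first.
df["date"] = pd.to_datetime(df["date"])
df1 = df.set_index("date").sort_index()

# "3D" covers the current day and the two days before it;
# use "20D" for the real data.
rolled = df1["bugs_eaten_today"].rolling("3D", min_periods=1).sum()

# Look up each original row's date to restore the newest-first order.
df["bugs_eaten_past_3_days"] = rolled.reindex(df["date"]).to_numpy()

print(df)
```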
