I have a pandas Series whose values are all either 0 or 1:
2016-01-01 0
2016-01-02 1
2016-01-03 1
2016-01-04 0
2016-01-05 1
2016-01-06 1
2016-01-08 1
...
I want to build a DataFrame from this Series, adding another column that counts how many 1s occur within a given period.
For example, if the period were 5 days, the DataFrame would look like this:
            Value  1s_for_the_last_5days
2016-01-01      0
2016-01-02      1
2016-01-03      1
2016-01-04      0
2016-01-05      1                      3
2016-01-06      1                      4
2016-01-08      1                      4
...
I'd also like to know how to count the number of non-zero rows within a given window, in a case like the one below:
            Value  Not_0_rows_for_the_last_5days
2016-01-01      0
2016-01-02    1.1
2016-01-03    0.4
2016-01-04      0
2016-01-05    0.6                              3
2016-01-06    0.2                              4
2016-01-08     10                              4
Thank you for reading this. I would appreciate it if you could give me any solutions or hints on the problem.
You can use rolling for this. It creates a window of a fixed size that slides over your column and applies an aggregation such as sum to each window.
First create some dummy data:
import pandas as pd
import numpy as np
ser = pd.Series(np.random.randint(0, 2, size=10),
                index=pd.date_range("2016-01-01", periods=10),
                name="Value")
print(ser)
2016-01-01 1
2016-01-02 0
2016-01-03 0
2016-01-04 0
2016-01-05 0
2016-01-06 0
2016-01-07 0
2016-01-08 0
2016-01-09 1
2016-01-10 0
Freq: D, Name: Value, dtype: int64
Now, use rolling:
summed = ser.rolling(5).sum()
print(summed)
2016-01-01 NaN
2016-01-02 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 1.0
2016-01-06 0.0
2016-01-07 0.0
2016-01-08 0.0
2016-01-09 1.0
2016-01-10 1.0
Freq: D, Name: Value, dtype: float64
Finally, create the resulting data frame:
df = pd.DataFrame({"Value": ser, "Summed": summed})
print(df)
Summed Value
2016-01-01 NaN 1
2016-01-02 NaN 0
2016-01-03 NaN 0
2016-01-04 NaN 0
2016-01-05 1.0 0
2016-01-06 0.0 0
2016-01-07 0.0 0
2016-01-08 0.0 0
2016-01-09 1.0 1
2016-01-10 1.0 0
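As a check, here is the same rolling(5).sum() call applied to the seven rows from the question instead of random data. The sketch rebuilds that Series by hand (the names dates, ser and counts are just for illustration); note that 2016-01-07 is absent in the question, so this is a window over the last 5 rows, not 5 calendar days.

```python
import pandas as pd

# The Series from the question; 2016-01-07 does not appear in the original data.
dates = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
                        "2016-01-05", "2016-01-06", "2016-01-08"])
ser = pd.Series([0, 1, 1, 0, 1, 1, 1], index=dates, name="Value")

# Sum over the last 5 rows; the first 4 rows have incomplete windows, so they are NaN.
counts = ser.rolling(5).sum()

df = pd.DataFrame({"Value": ser, "1s_for_the_last_5days": counts})
print(df)
```

The last three rows come out as 3.0, 4.0 and 4.0, matching the expected column in the question. The counts are floats because the leading NaN values force a float dtype.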
To count arbitrary values, define your own aggregation function and pass it to apply on the rolling window:
# example aggregation: count the zeros in each window
count_func = lambda x: (x == 0).sum()
zero_counts = ser.rolling(5).apply(count_func)
print(zero_counts)
You can replace 0 with any value, or any combination of values, from your original series.
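For the second part of the question (counting non-zero rows), the same approach works with the comparison flipped to != 0. A minimal sketch on the values from the question, showing both the apply route and a vectorized alternative that is not in the original answer (flag non-zero rows with ne(0), then sum the flags); the variable names are illustrative.

```python
import pandas as pd

# Values from the second example in the question (2016-01-07 again missing).
dates = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
                        "2016-01-05", "2016-01-06", "2016-01-08"])
vals = pd.Series([0, 1.1, 0.4, 0, 0.6, 0.2, 10], index=dates, name="Value")

# Custom aggregation via apply: count entries that are not zero.
# raw=True passes each window as a NumPy array instead of a Series, which is faster.
nonzero_apply = vals.rolling(5).apply(lambda x: (x != 0).sum(), raw=True)

# Vectorized alternative: turn each value into a 0/1 flag, then take a rolling sum.
nonzero_vec = vals.ne(0).astype(int).rolling(5).sum()

result = pd.DataFrame({"Value": vals, "Not_0_rows_for_the_last_5days": nonzero_vec})
print(result)
```

Both give 3, 4 and 4 for the last three rows. The vectorized version avoids calling a Python function once per window, which matters on long series.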
You want rolling:
# s is the Series from the question; a '5D' window needs a DatetimeIndex
s.rolling('5D').sum()
df = pd.DataFrame({'Value': s, '1s_for_the_last_5days': s.rolling('5D').sum()})
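Note that '5D' and 5 are not interchangeable here. A string offset makes the window time-based: it requires a DatetimeIndex, covers the last 5 calendar days, and defaults to min_periods=1, so early rows get partial sums instead of NaN. Because 2016-01-07 is missing from the question's data, the two kinds of window disagree on the last row. A minimal comparison, rebuilding the question's Series by hand (column names last_5_days and last_5_rows are illustrative):

```python
import pandas as pd

dates = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
                        "2016-01-05", "2016-01-06", "2016-01-08"])
s = pd.Series([0, 1, 1, 0, 1, 1, 1], index=dates, name="Value")

# Time-based window: rows within the last 5 calendar days, ending at the current row.
by_days = s.rolling("5D").sum()

# Row-based window: the last 5 rows, regardless of their dates.
by_rows = s.rolling(5).sum()

print(pd.DataFrame({"Value": s, "last_5_days": by_days, "last_5_rows": by_rows}))
```

For 2016-01-08 the time-based window only spans 2016-01-04 to 2016-01-08 (four rows), giving 3, while the row-based window gives 4 as in the question's expected output. Pick whichever matches what "the last 5 days" means for your data.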
pd.Series.rolling is a useful method, but you can also do this in plain Python:
import pandas as pd

def rolling_count(l, rolling_num=5, include_same_day=True):
    """For each position, sum the last `rolling_num` items of l.

    With include_same_day=False the window ends just before the current item.
    """
    output_list = []
    for index, _ in enumerate(l):
        end = index + int(include_same_day)  # slice end is exclusive
        start = max(end - rolling_num, 0)    # at most rolling_num items, clipped at 0
        output_list.append(sum(l[start:end]))
    return output_list

data = {'Value': [0, 1, 1, 0, 1, 1, 1],
        'date': ['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
                 '2016-01-05', '2016-01-06', '2016-01-08']}
df = pd.DataFrame(data).set_index('date')
l = df['Value'].tolist()
df['1s_for_the_last_5days'] = rolling_count(l, rolling_num=5)
print(df)
Output:
            Value  1s_for_the_last_5days
date
2016-01-01      0                      0
2016-01-02      1                      1
2016-01-03      1                      2
2016-01-04      0                      2
2016-01-05      1                      3
2016-01-06      1                      4
2016-01-08      1                      4
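The plain-Python loop fills the first rows with partial sums rather than NaN. If that behaviour is what you want but you would rather stay in pandas, rolling accepts a min_periods argument. A minimal sketch, assuming the same string-dated frame built from the question's data:

```python
import pandas as pd

values = [0, 1, 1, 0, 1, 1, 1]
dates = ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04",
         "2016-01-05", "2016-01-06", "2016-01-08"]
df = pd.DataFrame({"Value": values, "date": dates}).set_index("date")

# min_periods=1 lets the first rows use whatever history exists instead of returning NaN.
# astype(int) is safe only because min_periods=1 leaves no NaN to convert.
df["1s_for_the_last_5days"] = df["Value"].rolling(5, min_periods=1).sum().astype(int)
print(df)
```

A row-based window like this does not need a DatetimeIndex, so the string dates used as the index are fine.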