Pandas groupby: Count the number of occurrences within a time range for each group
I have a dataframe:
ID DATE WIN
A 2015/6/5 Yes
A 2015/6/7 Yes
A 2015/6/7 Yes
A 2015/6/7 Yes
B 2015/6/8 No
B 2015/8/7 Yes
C 2015/5/15 Yes
C 2015/5/30 No
C 2015/7/30 No
C 2015/8/03 Yes
I want to add a column that counts the number of wins for each ID within the past month, so the result will look like:
ID DATE WIN NumOfDaysSinceLastWin NumOfWinsInThePast30days
A 2015/6/5 Yes 0 0
A 2015/6/7 Yes 2 1
A 2015/6/7 Yes 2 1 or (A 2015/6/7 Yes 0 2)
A 2015/6/8 No 1 3
B 2015/8/7 No 0 0
B 2015/8/7 Yes 0 0
C 2015/5/15 Yes 0 0
C 2015/5/30 No 15 1
C 2015/7/30 No 76 0
C 2015/8/03 Yes 80 0
How can I use the groupby function and timegrouper to get this?
Input data has to be sorted by DATE within each group; for this sample data it already is.
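If the input is not already ordered, a minimal pre-sort sketch (column names taken from the question; the tiny frame here is made up for illustration):

```python
import pandas as pd

# Minimal frame with the question's columns; dates deliberately out of order.
df = pd.DataFrame({'ID': ['A', 'A', 'B'],
                   'DATE': pd.to_datetime(['2015/6/7', '2015/6/5', '2015/8/7']),
                   'WIN': ['Yes', 'Yes', 'Yes']})
# Sort within each ID by DATE so every per-group step sees dates in order.
df = df.sort_values(['ID', 'DATE']).reset_index(drop=True)
```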
The input data does not cover every situation well, so four extra rows are added.
Column WIN1 is created from WIN: 1 for 'Yes' and 0 for 'No'. It is needed for both output columns.
df['WIN1'] = df['WIN'].map(lambda x: 1 if x == 'Yes' else 0)
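The same 0/1 flag can also be built without a lambda; a small vectorized sketch equivalent to the `map` above:

```python
import pandas as pd

df = pd.DataFrame({'WIN': ['Yes', 'No', 'Yes']})
# Boolean comparison, then cast: True -> 1 for 'Yes', False -> 0 otherwise.
df['WIN1'] = (df['WIN'] == 'Yes').astype(int)
```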
NumOfDaysSinceLastWin
First, a cumsum (cumulative sum) column is created.
df['cumsum'] = df['WIN1'].cumsum()
If all WIN values were 'Yes', it would be easy: the data would be grouped and the difference between each date and the previous date stored in a diffs column.
#df['diffs'] = df.groupby(['ID', 'cumsum'])['DATE'].apply(lambda d: (d-d.shift()).fillna(0))
But the situation is complicated by the 'No' values in column WIN. For a 'Yes' row you need the difference from the previous 'Yes'; for a 'No' row, the difference from the last previous win. The difference can be computed many ways; here it is obtained by subtracting two columns, DATE and date1.
Column date1
The rows have to be grouped in a special way: each run of 'No' values together with the last previous 'Yes'. That is what the cumulative sum in column cumsum provides. The minimum date of each such group is the date of the 'Yes' row, and that value is repeated onto the 'No' rows. Column count is special: non-repeated cumsum values get 1, repeated ones are counted per group.
df['min'] = df.groupby(['ID','cumsum'])['DATE'].transform('min')
df['count'] = df.groupby(['cumsum'])['cumsum'].transform('count')
The dates of the 'Yes' values in previous rows are needed for the difference. Dataframe df1 keeps only the 'Yes' rows of df and is grouped by column ID. The indices are unchanged, so the output can be mapped onto a new column of df.
df1 = df[~df['WIN'].isin(['No'])]
df['date1'] = df1.groupby(['ID'])['DATE'].apply(lambda d: d.shift())
print df
ID DATE WIN WIN1 cumsum min count date1
0 A 2015-06-05 Yes 1 1 2015-06-05 1 NaT
1 A 2015-06-05 Yes 1 2 2015-06-05 1 2015-06-05
2 A 2015-06-07 Yes 1 3 2015-06-07 1 2015-06-05
3 A 2015-06-07 Yes 1 4 2015-06-07 1 2015-06-07
4 A 2015-06-07 Yes 1 5 2015-06-07 4 2015-06-07
5 A 2015-06-08 No 0 5 2015-06-07 4 NaT
6 B 2015-06-07 No 0 5 2015-06-07 4 NaT
7 B 2015-06-07 No 0 5 2015-06-07 4 NaT
8 B 2015-08-07 Yes 1 6 2015-08-07 1 NaT
9 C 2015-05-15 Yes 1 7 2015-05-15 3 NaT
10 C 2015-05-30 No 0 7 2015-05-15 3 NaT
11 C 2015-07-30 No 0 7 2015-05-15 3 NaT
12 C 2015-08-03 Yes 1 8 2015-08-03 1 2015-05-15
13 C 2015-08-03 Yes 1 9 2015-08-03 1 2015-08-03
Then the date columns min (for the 'No' rows and the last previous 'Yes') and date1 (the other 'Yes' rows) can be joined using column count.
A new condition was added: only values of column date1 that are null (NaT) are overwritten by column min.
df.loc[(df['count'] >= 1) & (df['date1'].isnull()), 'date1'] = df['min']
print df
ID DATE WIN WIN1 cumsum min count date1
0 A 2015-06-05 Yes 1 1 2015-06-05 1 2015-06-05
1 A 2015-06-05 Yes 1 2 2015-06-05 1 2015-06-05
2 A 2015-06-07 Yes 1 3 2015-06-07 1 2015-06-05
3 A 2015-06-07 Yes 1 4 2015-06-07 1 2015-06-07
4 A 2015-06-07 Yes 1 5 2015-06-07 4 2015-06-07
5 A 2015-06-08 No 0 5 2015-06-07 4 2015-06-07
6 B 2015-06-07 No 0 5 2015-06-07 4 2015-06-07
7 B 2015-06-07 No 0 5 2015-06-07 4 2015-06-07
8 B 2015-08-07 Yes 1 6 2015-08-07 1 2015-08-07
9 C 2015-05-15 Yes 1 7 2015-05-15 3 2015-05-15
10 C 2015-05-30 No 0 7 2015-05-15 3 2015-05-15
11 C 2015-07-30 No 0 7 2015-05-15 3 2015-05-15
12 C 2015-08-03 Yes 1 8 2015-08-03 1 2015-05-15
13 C 2015-08-03 Yes 1 9 2015-08-03 1 2015-08-03
Repeating datetimes - subsolution
Sorry if this is done in a complicated way; maybe someone will find a better one.
My solution is to find the repeated values, fill them with the last previous 'Yes', and add them to column date1 for the difference.
These values are identified via column count. The others (value 1) are reset to NaN. Then values from date1 are copied to date2 according to column count.
df['count'] = df1.groupby(['ID', 'DATE', 'WIN1'])['WIN1'].transform('count')
df.loc[df['count'] == 1 , 'count'] = np.nan
df.loc[df['count'].notnull() , 'date2'] = df['date1']
print df
ID DATE WIN WIN1 cumsum min count date1 date2
0 A 2015-06-05 Yes 1 1 2015-06-05 2 2015-06-05 2015-06-05
1 A 2015-06-05 Yes 1 2 2015-06-05 2 2015-06-05 2015-06-05
2 A 2015-06-07 Yes 1 3 2015-06-07 3 2015-06-05 2015-06-05
3 A 2015-06-07 Yes 1 4 2015-06-07 3 2015-06-07 2015-06-07
4 A 2015-06-07 Yes 1 5 2015-06-07 3 2015-06-07 2015-06-07
5 A 2015-06-08 No 0 5 2015-06-07 NaN 2015-06-07 NaT
6 B 2015-06-07 No 0 5 2015-06-07 NaN 2015-06-07 NaT
7 B 2015-06-07 No 0 5 2015-06-07 NaN 2015-06-07 NaT
8 B 2015-08-07 Yes 1 6 2015-08-07 NaN 2015-08-07 NaT
9 C 2015-05-15 Yes 1 7 2015-05-15 NaN 2015-05-15 NaT
10 C 2015-05-30 No 0 7 2015-05-15 NaN 2015-05-15 NaT
11 C 2015-07-30 No 0 7 2015-05-15 NaN 2015-05-15 NaT
12 C 2015-08-03 Yes 1 8 2015-08-03 2 2015-05-15 2015-05-15
13 C 2015-08-03 Yes 1 9 2015-08-03 2 2015-08-03 2015-08-03
Then these values are replaced by each group's minimum and written back to the date1 column.
def repeat_value(grp):
grp['date2'] = grp['date2'].min()
return grp
df = df.groupby(['ID', 'DATE']).apply(repeat_value)
df.loc[df['date2'].notnull() , 'date1'] = df['date2']
print df
ID DATE WIN WIN1 cumsum min count date1 date2
0 A 2015-06-05 Yes 1 1 2015-06-05 2 2015-06-05 2015-06-05
1 A 2015-06-05 Yes 1 2 2015-06-05 2 2015-06-05 2015-06-05
2 A 2015-06-07 Yes 1 3 2015-06-07 3 2015-06-05 2015-06-05
3 A 2015-06-07 Yes 1 4 2015-06-07 3 2015-06-05 2015-06-05
4 A 2015-06-07 Yes 1 5 2015-06-07 3 2015-06-05 2015-06-05
5 A 2015-06-08 No 0 5 2015-06-07 NaN 2015-06-07 NaT
6 B 2015-06-07 No 0 5 2015-06-07 NaN 2015-06-07 NaT
7 B 2015-06-07 No 0 5 2015-06-07 NaN 2015-06-07 NaT
8 B 2015-08-07 Yes 1 6 2015-08-07 NaN 2015-08-07 NaT
9 C 2015-05-15 Yes 1 7 2015-05-15 NaN 2015-05-15 NaT
10 C 2015-05-30 No 0 7 2015-05-15 NaN 2015-05-15 NaT
11 C 2015-07-30 No 0 7 2015-05-15 NaN 2015-05-15 NaT
12 C 2015-08-03 Yes 1 8 2015-08-03 2 2015-05-15 2015-05-15
13 C 2015-08-03 Yes 1 9 2015-08-03 2 2015-05-15 2015-05-15
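In newer pandas the `repeat_value` apply above can be replaced by a `transform('min')` over the same keys; a sketch under that assumption, with a tiny made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'A'],
                   'DATE': pd.to_datetime(['2015-06-07', '2015-06-07', '2015-06-08']),
                   'date2': pd.to_datetime(['2015-06-05', '2015-06-07', None])})
# Broadcast each (ID, DATE) group's minimum back onto every row of the group;
# a group containing only NaT stays NaT.
df['date2'] = df.groupby(['ID', 'DATE'])['date2'].transform('min')
```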
Column NumOfDaysSinceLastWin is filled with the difference between columns date1 and DATE. The dtype is Timedelta, so it is converted to integer. Finally, the unnecessary columns are deleted. (Columns WIN1 and count are still needed for the next output column, so they are kept.)
df['NumOfDaysSinceLastWin'] = ((df['DATE'] - df['date1']).fillna(0)).astype('timedelta64[D]')
df = df.drop(['cumsum','min', 'date1'], axis=1 )
print df
ID DATE WIN WIN1 count NumOfDaysSinceLastWin
0 A 2015-06-05 Yes 1 2 0
1 A 2015-06-05 Yes 1 2 0
2 A 2015-06-07 Yes 1 3 2
3 A 2015-06-07 Yes 1 3 2
4 A 2015-06-07 Yes 1 3 2
5 A 2015-06-08 No 0 NaN 1
6 B 2015-06-07 No 0 NaN 0
7 B 2015-06-07 No 0 NaN 0
8 B 2015-08-07 Yes 1 NaN 0
9 C 2015-05-15 Yes 1 NaN 0
10 C 2015-05-30 No 0 NaN 15
11 C 2015-07-30 No 0 NaN 76
12 C 2015-08-03 Yes 1 2 80
13 C 2015-08-03 Yes 1 2 80
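On a modern pandas (an assumption beyond this answer's original API) the whole NumOfDaysSinceLastWin pipeline can be sketched in a few lines. Note it follows the "0 2" interpretation from the question for same-day duplicate wins:

```python
import pandas as pd

df = pd.DataFrame({
    'ID':   ['A', 'A', 'A', 'A', 'C', 'C', 'C', 'C'],
    'DATE': pd.to_datetime(['2015/6/5', '2015/6/7', '2015/6/7', '2015/6/8',
                            '2015/5/15', '2015/5/30', '2015/7/30', '2015/8/3']),
    'WIN':  ['Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes'],
})
# Keep the date only on win rows, forward-fill inside each ID, then shift one
# row so every row sees the last win recorded strictly before it.
win_date = df['DATE'].where(df['WIN'] == 'Yes')
prev_win = win_date.groupby(df['ID']).transform(lambda s: s.ffill().shift())
df['NumOfDaysSinceLastWin'] = (df['DATE'] - prev_win).dt.days.fillna(0).astype(int)
```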
NumOfWinsInThePast30days
Rolling sum is your friend. A column yes (necessary for resampling) is mapped to 1 for 'Yes' and NaN for 'No'.
df['yes'] = df['WIN'].map(lambda x: 1 if x == 'Yes' else np.nan)
Dataframe df2 is a copy of df with column DATE set as the index (for resampling). Unnecessary columns are deleted.
df2 = df.set_index('DATE')
df2 = df2.drop(['NumOfDaysSinceLastWin','WIN', 'WIN1'], axis=1)
Then df2 is resampled by day; each day gets the count of its 'Yes' rows, 0 for days with none. (The output below makes this clearer.)
df2 = df2.groupby('ID').resample("D", how='count')
df2 = df2.reset_index()
Dataframe df2 is grouped by ID and the function rolling_sum is applied to these groups.
df2['rollsum'] = df2.groupby('ID')['yes'].transform(pd.rolling_sum, window=30, min_periods=1)
For better understanding, all rows of df2 are displayed.
with pd.option_context('display.max_rows', 999, 'display.max_columns', 5):
print df2
ID DATE yes rollsum
0 A 2015-06-05 2 2
1 A 2015-06-06 0 2
2 A 2015-06-07 3 5
3 A 2015-06-08 0 5
4 B 2015-06-07 0 0
5 B 2015-06-08 0 0
6 B 2015-06-09 0 0
7 B 2015-06-10 0 0
8 B 2015-06-11 0 0
9 B 2015-06-12 0 0
10 B 2015-06-13 0 0
11 B 2015-06-14 0 0
12 B 2015-06-15 0 0
13 B 2015-06-16 0 0
14 B 2015-06-17 0 0
15 B 2015-06-18 0 0
16 B 2015-06-19 0 0
17 B 2015-06-20 0 0
18 B 2015-06-21 0 0
19 B 2015-06-22 0 0
20 B 2015-06-23 0 0
21 B 2015-06-24 0 0
22 B 2015-06-25 0 0
23 B 2015-06-26 0 0
24 B 2015-06-27 0 0
25 B 2015-06-28 0 0
26 B 2015-06-29 0 0
27 B 2015-06-30 0 0
28 B 2015-07-01 0 0
29 B 2015-07-02 0 0
30 B 2015-07-03 0 0
31 B 2015-07-04 0 0
32 B 2015-07-05 0 0
33 B 2015-07-06 0 0
34 B 2015-07-07 0 0
35 B 2015-07-08 0 0
36 B 2015-07-09 0 0
37 B 2015-07-10 0 0
38 B 2015-07-11 0 0
39 B 2015-07-12 0 0
40 B 2015-07-13 0 0
41 B 2015-07-14 0 0
42 B 2015-07-15 0 0
43 B 2015-07-16 0 0
44 B 2015-07-17 0 0
45 B 2015-07-18 0 0
46 B 2015-07-19 0 0
47 B 2015-07-20 0 0
48 B 2015-07-21 0 0
49 B 2015-07-22 0 0
50 B 2015-07-23 0 0
51 B 2015-07-24 0 0
52 B 2015-07-25 0 0
53 B 2015-07-26 0 0
54 B 2015-07-27 0 0
55 B 2015-07-28 0 0
56 B 2015-07-29 0 0
57 B 2015-07-30 0 0
58 B 2015-07-31 0 0
59 B 2015-08-01 0 0
60 B 2015-08-02 0 0
61 B 2015-08-03 0 0
62 B 2015-08-04 0 0
63 B 2015-08-05 0 0
64 B 2015-08-06 0 0
65 B 2015-08-07 1 1
66 C 2015-05-15 1 1
67 C 2015-05-16 0 1
68 C 2015-05-17 0 1
69 C 2015-05-18 0 1
70 C 2015-05-19 0 1
71 C 2015-05-20 0 1
72 C 2015-05-21 0 1
73 C 2015-05-22 0 1
74 C 2015-05-23 0 1
75 C 2015-05-24 0 1
76 C 2015-05-25 0 1
77 C 2015-05-26 0 1
78 C 2015-05-27 0 1
79 C 2015-05-28 0 1
80 C 2015-05-29 0 1
81 C 2015-05-30 0 1
82 C 2015-05-31 0 1
83 C 2015-06-01 0 1
84 C 2015-06-02 0 1
85 C 2015-06-03 0 1
86 C 2015-06-04 0 1
87 C 2015-06-05 0 1
88 C 2015-06-06 0 1
89 C 2015-06-07 0 1
90 C 2015-06-08 0 1
91 C 2015-06-09 0 1
92 C 2015-06-10 0 1
93 C 2015-06-11 0 1
94 C 2015-06-12 0 1
95 C 2015-06-13 0 1
96 C 2015-06-14 0 0
97 C 2015-06-15 0 0
98 C 2015-06-16 0 0
99 C 2015-06-17 0 0
100 C 2015-06-18 0 0
101 C 2015-06-19 0 0
102 C 2015-06-20 0 0
103 C 2015-06-21 0 0
104 C 2015-06-22 0 0
105 C 2015-06-23 0 0
106 C 2015-06-24 0 0
107 C 2015-06-25 0 0
108 C 2015-06-26 0 0
109 C 2015-06-27 0 0
110 C 2015-06-28 0 0
111 C 2015-06-29 0 0
112 C 2015-06-30 0 0
113 C 2015-07-01 0 0
114 C 2015-07-02 0 0
115 C 2015-07-03 0 0
116 C 2015-07-04 0 0
117 C 2015-07-05 0 0
118 C 2015-07-06 0 0
119 C 2015-07-07 0 0
120 C 2015-07-08 0 0
121 C 2015-07-09 0 0
122 C 2015-07-10 0 0
123 C 2015-07-11 0 0
124 C 2015-07-12 0 0
125 C 2015-07-13 0 0
126 C 2015-07-14 0 0
127 C 2015-07-15 0 0
128 C 2015-07-16 0 0
129 C 2015-07-17 0 0
130 C 2015-07-18 0 0
131 C 2015-07-19 0 0
132 C 2015-07-20 0 0
133 C 2015-07-21 0 0
134 C 2015-07-22 0 0
135 C 2015-07-23 0 0
136 C 2015-07-24 0 0
137 C 2015-07-25 0 0
138 C 2015-07-26 0 0
139 C 2015-07-27 0 0
140 C 2015-07-28 0 0
141 C 2015-07-29 0 0
142 C 2015-07-30 0 0
143 C 2015-07-31 0 0
144 C 2015-08-01 0 0
145 C 2015-08-02 0 0
146 C 2015-08-03 2 2
The unnecessary column yes is deleted.
df2 = df2.drop(['yes'], axis=1 )
The output is merged with the first dataframe df.
df2 = pd.merge(df,df2,on=['DATE', 'ID'], how='inner')
print df2
ID DATE WIN WIN1 NumOfDaysSinceLastWin yes rollsum
0 A 2015-06-07 Yes 1 0 1 2
1 A 2015-06-07 Yes 1 0 1 2
2 B 2015-08-07 No 0 0 NaN 1
3 B 2015-08-07 Yes 1 0 1 1
4 C 2015-05-15 Yes 1 0 1 1
5 C 2015-05-30 No 0 15 NaN 1
6 C 2015-07-30 No 0 76 NaN 0
7 C 2015-08-03 Yes 1 80 1 1
If values in column count are not null, they are copied into column WIN1. rolling_sum counted the 'Yes' rows of the original df including the current date, so those have to be subtracted, and these values (1, or the same-day count) sit in column WIN1.
df2.loc[df['count'].notnull() , 'WIN1'] = df2['count']
df2['NumOfWinsInThePast30days'] = df2['rollsum'] - df2['WIN1']
Delete the unnecessary columns.
df2 = df2.drop(['yes','WIN1', 'rollsum', 'count'], axis=1 )
print df2
ID DATE WIN NumOfDaysSinceLastWin NumOfWinsInThePast30days
0 A 2015-06-05 Yes 0 0
1 A 2015-06-05 Yes 0 0
2 A 2015-06-07 Yes 2 2
3 A 2015-06-07 Yes 2 2
4 A 2015-06-07 Yes 2 2
5 A 2015-06-08 No 1 5
6 B 2015-06-07 No 0 0
7 B 2015-06-07 No 0 0
8 B 2015-08-07 Yes 0 0
9 C 2015-05-15 Yes 0 0
10 C 2015-05-30 No 15 1
11 C 2015-07-30 No 76 0
12 C 2015-08-03 Yes 80 0
13 C 2015-08-03 Yes 80 0
And last, all together:
import pandas as pd
import numpy as np
import io
#original data
temp=u"""ID,DATE,WIN
A,2015/6/5,Yes
A,2015/6/7,Yes
A,2015/6/7,Yes
A,2015/6/8,No
B,2015/6/7,No
B,2015/8/7,Yes
C,2015/5/15,Yes
C,2015/5/30,No
C,2015/7/30,No
C,2015/8/03,Yes"""
#changed repeating data
temp2=u"""ID,DATE,WIN
A,2015/6/5,Yes
A,2015/6/5,Yes
A,2015/6/7,Yes
A,2015/6/7,Yes
A,2015/6/7,Yes
A,2015/6/8,No
B,2015/6/7,No
B,2015/6/7,No
B,2015/8/7,Yes
C,2015/5/15,Yes
C,2015/5/30,No
C,2015/7/30,No
C,2015/8/03,Yes
C,2015/8/03,Yes"""
df = pd.read_csv(io.StringIO(temp2), parse_dates = [1])
df['WIN1'] = df['WIN'].map(lambda x: 1 if x == 'Yes' else 0)
df['cumsum'] = df['WIN1'].cumsum()
#df['diffs'] = df.groupby(['ID', 'cumsum'])['DATE'].apply(lambda d: (d-d.shift()).fillna(0))
df['min'] = df.groupby(['ID','cumsum'])['DATE'].transform('min')
df['count'] = df.groupby(['cumsum'])['cumsum'].transform('count')
df1 = df[~df['WIN'].isin(['No'])]
df['date1'] = df1.groupby(['ID'])['DATE'].apply(lambda d: d.shift())
print df
df.loc[(df['count'] >= 1) & (df['date1'].isnull()), 'date1'] = df['min']
print df
#resolve repeating datetimes
df['count'] = df1.groupby(['ID', 'DATE', 'WIN1'])['WIN1'].transform('count')
df.loc[df['count'] == 1 , 'count'] = np.nan
df.loc[df['count'].notnull() , 'date2'] = df['date1']
print df
def repeat_value(grp):
grp['date2'] = grp['date2'].min()
return grp
df = df.groupby(['ID', 'DATE']).apply(repeat_value)
df.loc[df['date2'].notnull() , 'date1'] = df['date2']
print df
df['NumOfDaysSinceLastWin'] = ((df['DATE'] - df['date1']).fillna(0)).astype('timedelta64[D]')
df = df.drop(['cumsum','min','date1', 'date2'], axis=1 )
print df
#NumOfWinsInThePast30days
df['yes'] = df['WIN'].map(lambda x: 1 if x == 'Yes' else np.nan)
df2 = df.set_index('DATE')
df2 = df2.drop(['NumOfDaysSinceLastWin','WIN', 'WIN1','count'], axis=1)
df2 = df2.groupby('ID').resample("D", how='count')
df2 = df2.reset_index()
df2['rollsum'] = df2.groupby('ID')['yes'].transform(pd.rolling_sum, window=30, min_periods=1)
#with pd.option_context('display.max_rows', 999, 'display.max_columns', 5):
#print df2
df2 = df2.drop(['yes'], axis=1 )
df2 = pd.merge(df,df2,on=['DATE', 'ID'], how='inner')
print df2
df2.loc[df['count'].notnull() , 'WIN1'] = df2['count']
df2['NumOfWinsInThePast30days'] = df2['rollsum'] - df2['WIN1']
df2 = df2.drop(['yes','WIN1', 'rollsum', 'count'], axis=1 )
print df2