Pandas groupby: Count the number of occurrences within a time range for each group

I have a dataframe:

ID  DATE       WIN
A   2015/6/5   Yes
A   2015/6/7   Yes
A   2015/6/7   Yes
A   2015/6/7   Yes
B   2015/6/8   No
B   2015/8/7   Yes
C   2015/5/15  Yes
C   2015/5/30  No
C   2015/7/30  No
C   2015/8/03  Yes

I want to add a column that counts the number of wins for each ID within the past 1 month, so the result will look like:

ID  DATE       WIN  NumOfDaysSinceLastWin NumOfWinsInThePast30days
A   2015/6/5   Yes           0               0       
A   2015/6/7   Yes           2               1 
A   2015/6/7   Yes           2               1 or (A 2015/6/7 Yes 0 2)
A   2015/6/8   No            1               3 
B   2015/8/7   No            0               0
B   2015/8/7   Yes           0               0
C   2015/5/15  Yes           0               0
C   2015/5/30  No            15              1
C   2015/7/30  No            76              0
C   2015/8/03  Yes           80              0

How can I use the groupby function and TimeGrouper to get this?

Input data has to be sorted by DATE within each group; for this data that already holds.
The original input does not cover every situation (repeated dates in particular), so 4 extra rows are added in the modified sample at the end.

Column WIN1 is created from WIN: value 1 for 'Yes' and 0 for 'No'. It is needed for both output columns.

df['WIN1'] = df['WIN'].map(lambda x: 1 if x == 'Yes' else 0)
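For what it is worth, the same indicator can be built without a lambda, by comparing and casting; a small equivalent sketch:

```python
import pandas as pd

df = pd.DataFrame({'WIN': ['Yes', 'No', 'Yes']})
# a boolean comparison cast to int gives the same 1/0 indicator as map/lambda
df['WIN1'] = (df['WIN'] == 'Yes').astype(int)
```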

Column NumOfDaysSinceLastWin

First, a column cumsum (cumulative sum) is created.

df['cumsum'] = df['WIN1'].cumsum()

If all WIN values were 'Yes', it would be easy: group the data and store the difference between each date and the previous one in a column diffs.

#df['diffs'] = df.groupby(['ID', 'cumsum'])['DATE'].apply(lambda d: (d-d.shift()).fillna(0)) 

But the situation is complicated by the 'No' values in column WIN. If a row is 'Yes', you need the difference from the previous 'Yes'; if 'No', the difference from the most recent preceding 'Yes'. The difference can be computed many ways; here it is done by subtracting two columns, DATE and date1.

Column date1
Rows have to be grouped in a special way: each run of 'No' values together with the last preceding 'Yes'. The cumulative sum in column cumsum makes this possible. The minimal DATE of such a group is the date of its 'Yes' row, and that value is then repeated to the rows with 'No' values. Column count is special: non-repeated values of the cumsum column give 1, repeated ones are counted per group.

df['min'] = df.groupby(['ID','cumsum'])['DATE'].transform('min')
df['count'] = df.groupby(['cumsum'])['cumsum'].transform('count')

The dates of the 'Yes' values in previous rows are needed for the difference. Dataframe df1 is df filtered to only the 'Yes' rows, then grouped by column ID. The indices are unchanged, so the output can be mapped to a new column of dataframe df.

df1 = df[~df['WIN'].isin(['No'])]
df['date1'] = df1.groupby(['ID'])['DATE'].apply(lambda d: d.shift()) 
print df

   ID       DATE  WIN  WIN1  cumsum        min  count      date1
0   A 2015-06-05  Yes     1       1 2015-06-05      1        NaT
1   A 2015-06-05  Yes     1       2 2015-06-05      1 2015-06-05
2   A 2015-06-07  Yes     1       3 2015-06-07      1 2015-06-05
3   A 2015-06-07  Yes     1       4 2015-06-07      1 2015-06-07
4   A 2015-06-07  Yes     1       5 2015-06-07      4 2015-06-07
5   A 2015-06-08   No     0       5 2015-06-07      4        NaT
6   B 2015-06-07   No     0       5 2015-06-07      4        NaT
7   B 2015-06-07   No     0       5 2015-06-07      4        NaT
8   B 2015-08-07  Yes     1       6 2015-08-07      1        NaT
9   C 2015-05-15  Yes     1       7 2015-05-15      3        NaT
10  C 2015-05-30   No     0       7 2015-05-15      3        NaT
11  C 2015-07-30   No     0       7 2015-05-15      3        NaT
12  C 2015-08-03  Yes     1       8 2015-08-03      1 2015-05-15
13  C 2015-08-03  Yes     1       9 2015-08-03      1 2015-08-03

Then the date columns min (the 'No' rows and the last preceding 'Yes') and date1 (the other 'Yes' rows) can be joined using column count.
A condition restricts the fill to values of column date1 that are null (NaT), so that only those are overwritten from column min.

df.loc[(df['count'] >= 1) & (df['date1'].isnull()), 'date1'] = df['min']
print df

   ID       DATE  WIN  WIN1  cumsum        min  count      date1
0   A 2015-06-05  Yes     1       1 2015-06-05      1 2015-06-05
1   A 2015-06-05  Yes     1       2 2015-06-05      1 2015-06-05
2   A 2015-06-07  Yes     1       3 2015-06-07      1 2015-06-05
3   A 2015-06-07  Yes     1       4 2015-06-07      1 2015-06-07
4   A 2015-06-07  Yes     1       5 2015-06-07      4 2015-06-07
5   A 2015-06-08   No     0       5 2015-06-07      4 2015-06-07
6   B 2015-06-07   No     0       5 2015-06-07      4 2015-06-07
7   B 2015-06-07   No     0       5 2015-06-07      4 2015-06-07
8   B 2015-08-07  Yes     1       6 2015-08-07      1 2015-08-07
9   C 2015-05-15  Yes     1       7 2015-05-15      3 2015-05-15
10  C 2015-05-30   No     0       7 2015-05-15      3 2015-05-15
11  C 2015-07-30   No     0       7 2015-05-15      3 2015-05-15
12  C 2015-08-03  Yes     1       8 2015-08-03      1 2015-05-15
13  C 2015-08-03  Yes     1       9 2015-08-03      1 2015-08-03

Repeating datetimes - subsolution
Sorry if this is done in a complicated way; maybe someone will find a better one.
My solution is to find the repeating values, fill them with the last preceding 'Yes' and add them to column date1 for the difference.
These values are identified in column count. The others (value 1) are reset to NaN. Then values from date1 are copied to date2 according to column count.

df['count'] = df1.groupby(['ID', 'DATE', 'WIN1'])['WIN1'].transform('count')
df.loc[df['count'] == 1 , 'count'] = np.nan
df.loc[df['count'].notnull() , 'date2'] = df['date1']
print df    

   ID       DATE  WIN  WIN1  cumsum        min  count      date1      date2
0   A 2015-06-05  Yes     1       1 2015-06-05      2 2015-06-05 2015-06-05
1   A 2015-06-05  Yes     1       2 2015-06-05      2 2015-06-05 2015-06-05
2   A 2015-06-07  Yes     1       3 2015-06-07      3 2015-06-05 2015-06-05
3   A 2015-06-07  Yes     1       4 2015-06-07      3 2015-06-07 2015-06-07
4   A 2015-06-07  Yes     1       5 2015-06-07      3 2015-06-07 2015-06-07
5   A 2015-06-08   No     0       5 2015-06-07    NaN 2015-06-07        NaT
6   B 2015-06-07   No     0       5 2015-06-07    NaN 2015-06-07        NaT
7   B 2015-06-07   No     0       5 2015-06-07    NaN 2015-06-07        NaT
8   B 2015-08-07  Yes     1       6 2015-08-07    NaN 2015-08-07        NaT
9   C 2015-05-15  Yes     1       7 2015-05-15    NaN 2015-05-15        NaT
10  C 2015-05-30   No     0       7 2015-05-15    NaN 2015-05-15        NaT
11  C 2015-07-30   No     0       7 2015-05-15    NaN 2015-05-15        NaT
12  C 2015-08-03  Yes     1       8 2015-08-03      2 2015-05-15 2015-05-15
13  C 2015-08-03  Yes     1       9 2015-08-03      2 2015-08-03 2015-08-03 

Then these values are replaced by each group's minimal value and written back to the date1 column.

def repeat_value(grp):
    grp['date2'] = grp['date2'].min()
    return grp

df = df.groupby(['ID', 'DATE']).apply(repeat_value)
df.loc[df['date2'].notnull() , 'date1'] = df['date2']
print df

   ID       DATE  WIN  WIN1  cumsum        min  count      date1      date2
0   A 2015-06-05  Yes     1       1 2015-06-05      2 2015-06-05 2015-06-05
1   A 2015-06-05  Yes     1       2 2015-06-05      2 2015-06-05 2015-06-05
2   A 2015-06-07  Yes     1       3 2015-06-07      3 2015-06-05 2015-06-05
3   A 2015-06-07  Yes     1       4 2015-06-07      3 2015-06-05 2015-06-05
4   A 2015-06-07  Yes     1       5 2015-06-07      3 2015-06-05 2015-06-05
5   A 2015-06-08   No     0       5 2015-06-07    NaN 2015-06-07        NaT
6   B 2015-06-07   No     0       5 2015-06-07    NaN 2015-06-07        NaT
7   B 2015-06-07   No     0       5 2015-06-07    NaN 2015-06-07        NaT
8   B 2015-08-07  Yes     1       6 2015-08-07    NaN 2015-08-07        NaT
9   C 2015-05-15  Yes     1       7 2015-05-15    NaN 2015-05-15        NaT
10  C 2015-05-30   No     0       7 2015-05-15    NaN 2015-05-15        NaT
11  C 2015-07-30   No     0       7 2015-05-15    NaN 2015-05-15        NaT
12  C 2015-08-03  Yes     1       8 2015-08-03      2 2015-05-15 2015-05-15
13  C 2015-08-03  Yes     1       9 2015-08-03      2 2015-05-15 2015-05-15 
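The repeat_value helper broadcasts a group minimum to every row of the group; groupby(...).transform('min') does the same thing in one line. A small self-contained sketch (the column names mirror the ones above):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'A', 'A'],
    'DATE': pd.to_datetime(['2015-06-07'] * 3),
    'date2': pd.to_datetime(['2015-06-07', '2015-06-05', None]),
})
# transform('min') computes the minimum per (ID, DATE) group (NaT is skipped)
# and repeats it on every row of that group, like repeat_value does
df['date2'] = df.groupby(['ID', 'DATE'])['date2'].transform('min')
```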

Column NumOfDaysSinceLastWin is filled with the difference of columns DATE and date1. The datatype is Timedelta, so it is converted to integer. Finally the unnecessary columns are deleted. (Columns WIN1 and count are still needed for the next output column, so they are kept.)

df['NumOfDaysSinceLastWin'] = ((df['DATE'] - df['date1']).fillna(0)).astype('timedelta64[D]')
df = df.drop(['cumsum','min', 'date1'], axis=1 )
print df
   ID       DATE  WIN  WIN1  count  NumOfDaysSinceLastWin
0   A 2015-06-05  Yes     1      2                      0
1   A 2015-06-05  Yes     1      2                      0
2   A 2015-06-07  Yes     1      3                      2
3   A 2015-06-07  Yes     1      3                      2
4   A 2015-06-07  Yes     1      3                      2
5   A 2015-06-08   No     0    NaN                      1
6   B 2015-06-07   No     0    NaN                      0
7   B 2015-06-07   No     0    NaN                      0
8   B 2015-08-07  Yes     1    NaN                      0
9   C 2015-05-15  Yes     1    NaN                      0
10  C 2015-05-30   No     0    NaN                     15
11  C 2015-07-30   No     0    NaN                     76
12  C 2015-08-03  Yes     1      2                     80
13  C 2015-08-03  Yes     1      2                     80
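For reference, on current pandas this whole column can likely be computed much more directly: mask the dates of the win rows, then forward-fill the previous win date per ID. A sketch on the question's original 10-row input; note it produces the alternative shown in the question for the duplicated same-day win (0 days, measuring from the first win on that date):

```python
import io
import pandas as pd

csv = u"""ID,DATE,WIN
A,2015/6/5,Yes
A,2015/6/7,Yes
A,2015/6/7,Yes
A,2015/6/8,No
B,2015/6/7,No
B,2015/8/7,Yes
C,2015/5/15,Yes
C,2015/5/30,No
C,2015/7/30,No
C,2015/8/03,Yes"""
df = pd.read_csv(io.StringIO(csv), parse_dates=['DATE'])

# keep the date only on win rows, NaT elsewhere
win_date = df['DATE'].where(df['WIN'] == 'Yes')
# per ID: date of the last win strictly before the current row
last_win = win_date.groupby(df['ID']).transform(lambda s: s.shift().ffill())
# difference in days; rows with no earlier win get 0
df['NumOfDaysSinceLastWin'] = (df['DATE'] - last_win).dt.days.fillna(0).astype(int)
```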

Column NumOfWinsInThePast30days

A rolling sum is your friend. Column yes (needed for the resampling) is mapped to 1 for 'Yes' and NaN for 'No'.

df['yes'] = df['WIN'].map(lambda x: 1 if x == 'Yes' else np.nan)

Dataframe df2 is a copy of df with column DATE set as the index (for resampling). The unnecessary columns are deleted.

df2 = df.set_index('DATE')
df2 = df2.drop(['NumOfDaysSinceLastWin','WIN', 'WIN1'], axis=1)

Then df2 is resampled by day, so each day gets the count of its 'Yes' rows (0 for days with none). (The full listing below makes this clearer.)

df2 = df2.groupby('ID').resample("D", how='count')
df2 = df2.reset_index()

Dataframe df2 is grouped by ID and the function rolling_sum is applied to each group.

df2['rollsum'] = df2.groupby('ID')['yes'].transform(pd.rolling_sum, window=30, min_periods=1)
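Note that both resample(..., how='count') and pd.rolling_sum belong to the pandas 0.1x / Python 2 era this answer was written for and were removed in later versions. On a current pandas, the equivalents are a chained resample().count() and a groupby-rolling transform; a minimal sketch on toy data (not the frames above):

```python
import pandas as pd

df2 = pd.DataFrame(
    {'ID': ['A', 'A', 'B'], 'yes': [1.0, 1.0, 1.0]},
    index=pd.to_datetime(['2015-06-05', '2015-06-07', '2015-08-07']),
)
df2.index.name = 'DATE'

# modern spelling of .resample("D", how='count'): the `how` argument is gone
daily = df2.groupby('ID').resample('D')['yes'].count().reset_index()

# modern spelling of pd.rolling_sum(..., window=30, min_periods=1)
daily['rollsum'] = (daily.groupby('ID')['yes']
                         .transform(lambda s: s.rolling(window=30, min_periods=1).sum()))
```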

For better understanding, all rows of df2 are displayed.

with pd.option_context('display.max_rows', 999, 'display.max_columns', 5):
    print df2

    ID       DATE  yes  rollsum
0    A 2015-06-05    2        2
1    A 2015-06-06    0        2
2    A 2015-06-07    3        5
3    A 2015-06-08    0        5
4    B 2015-06-07    0        0
5    B 2015-06-08    0        0
6    B 2015-06-09    0        0
7    B 2015-06-10    0        0
8    B 2015-06-11    0        0
9    B 2015-06-12    0        0
10   B 2015-06-13    0        0
11   B 2015-06-14    0        0
12   B 2015-06-15    0        0
13   B 2015-06-16    0        0
14   B 2015-06-17    0        0
15   B 2015-06-18    0        0
16   B 2015-06-19    0        0
17   B 2015-06-20    0        0
18   B 2015-06-21    0        0
19   B 2015-06-22    0        0
20   B 2015-06-23    0        0
21   B 2015-06-24    0        0
22   B 2015-06-25    0        0
23   B 2015-06-26    0        0
24   B 2015-06-27    0        0
25   B 2015-06-28    0        0
26   B 2015-06-29    0        0
27   B 2015-06-30    0        0
28   B 2015-07-01    0        0
29   B 2015-07-02    0        0
30   B 2015-07-03    0        0
31   B 2015-07-04    0        0
32   B 2015-07-05    0        0
33   B 2015-07-06    0        0
34   B 2015-07-07    0        0
35   B 2015-07-08    0        0
36   B 2015-07-09    0        0
37   B 2015-07-10    0        0
38   B 2015-07-11    0        0
39   B 2015-07-12    0        0
40   B 2015-07-13    0        0
41   B 2015-07-14    0        0
42   B 2015-07-15    0        0
43   B 2015-07-16    0        0
44   B 2015-07-17    0        0
45   B 2015-07-18    0        0
46   B 2015-07-19    0        0
47   B 2015-07-20    0        0
48   B 2015-07-21    0        0
49   B 2015-07-22    0        0
50   B 2015-07-23    0        0
51   B 2015-07-24    0        0
52   B 2015-07-25    0        0
53   B 2015-07-26    0        0
54   B 2015-07-27    0        0
55   B 2015-07-28    0        0
56   B 2015-07-29    0        0
57   B 2015-07-30    0        0
58   B 2015-07-31    0        0
59   B 2015-08-01    0        0
60   B 2015-08-02    0        0
61   B 2015-08-03    0        0
62   B 2015-08-04    0        0
63   B 2015-08-05    0        0
64   B 2015-08-06    0        0
65   B 2015-08-07    1        1
66   C 2015-05-15    1        1
67   C 2015-05-16    0        1
68   C 2015-05-17    0        1
69   C 2015-05-18    0        1
70   C 2015-05-19    0        1
71   C 2015-05-20    0        1
72   C 2015-05-21    0        1
73   C 2015-05-22    0        1
74   C 2015-05-23    0        1
75   C 2015-05-24    0        1
76   C 2015-05-25    0        1
77   C 2015-05-26    0        1
78   C 2015-05-27    0        1
79   C 2015-05-28    0        1
80   C 2015-05-29    0        1
81   C 2015-05-30    0        1
82   C 2015-05-31    0        1
83   C 2015-06-01    0        1
84   C 2015-06-02    0        1
85   C 2015-06-03    0        1
86   C 2015-06-04    0        1
87   C 2015-06-05    0        1
88   C 2015-06-06    0        1
89   C 2015-06-07    0        1
90   C 2015-06-08    0        1
91   C 2015-06-09    0        1
92   C 2015-06-10    0        1
93   C 2015-06-11    0        1
94   C 2015-06-12    0        1
95   C 2015-06-13    0        1
96   C 2015-06-14    0        0
97   C 2015-06-15    0        0
98   C 2015-06-16    0        0
99   C 2015-06-17    0        0
100  C 2015-06-18    0        0
101  C 2015-06-19    0        0
102  C 2015-06-20    0        0
103  C 2015-06-21    0        0
104  C 2015-06-22    0        0
105  C 2015-06-23    0        0
106  C 2015-06-24    0        0
107  C 2015-06-25    0        0
108  C 2015-06-26    0        0
109  C 2015-06-27    0        0
110  C 2015-06-28    0        0
111  C 2015-06-29    0        0
112  C 2015-06-30    0        0
113  C 2015-07-01    0        0
114  C 2015-07-02    0        0
115  C 2015-07-03    0        0
116  C 2015-07-04    0        0
117  C 2015-07-05    0        0
118  C 2015-07-06    0        0
119  C 2015-07-07    0        0
120  C 2015-07-08    0        0
121  C 2015-07-09    0        0
122  C 2015-07-10    0        0
123  C 2015-07-11    0        0
124  C 2015-07-12    0        0
125  C 2015-07-13    0        0
126  C 2015-07-14    0        0
127  C 2015-07-15    0        0
128  C 2015-07-16    0        0
129  C 2015-07-17    0        0
130  C 2015-07-18    0        0
131  C 2015-07-19    0        0
132  C 2015-07-20    0        0
133  C 2015-07-21    0        0
134  C 2015-07-22    0        0
135  C 2015-07-23    0        0
136  C 2015-07-24    0        0
137  C 2015-07-25    0        0
138  C 2015-07-26    0        0
139  C 2015-07-27    0        0
140  C 2015-07-28    0        0
141  C 2015-07-29    0        0
142  C 2015-07-30    0        0
143  C 2015-07-31    0        0
144  C 2015-08-01    0        0
145  C 2015-08-02    0        0
146  C 2015-08-03    2        2

The unnecessary column yes is deleted.

df2 = df2.drop(['yes'], axis=1 )

The output is merged with the first dataframe df.

df2 = pd.merge(df,df2,on=['DATE', 'ID'], how='inner')
print df2

  ID       DATE  WIN  WIN1  NumOfDaysSinceLastWin  yes  rollsum
0  A 2015-06-07  Yes     1                      0    1        2
1  A 2015-06-07  Yes     1                      0    1        2
2  B 2015-08-07   No     0                      0  NaN        1
3  B 2015-08-07  Yes     1                      0    1        1
4  C 2015-05-15  Yes     1                      0    1        1
5  C 2015-05-30   No     0                     15  NaN        1
6  C 2015-07-30   No     0                     76  NaN        0
7  C 2015-08-03  Yes     1                     80    1        1

If the values in column count are not null, they are copied to column WIN1. The function rolling_sum counted the 'Yes' rows of the original df including the current date, so that has to be subtracted; those values (1, or the duplicate count) are in column WIN1.

df2.loc[df['count'].notnull() , 'WIN1'] = df2['count']
df2['NumOfWinsInThePast30days'] = df2['rollsum'] - df2['WIN1']

Delete the unnecessary columns.

df2 = df2.drop(['yes','WIN1', 'rollsum', 'count'], axis=1 )
print df2
   ID       DATE  WIN  NumOfDaysSinceLastWin  NumOfWinsInThePast30days
0   A 2015-06-05  Yes                      0                         0
1   A 2015-06-05  Yes                      0                         0
2   A 2015-06-07  Yes                      2                         2
3   A 2015-06-07  Yes                      2                         2
4   A 2015-06-07  Yes                      2                         2
5   A 2015-06-08   No                      1                         5
6   B 2015-06-07   No                      0                         0
7   B 2015-06-07   No                      0                         0
8   B 2015-08-07  Yes                      0                         0
9   C 2015-05-15  Yes                      0                         0
10  C 2015-05-30   No                     15                         1
11  C 2015-07-30   No                     76                         0
12  C 2015-08-03  Yes                     80                         0
13  C 2015-08-03  Yes                     80                         0

And last, all together:

import pandas as pd
import numpy as np
import io

#original data
temp=u"""ID,DATE,WIN
A,2015/6/5,Yes
A,2015/6/7,Yes
A,2015/6/7,Yes
A,2015/6/8,No
B,2015/6/7,No
B,2015/8/7,Yes
C,2015/5/15,Yes
C,2015/5/30,No
C,2015/7/30,No
C,2015/8/03,Yes"""

#changed repeating data
temp2=u"""ID,DATE,WIN
A,2015/6/5,Yes
A,2015/6/5,Yes
A,2015/6/7,Yes
A,2015/6/7,Yes
A,2015/6/7,Yes
A,2015/6/8,No
B,2015/6/7,No
B,2015/6/7,No
B,2015/8/7,Yes
C,2015/5/15,Yes
C,2015/5/30,No
C,2015/7/30,No
C,2015/8/03,Yes
C,2015/8/03,Yes"""

df = pd.read_csv(io.StringIO(temp2), parse_dates = [1])

df['WIN1'] = df['WIN'].map(lambda x: 1 if x == 'Yes' else 0)
df['cumsum'] = df['WIN1'].cumsum()

#df['diffs'] = df.groupby(['ID', 'cumsum'])['DATE'].apply(lambda d: (d-d.shift()).fillna(0)) 

df['min'] = df.groupby(['ID','cumsum'])['DATE'].transform('min')
df['count'] = df.groupby(['cumsum'])['cumsum'].transform('count')

df1 = df[~df['WIN'].isin(['No'])]

df['date1'] = df1.groupby(['ID'])['DATE'].apply(lambda d: d.shift()) 
print df
df.loc[(df['count'] >= 1) & (df['date1'].isnull()), 'date1'] = df['min']
print df

#resolve repeating datetimes
df['count'] = df1.groupby(['ID', 'DATE', 'WIN1'])['WIN1'].transform('count')
df.loc[df['count'] == 1 , 'count'] = np.nan
df.loc[df['count'].notnull() , 'date2'] = df['date1']
print df

def repeat_value(grp):
    grp['date2'] = grp['date2'].min()
    return grp

df = df.groupby(['ID', 'DATE']).apply(repeat_value)
df.loc[df['date2'].notnull() , 'date1'] = df['date2']
print df

df['NumOfDaysSinceLastWin'] = (df['DATE'] - df['date1']).astype('timedelta64[D]')
df = df.drop(['cumsum','min','date1', 'date2'], axis=1 )
print df

#NumOfWinsInThePast30days
df['yes'] = df['WIN'].map(lambda x: 1 if x == 'Yes' else np.nan)
df2 = df.set_index('DATE')
df2 = df2.drop(['NumOfDaysSinceLastWin','WIN', 'WIN1','count'], axis=1)

df2 = df2.groupby('ID').resample("D", how='count')
df2 = df2.reset_index()

df2['rollsum'] = df2.groupby('ID')['yes'].transform(pd.rolling_sum, window=30, min_periods=1)

#with pd.option_context('display.max_rows', 999, 'display.max_columns', 5):
    #print df2

df2 = df2.drop(['yes'], axis=1 )
df2 = pd.merge(df,df2,on=['DATE', 'ID'], how='inner')
print df2

df2.loc[df['count'].notnull() , 'WIN1'] = df2['count']
df2['NumOfWinsInThePast30days'] = df2['rollsum'] - df2['WIN1']

df2 = df2.drop(['yes','WIN1', 'rollsum', 'count'], axis=1 )
print df2
