Pandas add column on condition: if the value of a cell is True, set the value of the largest number in the period to True

Question

I have a pandas DataFrame with, let's say, two columns, for example:

     value  boolean
0        1        0
1        5        1
2        0        0
3        3        0
4        9        1
5       12        0
6        4        0
7        7        1
8        8        1
9        2        0
10      17        0
11      15        1
12       6        0

Now I want to add a third column (new_boolean) with the following criteria: I specify a period, for this example period = 4. Then I look at all rows where boolean == 1. For each such row, new_boolean will be 1 for the row holding the maximum value among the last period rows (the row itself included).

For example, I have boolean == 1 in row 2 (index 1). So I look at the last period rows; only two rows exist at that point. The values are [1, 5], and 5 is the maximum, so the value of new_boolean in row 2 will be 1.

Second example: row 8 (index 7, value = 7): I get the values [7, 4, 12, 9]. 12 is the maximum, so the value of new_boolean in the row with value 12 will be 1.

Result:

     value   boolean  new_boolean
0        1         0            0
1        5         1            1
2        0         0            0
3        3         0            0
4        9         1            1
5       12         0            1
6        4         0            0
7        7         1            0
8        8         1            0
9        2         0            0
10      17         0            1
11      15         1            0
12       6         0            0

How can I do this algorithmically?

Answer 1

Use df.index with df.iloc and df.idxmax:

In [182]: period = 4 # Define period to 4
In [183]: ix = df[df.boolean.eq(1)].index # Create a list of indexes where boolean = 1

In [213]: new_bool_ix = [] # empty list

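# For every index in `ix`, take the last `period` rows up to and including it and append the index of the maximum `value`
# (index labels are used as positions here, which assumes the default RangeIndex)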
In [215]: for i in ix:
     ...:     new_bool_ix.append(df.iloc[:i + 1].iloc[-period:]['value'].idxmax()) 
     ...:  

In [225]: df['new_boolean'] = 0 # declare column new_boolean with default value `0`
In [227]: df.loc[new_bool_ix, 'new_boolean'] = 1 # Change the value to 1 for the indexes in new_bool_ix

In [228]: df
Out[228]: 
    value  boolean  new_boolean
0       1        0            0
1       5        1            1
2       0        0            0
3       3        0            0
4       9        1            1
5      12        0            1
6       4        0            0
7       7        1            0
8       8        1            0
9       2        0            0
10     17        0            1
11     15        1            0
12      6        0            0
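
The loop above treats index labels as row positions, which holds for the default RangeIndex. A minimal sketch of the same idea on plain positions, assuming the column names from the question; the function name add_new_boolean is only illustrative and not part of the answer:

```python
import numpy as np
import pandas as pd


def add_new_boolean(df, period):
    """Flag the row holding the maximum 'value' in the window that ends
    at each row where boolean == 1 (hypothetical helper)."""
    values = df["value"].to_numpy()
    flags = df["boolean"].to_numpy()
    new_flags = np.zeros(len(df), dtype=int)

    # Visit the position of every row with boolean == 1
    for pos in np.flatnonzero(flags == 1):
        start = max(0, pos - period + 1)  # the window is shorter near the top
        # argmax is relative to the window, so shift it back by `start`
        new_flags[start + values[start:pos + 1].argmax()] = 1

    return df.assign(new_boolean=new_flags)


df = pd.DataFrame({
    "value":   [1, 5, 0, 3, 9, 12, 4, 7, 8, 2, 17, 15, 6],
    "boolean": [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})
result = add_new_boolean(df, period=4)
```

Because it never looks at index labels, this sketch should also work when the index is not 0..n-1. On ties, argmax picks the first maximum in the window.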

Answer 2

Compute the rolling max of the 'value' column:

>>> rolling_max_value = df.rolling(window=4, min_periods=1)['value'].max()
>>> rolling_max_value 

0      1.0
1      5.0
2      5.0
3      5.0
4      9.0
5     12.0
6     12.0
7     12.0
8     12.0
9      8.0
10    17.0
11    17.0
12    17.0
Name: value, dtype: float64

Select only the relevant values, i.e. where 'boolean' = 1:

>>> on_values = rolling_max_value[df.boolean == 1].unique()
>>> on_values

array([ 5.,  9., 12., 17.])

The rows where 'new_boolean' = 1 are the ones where 'value' belongs to on_values:

>>> df['new_boolean'] = df.value.isin(on_values).astype(int)
>>> df

    value  boolean  new_boolean
0       1        0            0
1       5        1            1
2       0        0            0
3       3        0            0
4       9        1            1
5      12        0            1
6       4        0            0
7       7        1            0
8       8        1            0
9       2        0            0
10     17        0            1
11     15        1            0
12      6        0            0
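
Matching by value with isin flags every row whose value equals one of the selected maxima, so a repeated value outside the window would be flagged too. A hedged sketch that tracks the index label of each window's maximum instead, assuming a default integer index; the helper name mark_window_max is hypothetical:

```python
import pandas as pd


def mark_window_max(df, period):
    """Hypothetical helper: mark the row that holds each flagged window's maximum."""
    out = df.copy()
    # Index label of the maximum in each rolling window (rolling returns floats)
    max_label = (
        out["value"]
        .rolling(window=period, min_periods=1)
        .apply(lambda w: w.idxmax(), raw=False)
    )
    # Keep only the windows that end on a row with boolean == 1
    hits = max_label[out["boolean"].eq(1)].astype(int).unique()
    out["new_boolean"] = 0
    out.loc[hits, "new_boolean"] = 1
    return out


df = pd.DataFrame({
    "value":   [1, 5, 0, 3, 9, 12, 4, 7, 8, 2, 17, 15, 6],
    "boolean": [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})
result = mark_window_max(df, period=4)

# A repeated value: only the 5 inside the flagged window should be marked
dup = pd.DataFrame({"value": [5, 1, 5, 0], "boolean": [0, 0, 0, 1]})
dup_result = mark_window_max(dup, period=2)
```

On the question's data this gives the same column as the isin version; on the small dup frame only the second 5 is marked, whereas a value-based match would mark both.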

Answer 3

I did this in two steps, but I think the solution is much clearer:

from io import StringIO
import pandas as pd

df = pd.read_csv(StringIO('''
id value  boolean
0        1        0
1        5        1
2        0        0
3        3        0
4        9        1
5       12        0
6        4        0
7        7        1
8        8        1
9        2        0
10      17        0
11      15        1
12       6        0'''), sep=r'\s+', index_col=0)

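# Step 1: rolling maximum of 'value' over the last 4 rows
df['new_bool'] = df['value'].rolling(min_periods=1, window=4).max()
# Step 2: set 1 where boolean == 1 and the row's own value equals that rolling maximum
df['new_bool'] = df.apply(lambda x: 1 if ((x['value'] == x['new_bool']) & (x['boolean'] == 1)) else 0, axis=1)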
df

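Result (rows 5 and 10 stay 0 here, unlike the expected output in the question, because only rows with boolean == 1 are flagged):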

    value  boolean  new_bool
id
0       1        0         0
1       5        1         1
2       0        0         0
3       3        0         0
4       9        1         1
5      12        0         0
6       4        0         0
7       7        1         0
8       8        1         0
9       2        0         0
10     17        0         0
11     15        1         0
12      6        0         0
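
The row-wise apply in the second step can also be written as a boolean mask over whole columns. A minimal sketch, assuming the same column names and a frame built inline rather than parsed from text; it reproduces the table above, not the expected output from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "value":   [1, 5, 0, 3, 9, 12, 4, 7, 8, 2, 17, 15, 6],
    "boolean": [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

period = 4
rolling_max = df["value"].rolling(window=period, min_periods=1).max()
# True where the row is flagged and its own value is the window maximum
mask = df["boolean"].eq(1) & df["value"].eq(rolling_max)
df["new_bool"] = mask.astype(int)
```

Comparing whole columns avoids calling a Python function once per row, which usually matters on larger frames.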
