Pandas add column on condition: if the value of a cell is True, set the largest value in the period to True

I have a pandas DataFrame with, let's say, two columns, for example:

     value  boolean
0        1        0
1        5        1
2        0        0
3        3        0
4        9        1
5       12        0
6        4        0
7        7        1
8        8        1
9        2        0
10      17        0
11      15        1
12       6        0

Now I want to add a third column (new_boolean) with the following criteria: I specify a period, for this example period = 4. Then I look at every row where boolean == 1. For each such row, I take the last period rows up to and including that row, and new_boolean is set to 1 on the row that holds the maximum value among them.

For example, boolean == 1 in the second row (index 1). Looking back over the last period rows, only two rows exist so far, so the values are [1, 5]. 5 is the maximum, so new_boolean in the second row (index 1) will be 1.

Second example: the eighth row (index 7, value = 7). The last period rows have the values [7, 4, 12, 9]; 12 is the maximum, so new_boolean in the row with value 12 will be 1.
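The two lookups described above can be reproduced with a small sketch. The helper name `window_values` is hypothetical and only used for illustration; it assumes the default integer index shown in the example table.

```python
import pandas as pd

# Example data copied from the table in the question.
df = pd.DataFrame({
    "value":   [1, 5, 0, 3, 9, 12, 4, 7, 8, 2, 17, 15, 6],
    "boolean": [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})
period = 4


def window_values(df, pos, period):
    """Return the 'value' entries of the last `period` rows ending at position `pos`."""
    start = max(0, pos - period + 1)  # the window is shorter near the top of the frame
    return df["value"].iloc[start:pos + 1].tolist()


print(window_values(df, 1, period))  # [1, 5]
print(window_values(df, 7, period))  # [9, 12, 4, 7]
```

The window is listed in row order here, so the second example prints the same four values as in the text, just oldest first.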

Result:

     value   boolean  new_boolean
0        1         0            0
1        5         1            1
2        0         0            0
3        3         0            0
4        9         1            1
5       12         0            1
6        4         0            0
7        7         1            0
8        8         1            0
9        2         0            0
10      17         0            1
11      15         1            0
12       6         0            0

How can I do this algorithmically?

Use df.index with df.iloc and idxmax:

In [182]: period = 4 # Define period to 4
In [183]: ix = df[df.boolean.eq(1)].index # Create a list of indexes where boolean = 1

In [213]: new_bool_ix = [] # empty list

# For every index in `ix`, take the last 4 rows and append the index of maximum `value`
In [215]: for i in ix:
     ...:     new_bool_ix.append(df.iloc[:i + 1].iloc[-period:]['value'].idxmax()) 
     ...:  

In [225]: df['new_boolean'] = 0 # declare column new_boolean with default value `0`
In [227]: df.loc[new_bool_ix, 'new_boolean'] = 1 # Change the value to 1 for the indexes in new_bool_ix

In [228]: df
Out[228]: 
    value  boolean  new_boolean
0       1        0            0
1       5        1            1
2       0        0            0
3       3        0            0
4       9        1            1
5      12        0            1
6       4        0            0
7       7        1            0
8       8        1            0
9       2        0            0
10     17        0            1
11     15        1            0
12      6        0            0
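The loop above slices with `df.iloc[:i + 1]` using index labels, which works because the example has the default 0..n-1 index. As a hedged sketch, the same idea can be wrapped in a function that works purely by position, so it should also behave with other index types. The name `mark_period_max` is an assumption made for this example, not part of pandas.

```python
import pandas as pd


def mark_period_max(df, period):
    """Flag, for every row with boolean == 1, the row holding the max 'value'
    among the last `period` rows (including the flagged row itself)."""
    flags = pd.Series(0, index=df.index, name="new_boolean")
    # Reset the index so that idxmax() returns a position, not a label.
    values = df["value"].reset_index(drop=True)
    for pos in (df["boolean"].to_numpy() == 1).nonzero()[0]:
        start = max(0, pos - period + 1)
        best = values.iloc[start:pos + 1].idxmax()  # first maximum in the window
        flags.iloc[best] = 1
    return flags


df = pd.DataFrame(
    {
        "value":   [1, 5, 0, 3, 9, 12, 4, 7, 8, 2, 17, 15, 6],
        "boolean": [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    },
    index=list("abcdefghijklm"),  # deliberately not a RangeIndex
)
df["new_boolean"] = mark_period_max(df, period=4)
print(df)
```

On ties, `idxmax` picks the first occurrence of the maximum within the window.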

Compute the rolling max of the 'value' column:

>>> rolling_max_value = df.rolling(window=4, min_periods=1)['value'].max()
>>> rolling_max_value 

0      1.0
1      5.0
2      5.0
3      5.0
4      9.0
5     12.0
6     12.0
7     12.0
8     12.0
9      8.0
10    17.0
11    17.0
12    17.0
Name: value, dtype: float64
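A short sketch of why `min_periods=1` matters in the call above: without it, the first rows have fewer than `window` observations and the rolling max is NaN there. The variable names `strict` and `lenient` are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 5, 0, 3, 9])

# Default min_periods equals the window size: incomplete windows yield NaN.
strict = s.rolling(window=4).max()

# min_periods=1: a window with at least one observation already yields a result.
lenient = s.rolling(window=4, min_periods=1).max()

print(strict.tolist())   # [nan, nan, nan, 5.0, 9.0]
print(lenient.tolist())  # [1.0, 5.0, 5.0, 5.0, 9.0]
```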

Select only the relevant values, i.e. where 'boolean' = 1:

>>> on_values = rolling_max_value[df.boolean == 1].unique()
>>> on_values

array([ 5.,  9., 12., 17.])

The rows where 'new_boolean' = 1 are the ones where 'value' belongs to on_values (this relies on the values being unique; a repeated value would be flagged on every row where it appears):

>>> df['new_boolean'] = df.value.isin(on_values).astype(int)
>>> df

    value  boolean  new_boolean
0       1        0            0
1       5        1            1
2       0        0            0
3       3        0            0
4       9        1            1
5      12        0            1
6       4        0            0
7       7        1            0
8       8        1            0
9       2        0            0
10     17        0            1
11     15        1            0
12      6        0            0
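If values can repeat, matching on the value itself may flag extra rows. One possible alternative, sketched here under assumptions, tracks the position of each window's maximum instead of its value. It is not from the answer above: it swaps in NumPy's `sliding_window_view` (available in NumPy 1.20 and later), the function name `flag_window_max` is hypothetical, and it assumes 'value' is numeric without NaN.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def flag_window_max(df, period, value_col="value", flag_col="boolean"):
    """Return a 0/1 Series marking the position of the window maximum
    for every row where `flag_col` equals 1."""
    values = df[value_col].to_numpy(dtype=float)
    # Pad the front with -inf so every row has a full-length window.
    padded = np.concatenate([np.full(period - 1, -np.inf), values])
    windows = sliding_window_view(padded, period)  # shape: (n_rows, period)
    # Window i covers original positions i - (period - 1) .. i.
    offsets = windows.argmax(axis=1)  # first maximum inside each window
    max_pos = np.arange(len(values)) - (period - 1) + offsets
    flagged = df[flag_col].to_numpy() == 1
    out = np.zeros(len(values), dtype=int)
    out[max_pos[flagged]] = 1
    return pd.Series(out, index=df.index, name="new_boolean")


df = pd.DataFrame({
    "value":   [1, 5, 0, 3, 9, 12, 4, 7, 8, 2, 17, 15, 6],
    "boolean": [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})
df["new_boolean"] = flag_window_max(df, period=4)
print(df)
```

Because the `-inf` padding can never be the maximum of a window that contains a real value, the computed position always refers to an actual row.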

I did this in 2 steps, but I think the solution is much clearer:

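from io import StringIO

import pandas as pd

# sep=r'\s+' splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(StringIO('''
id value  boolean
0        1        0
1        5        1
2        0        0
3        3        0
4        9        1
5       12        0
6        4        0
7        7        1
8        8        1
9        2        0
10      17        0
11      15        1
12       6        0'''), sep=r'\s+', index_col=0)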

df['new_bool'] = df['value'].rolling(min_periods=1, window=4).max()
df['new_bool'] = df.apply(lambda x: 1 if ((x['value'] == x['new_bool']) & (x['boolean'] == 1)) else 0, axis=1)
df

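Result (note that rows 5 and 10 are 0 here, unlike in the expected output of the question, because this condition only flags a row whose own boolean is 1 and whose own value is the rolling maximum):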

    value  boolean  new_bool
id
0       1        0         0
1       5        1         1
2       0        0         0
3       3        0         0
4       9        1         1
5      12        0         0
6       4        0         0
7       7        1         0
8       8        1         0
9       2        0         0
10     17        0         0
11     15        1         0
12      6        0         0
