
Pandas renaming all consecutive rows based on condition

I have a DataFrame similar to this:

[image: the example DataFrame]

You can recreate it using this code:

import pandas as pd

df = pd.DataFrame({
    'A': 1.,
    'name': pd.Categorical(["hello", "hello", "hello", "hello"]),
    'col_2': pd.Categorical(["2", "2", "12", "Nan"]),
    'col_3': pd.Categorical(["11", "1", "3", "Nan"]),
})

I would like to change the value of "name" in each row with "col_2" or "col_3" higher than 10.

So, if there is a number higher than 10 in "col_2" or in "col_3", all rows up to the next number that is higher than 10 should be renamed.

Here is what it should look like in the end:

[image: the expected result]

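You can achieve this with cumsum: flag every row where "col_2" or "col_3" holds a number higher than 10, then take the cumulative sum of those flags. Each flagged row increments the counter, so all rows up to the next flagged row share the same number.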

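name_index = (
    df[['col_2', 'col_3']]
    .apply(pd.to_numeric, errors='coerce')  # strings to numbers; "Nan" ends up as NaN
    .gt(10)       # True where the value is higher than 10
    .any(axis=1)  # True if either column qualifies
    .cumsum()     # running count of qualifying rows = group number
)
df['name'] = df['name'].astype(str) + '_' + name_index.astype(str)
print(df)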

    A    col_2  col_3   name
0   1.0  2      11      hello_1
1   1.0  2      1       hello_1
2   1.0  12     3       hello_2
3   1.0  NaN    NaN     hello_2
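
The same idea can be wrapped in a small helper, which may be easier to reuse and to inspect step by step. This is only a sketch under some assumptions: the function name `label_groups` and its parameters are made up for illustration, the example data uses plain string lists instead of `pd.Categorical`, and "higher than 10" is read as strictly greater.

```python
import pandas as pd

def label_groups(df, value_cols, threshold=10, name_col="name"):
    """Return a copy of df with name_col suffixed by a running group number.

    Hypothetical helper, not part of pandas.
    """
    # Convert the string columns to numbers; non-numeric entries such as "Nan" end up as NaN.
    numeric = df[value_cols].astype(str).apply(pd.to_numeric, errors="coerce")
    # True for rows where at least one column is strictly above the threshold.
    starts_group = numeric.gt(threshold).any(axis=1)
    # Every True increments the counter, so the rows that follow share its number.
    group_id = starts_group.cumsum()
    out = df.copy()
    out[name_col] = out[name_col].astype(str) + "_" + group_id.astype(str)
    return out

df = pd.DataFrame({
    "A": 1.0,
    "name": ["hello", "hello", "hello", "hello"],
    "col_2": ["2", "2", "12", "Nan"],
    "col_3": ["11", "1", "3", "Nan"],
})
result = label_groups(df, ["col_2", "col_3"])
print(result["name"].tolist())
# ['hello_1', 'hello_1', 'hello_2', 'hello_2']
```

Note that rows appearing before the first value above the threshold would get the suffix `_0`, because the cumulative sum is still zero at that point.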
