
Pandas duplicating row if condition met, and assigning value

I have a Pandas dataframe like the one below, where column A is a series of string values, and column B maintains a running total of the number of times the value in column A differs from the value of column A in the previous row.

A    B       
1    1          
1    1             
1b   2          
1b   2                
1b   2    
1    3   
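
For anyone wanting to reproduce this, a frame like the one above can be built roughly as follows. This is only a sketch: the variable names are illustrative, and deriving B with `ne`/`shift`/`cumsum` is one assumed way of getting the running count, not necessarily how the real data was produced.

```python
import pandas as pd

# Sample values for column A, taken from the table above.
df = pd.DataFrame({"A": ["1", "1", "1b", "1b", "1b", "1"]})

# True wherever A differs from the previous row; the first row has no
# predecessor, so it compares against NaN and counts as a change.
changed = df["A"].ne(df["A"].shift())

# The running total of changes gives column B.
df["B"] = changed.cumsum()
print(df)
```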

Every time there is a change in the value of column A, I would like to duplicate the preceding row and assign it an incremented value of column B. For example, with the input dataframe as above, the output would look like:

A    B       
1    1          
1    1   
1    2            
1b   2          
1b   2                
1b   2 
1b   3    
1    3   

Any thoughts about how to go about this in an efficient way?

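Select the last row of each run of equal B values, shift B up by one position within that selection so each row takes the next group's value, and drop the final row, which has no following group. Then join these rows to the original with concat and sort by index so that each copy lands next to the row it was copied from: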

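import pandas as pd

# last row of each B group, relabelled with the next group's B (final group dropped)
df1 = (df[df['B'].ne(df['B'].shift(-1))]
         .assign(B = lambda x: x.B.shift(-1)).iloc[:-1].astype({'B':int}))

# the default quicksort is not guaranteed stable; a stable sort keeps each copy after its original
df = pd.concat([df, df1]).sort_index(kind='stable', ignore_index=True)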
print (df)
    A  B
0   1  1
1   1  1
2   1  2
3  1b  2
4  1b  2
5  1b  2
6  1b  3
7   1  3
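
The same idea can be wrapped in a self-contained function. The sketch below is a variation, not the exact code above: the name `duplicate_before_change` is hypothetical, it assumes the frame has a unique, already sorted index (such as the default RangeIndex), and it asks for a stable sort explicitly so that each copied row is guaranteed to follow its original.

```python
import pandas as pd


def duplicate_before_change(df: pd.DataFrame) -> pd.DataFrame:
    """Copy the last row of each B group, giving the copy the next group's B.

    Hypothetical helper; assumes a unique, sorted index (e.g. a RangeIndex).
    """
    # B value of the following row (NaN for the very last row).
    next_b = df["B"].shift(-1)

    # Rows where B is about to change, excluding the final row of the frame.
    is_last = df["B"].ne(next_b) & next_b.notna()

    # Copies of those rows, relabelled with the upcoming B value.
    extra = df.loc[is_last].assign(B=next_b[is_last].astype(df["B"].dtype))

    # A stable sort keeps each original ahead of its copy, because the
    # originals come first in the concatenated frame.
    return pd.concat([df, extra]).sort_index(kind="stable", ignore_index=True)


sample = pd.DataFrame(
    {"A": ["1", "1", "1b", "1b", "1b", "1"], "B": [1, 1, 2, 2, 2, 3]}
)
result = duplicate_before_change(sample)
print(result)
```

If the index is not unique or not in row order, calling `reset_index(drop=True)` on the frame first should restore the assumption.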
