
Tagging a similar category with a repeated sequence of numbers in a pandas DataFrame

Below is the reproducible code

import pandas as pd

colo = ['red', 'red', 'red', 'cross', 'cross', 'red', 'red', 'red', 'cross', 'cross', 'cross',
        'cross', 'cross', 'red', 'red', 'cross', 'red', 'cross', 'cross']
dt = pd.DataFrame()
dt['seq'] = list(range(len(colo)))
dt['col'] = colo

Expected Output:

[image of the expected output table]

The columns seq and col are given; Expected_col needs to be created.

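Here's one way using eq + diff + ne + cumsum to create groups. Among the red rows, seq increases by exactly 1 within a run, so diff().ne(1) flags the first row of each run and cumsum turns those flags into group numbers. Boolean indexing then fills in the values: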

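# Boolean mask of the rows to tag
cond = dt['col'].eq('red')
# Number the runs 1, 2, 3, ...: a gap in seq between red rows starts a new run
s = dt.loc[cond, 'seq'].diff().ne(1).cumsum()
dt['Expected_col'] = dt['col']
# Reverse the numbering so the first run gets the highest number
dt.loc[cond, 'Expected_col'] = 'RED' + (s.max() + 1 - s).astype(str)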

Output:

    seq    col Expected_col
0     0    red         RED4
1     1    red         RED4
2     2    red         RED4
3     3  cross        cross
4     4  cross        cross
5     5    red         RED3
6     6    red         RED3
7     7    red         RED3
8     8  cross        cross
9     9  cross        cross
10   10  cross        cross
11   11  cross        cross
12   12  cross        cross
13   13    red         RED2
14   14    red         RED2
15   15  cross        cross
16   16    red         RED1
17   17  cross        cross
18   18  cross        cross
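
The approach above relies on seq being consecutive integers. As a rough sketch of a variant that does not need seq at all, a run can be detected by comparing each row with the previous one. This swaps diff on seq for shift on the mask; the function name tag_runs and its parameters target and prefix are hypothetical names chosen for illustration, not part of the original answer.

```python
import pandas as pd


def tag_runs(col: pd.Series, target: str = "red", prefix: str = "RED") -> pd.Series:
    """Label each consecutive run of `target` as prefix + k, counting down to 1.

    Hypothetical helper: name and parameters are illustrative assumptions.
    """
    is_target = col.eq(target)
    # A run starts where this row is a target and the previous row is not.
    run_start = is_target & ~is_target.shift(fill_value=False)
    # Running count of run starts: 1 for the first run, 2 for the second, ...
    run_id = run_start.cumsum()
    n_runs = int(run_start.sum())
    out = col.astype(object).copy()
    # Reverse the numbering so the first run gets the highest number.
    out.loc[is_target] = prefix + (n_runs + 1 - run_id[is_target]).astype(str)
    return out


colo = ['red', 'red', 'red', 'cross', 'cross', 'red', 'red', 'red', 'cross', 'cross', 'cross',
        'cross', 'cross', 'red', 'red', 'cross', 'red', 'cross', 'cross']
dt = pd.DataFrame({'seq': range(len(colo)), 'col': colo})
dt['Expected_col'] = tag_runs(dt['col'])
print(dt)
```

On the sample data this should give the same Expected_col as the output shown above. It may be preferable when the frame has been filtered or re-sorted so that seq no longer increases by 1 between adjacent rows.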
