Expand counted row value into separate rows, adding distinct ID in python

I have a dataset with several rows and columns. Within the column labeled 'number', I wish to remove the aggregation and expand each count into its own separate row. I also wish to add a column that gives each expanded row a unique ID.

Data

location    name    type    number  year
ny          hello   he      1       2021
ny          bye     by      0       2021
ny          ok      o       2       2021
ca          hi      h       1       2021

Desired

location    name    type    number  year    count
ny          hello   he      1       2021    he1
ny          bye     by      0       2021    by1
ny          ok      o       1       2021    o1
ny          ok      o       1       2021    o2
ca          hi      h       1       2021    h1

The 'ok' row is now split into two distinct rows instead of being aggregated with a value of 2. Each resulting row also gets a distinct count ID (based on the 'type' column) instead of an aggregation.

Doing

df = df1.reindex(df1.index.repeat(df1['number'])).assign(number=1)
df['count'] = df['type'] + '0' + (df.groupby(['location', 'name', 'type', 'number', 'year']).cumcount() + 1).astype(str)
df

I was helped by an SO member. However, in this example, how would I account for values of 0 in the number column? I am still researching this.

Any suggestion or advice is appreciated.

The idea is to repeat only the rows where number is greater than 1, then add back the rows where number is 0 or 1 and sort by index to restore the original ordering:

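import pandas as pd

# Rows with number > 1 are the only ones that need repeating
m = df1['number'].gt(1)
df2 = df1[m]
# Repeat those rows, set number to 1, then add back the 0/1 rows and restore the order
df = (pd.concat([df2.reindex(df2.index.repeat(df2['number'])).assign(number=1),
                 df1[~m]]).sort_index())

# Build the ID from type, a literal '0', and a running counter per group
df['count'] = df['type'] + '0' + (df.groupby(['location', 'name', 'type', 'number', 'year']).cumcount() + 1).astype(str)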

print(df)
  location   name type  number  year count
0       ny  hello   he       1  2021  he01
1       ny    bye   by       0  2021  by01
2       ny     ok    o       1  2021   o01
2       ny     ok    o       1  2021   o02
3       ca     hi    h       1  2021   h01
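
A possible variation, offered as a sketch rather than as part of the original answer: repeating a row 0 times drops it, which is why the rows with number=0 disappear in the first attempt. Clipping the repeat count to a minimum of 1 keeps those rows without needing a concat. The sample frame below is rebuilt from the question's data, and the names repeats and seq are only illustrative. Note that the counter here runs per original row (grouping on the repeated index) instead of per column combination, which gives the same IDs for this data but is an assumption about what is wanted.

```python
import pandas as pd

# Sample data rebuilt from the question
df1 = pd.DataFrame({
    "location": ["ny", "ny", "ny", "ca"],
    "name": ["hello", "bye", "ok", "hi"],
    "type": ["he", "by", "o", "h"],
    "number": [1, 0, 2, 1],
    "year": [2021, 2021, 2021, 2021],
})

# Repeat each row max(number, 1) times so rows with number=0 are kept once
repeats = df1["number"].clip(lower=1)
df = df1.loc[df1.index.repeat(repeats)].copy()

# After expansion each row carries at most one unit: 0 stays 0, larger values become 1
df["number"] = df["number"].clip(upper=1)

# Running counter within each original row (the repeated index label identifies it)
seq = df.groupby(level=0).cumcount() + 1
df["count"] = df["type"] + "0" + seq.astype(str)

print(df)
```

Because the expansion keeps the original index order, no sort_index call is needed afterwards.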
