
Create a DataFrame with repeated rows based on a column value

I am trying to expand a two-column dataset in Python (pandas).

Basket        | Times 
______________|_______
Bread         | 5     
Orange, Bread | 3     

For each row, I would like as many output rows as the number in the Times column, each with a running counter appended to the Basket value. So for the example above, the desired output is:

Newcolumn  
_______ 
Bread1
Bread2
Bread3
Bread4
Bread5   
Orange, Bread1
Orange, Bread2
Orange, Bread3  

Use np.repeat to repeat each value the required number of times. Then use groupby and cumcount to add the numeric suffixes:

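import numpy as np

# Repeat each Basket value Times times; the repeated index labels are kept.
srs = np.repeat(df["Basket"], df["Times"])

# Number the rows within each original index label (1-based) and append the number.
output = (srs + srs.groupby(level=0).cumcount().add(1).astype(str)).reset_index(drop=True)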

>>> output
0            Bread1
1            Bread2
2            Bread3
3            Bread4
4            Bread5
5    Orange, Bread1
6    Orange, Bread2
7    Orange, Bread3
dtype: object
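
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample DataFrame is rebuilt from the table in the question, and the helper name expand_with_counter is hypothetical (it is not part of pandas). It also resets the index first, because groupby(level=0) assumes each original row has a unique index label.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({"Basket": ["Bread", "Orange, Bread"], "Times": [5, 3]})


def expand_with_counter(frame):
    """Hypothetical helper: repeat Basket by Times and append a 1-based counter."""
    # Assumption: a fresh 0..n-1 index, so every original row has a unique label.
    frame = frame.reset_index(drop=True)
    # Repeat each Basket value Times times; index labels are repeated as well.
    srs = np.repeat(frame["Basket"], frame["Times"])
    # cumcount() numbers rows within each original label starting at 0.
    suffix = srs.groupby(level=0).cumcount().add(1).astype(str)
    return (srs + suffix).reset_index(drop=True).rename("Newcolumn")


output = expand_with_counter(df)
print(output.tolist())
```

If the input has duplicate index labels and the reset is skipped, the counters of those rows would run together, which is why the sketch resets the index up front.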

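You can use apply on each row to build the list of labels, then explode that column. Note that this version puts an underscore before the counter; remove the "_" from the f-string to match the output in the question exactly.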

df['Newcolumn'] = df.apply(lambda row: [f"{row['Basket']}_{i+1}" for i in range(row['Times'])], axis=1)
df = df.explode('Newcolumn', ignore_index=True)
print(df)

          Basket  Times        Newcolumn
0          Bread      5          Bread_1
1          Bread      5          Bread_2
2          Bread      5          Bread_3
3          Bread      5          Bread_4
4          Bread      5          Bread_5
5  Orange, Bread      3  Orange, Bread_1
6  Orange, Bread      3  Orange, Bread_2
7  Orange, Bread      3  Orange, Bread_3
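
If the original columns should be kept, as in the output above, but without a Python-level loop over rows, one possible alternative is to repeat the index labels with Index.repeat and number the rows within each label. This is a different technique from the apply/explode approach, shown only as a sketch; the sample DataFrame is rebuilt from the question and a unique index is assumed.

```python
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({"Basket": ["Bread", "Orange, Bread"], "Times": [5, 3]})

# Repeat whole rows: each index label is selected Times times.
# Assumption: df has a unique index, so each label identifies one original row.
expanded = df.loc[df.index.repeat(df["Times"])]

# 1-based position of each row within its original label, as strings.
counter = expanded.groupby(level=0).cumcount().add(1).astype(str)

# Append the counter with no separator, matching the question's desired output.
expanded = expanded.assign(Newcolumn=expanded["Basket"] + counter).reset_index(drop=True)
print(expanded)
```

Here the counter is appended without a separator, so the labels come out as Bread1, Bread2, and so on rather than Bread_1.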


 