
Create DataFrame with repeated rows based on column value

I am trying to expand a dataset that has two columns, using Python.

Basket        | Times 
______________|_______
Bread         | 5     
Orange, Bread | 3     

Based on the number in the Times column, I would like to create that many rows, each with a numbered suffix. So for the example above, the result would be:

Newcolumn  
_______ 
Bread1
Bread2
Bread3
Bread4
Bread5   
Orange, Bread1
Orange, Bread2
Orange, Bread3  
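For reference, the expansion being asked for can be stated in a few lines of plain Python, independent of pandas. This is only a sketch of the specification: the `rows` list simply restates the two example rows as (basket, times) pairs, and the helper name `expand_rows` is hypothetical.

```python
def expand_rows(rows):
    """Return one label per repetition: the basket name followed by a 1-based counter."""
    result = []
    for basket, times in rows:
        # A row with Times == n contributes n labels: basket1 ... basketn
        for i in range(1, times + 1):
            result.append(f"{basket}{i}")
    return result


# Hypothetical restatement of the example table as (Basket, Times) pairs
rows = [("Bread", 5), ("Orange, Bread", 3)]
for label in expand_rows(rows):
    print(label)
```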

Use np.repeat to repeat each value the required number of times. Then use groupby and cumcount to add the required suffixes:

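import numpy as np
import pandas as pd

df = pd.DataFrame({"Basket": ["Bread", "Orange, Bread"], "Times": [5, 3]})
# Repeat each value `Times` times (index labels repeat too), then number the copies per label
srs = np.repeat(df["Basket"], df["Times"])
output = (srs + srs.groupby(level=0).cumcount().add(1).astype(str)).reset_index(drop=True)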

>>> output
0            Bread1
1            Bread2
2            Bread3
3            Bread4
4            Bread5
5    Orange, Bread1
6    Orange, Bread2
7    Orange, Bread3
dtype: object
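A variant of the same approach, sketched here for the case where NumPy is not otherwise needed: pandas' own `Series.repeat` repeats the values together with their index labels, so `groupby(level=0).cumcount()` can number the copies exactly as above.

```python
import pandas as pd

df = pd.DataFrame({"Basket": ["Bread", "Orange, Bread"], "Times": [5, 3]})

# Series.repeat keeps the original index label on every copy (0,0,0,0,0,1,1,1)
srs = df["Basket"].repeat(df["Times"])

# cumcount numbers the rows within each original index label starting at 0,
# so add 1 to get the 1-based suffixes from the question
suffix = srs.groupby(level=0).cumcount().add(1).astype(str)

# Both Series share the same index, so + concatenates element by element
output = (srs + suffix).reset_index(drop=True)
print(output)
```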

You can try using apply on rows to generate the desired list, then explode the column:

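# Build one list of numbered labels per row, e.g. ['Bread_1', ..., 'Bread_5']
df['Newcolumn'] = df.apply(lambda row: [f"{row['Basket']}_{i+1}" for i in range(row['Times'])], axis=1)
# explode puts each list element on its own row; ignore_index=True renumbers the index 0..n-1
df = df.explode('Newcolumn', ignore_index=True)
print(df)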

          Basket  Times        Newcolumn
0          Bread      5          Bread_1
1          Bread      5          Bread_2
2          Bread      5          Bread_3
3          Bread      5          Bread_4
4          Bread      5          Bread_5
5  Orange, Bread      3  Orange, Bread_1
6  Orange, Bread      3  Orange, Bread_2
7  Orange, Bread      3  Orange, Bread_3
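If you want to keep the `Basket` and `Times` columns as in the `explode` output, one vectorized alternative to the row-wise `apply` is to repeat the index with `Index.repeat` and select those rows with `.loc`. This sketch uses the question's suffix format (no underscore) and reuses the column name `Newcolumn`; it is one option among several, not the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({"Basket": ["Bread", "Orange, Bread"], "Times": [5, 3]})

# Repeat each index label Times times and select those rows (duplicate labels allowed)
out = df.loc[df.index.repeat(df["Times"])]

# Number the copies of each original row 1..Times and append the number as a suffix
counter = out.groupby(level=0).cumcount().add(1).astype(str)
out = out.assign(Newcolumn=out["Basket"] + counter).reset_index(drop=True)
print(out)
```

Unlike `apply`, this avoids calling a Python function once per row, which tends to matter only for large frames.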

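Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.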

 
© 2020-2024 STACKOOM.COM