Create dataframe with repeated rows based on column value
I am trying to expand a dataset that has two columns in Python.
Basket | Times
______________|_______
Bread | 5
Orange, Bread | 3
I would like each row to be repeated as many times as the number in its Times column, with a running counter appended to the Basket value. So for the example above:
Newcolumn
_______
Bread1
Bread2
Bread3
Bread4
Bread5
Orange, Bread1
Orange, Bread2
Orange, Bread3
Use np.repeat to repeat each value the required number of times. Then use groupby and cumcount to add the required suffixes:
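import numpy as np
# Repeat each Basket value Times times; the original index label is repeated too
srs = np.repeat(df["Basket"], df["Times"])
# Number the copies within each original row (1, 2, ...) and append as a suffix
output = (srs + srs.groupby(level=0).cumcount().add(1).astype(str)).reset_index(drop=True)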
>>> output
0 Bread1
1 Bread2
2 Bread3
3 Bread4
4 Bread5
5 Orange, Bread1
6 Orange, Bread2
7 Orange, Bread3
dtype: object
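Since df is never defined in the question, here is a self-contained sketch of the same idea for anyone who wants to run it. The DataFrame is rebuilt from the example table, and the function name expand_with_counter and its parameters are made up for illustration. It uses Series.repeat, which is what np.repeat dispatches to for a Series, and it assumes the index labels are unique (as with the default RangeIndex), because groupby(level=0) numbers the copies per index label.

```python
import pandas as pd

# Example data rebuilt from the table in the question
df = pd.DataFrame({"Basket": ["Bread", "Orange, Bread"], "Times": [5, 3]})


def expand_with_counter(frame, value_col="Basket", count_col="Times"):
    """Repeat each value count_col times and append a 1-based counter.

    Hypothetical helper; assumes frame has a unique index.
    """
    # Each value is repeated, and its original index label is repeated with it
    repeated = frame[value_col].repeat(frame[count_col].to_numpy())
    # cumcount numbers the copies of each original row starting at 0, so add 1
    counter = repeated.groupby(level=0).cumcount().add(1).astype(str)
    return (repeated + counter).reset_index(drop=True)


output = expand_with_counter(df)
print(output.tolist())
```

If the index might contain duplicate labels, calling reset_index(drop=True) on the frame first should avoid copies from different rows sharing one counter.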
You can try apply on the rows to generate the desired list, then explode the column:
df['Newcolumn'] = df.apply(lambda row: [f"{row['Basket']}_{i+1}" for i in range(row['Times'])], axis=1)
df = df.explode('Newcolumn', ignore_index=True)
print(df)
Basket Times Newcolumn
0 Bread 5 Bread_1
1 Bread 5 Bread_2
2 Bread 5 Bread_3
3 Bread 5 Bread_4
4 Bread 5 Bread_5
5 Orange, Bread 3 Orange, Bread_1
6 Orange, Bread 3 Orange, Bread_2
7 Orange, Bread 3 Orange, Bread_3
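Note that the output above uses an underscore (Bread_1), while the question asked for Bread1. A possible variant, sketched here with the example data rebuilt from the question, drops the underscore and builds the lists with zip instead of apply(axis=1). The variable names labels and result are only illustrative, and ignore_index in explode needs pandas 1.1 or later.

```python
import pandas as pd

# Example data rebuilt from the table in the question
df = pd.DataFrame({"Basket": ["Bread", "Orange, Bread"], "Times": [5, 3]})

# One list of labels per row: "Bread1" ... "Bread5", "Orange, Bread1" ... "Orange, Bread3"
labels = [
    [f"{basket}{i}" for i in range(1, times + 1)]
    for basket, times in zip(df["Basket"], df["Times"])
]

# Attach the lists as an object column, then give each list element its own row
result = df.assign(Newcolumn=pd.Series(labels, index=df.index)).explode(
    "Newcolumn", ignore_index=True
)
print(result)
```

Unlike the np.repeat approach, this keeps the Basket and Times columns next to the new one; select result["Newcolumn"] if only the labels are needed.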