Repeating rows of a dataframe based on a column value
I have a data frame like this:
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2],
                    'b': [3, 4],
                    'c': [6, 5]})
df1
Out[150]:
a b c
0 1 3 6
1 2 4 5
Now I want to create a df that repeats each row c - b + 1 times. For the first row the difference between c and b is 6 - 3 = 3, so I want to repeat that row 3 + 1 = 4 times. Similarly, for the second row the difference is 5 - 4 = 1, so I want to repeat it 1 + 1 = 2 times. A new column d should count up from b to c across the copies of each row: for the first row it runs 3, 4, 5, 6. So I want to get this df:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
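The rule above can be checked directly before building the frame: each row needs c - b + 1 copies, and d takes every integer from b to c inclusive. This is a minimal sketch using only the sample frame from the question; `counts` and `d_values` are illustrative names, not part of any answer.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [6, 5]})

# Number of copies wanted per row: the inclusive span from b to c
counts = (df1['c'] - df1['b'] + 1).tolist()

# The values column d should take for each original row: b, b + 1, ..., c
d_values = [list(range(b, c + 1)) for b, c in zip(df1['b'], df1['c'])]

print(counts)    # [4, 2]
print(d_values)  # [[3, 4, 5, 6], [4, 5]]
```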
Do it with reindex + repeat, then use groupby + cumcount to assign the new value d. cumcount(ascending=False) numbers the copies of each original row from c - b down to 0, so subtracting it from c makes d run from b up to c.
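# Repeat each row c - b + 1 times; group on the repeated index (level=0) so duplicate 'a' values can't merge rows
out = (df1.reindex(df1.index.repeat(df1.eval('c - b').add(1)))
          .assign(d=lambda x: x.c - x.groupby(level=0).cumcount(ascending=False)))
out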
Out[572]:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
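An alternative, not the method in the answer above, is `DataFrame.explode` (available since pandas 0.25): give each row the list of d values it needs and explode that list into one row per element. This is a minimal sketch assuming the same df1; `out` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [6, 5]})

# Attach the full list b..c to every row, then emit one row per list element.
# explode() repeats the original index label, like the reindex/repeat output.
out = (
    df1.assign(d=[list(range(b, c + 1)) for b, c in zip(df1['b'], df1['c'])])
       .explode('d')
       .astype({'d': 'int64'})  # explode leaves an object column; restore ints
)
print(out)
```

This avoids the cumcount arithmetic at the cost of building a Python list per row, which may be slower than the reindex approach on large frames.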