Pandas: Create new column with repeating values based on non-repeating values in another column
I have a dataframe whose columns follow this format:
import pandas as pd
df = pd.DataFrame(data={
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})
I want to create a new column that takes each value from 'value' and repeats it downward the number of times given in 'repeatVal', so the output looks like 'result':
df = pd.DataFrame(data={
'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
'result': ['NaN', 456, 456, 'NaN', 121, 121, 121, 'NaN', 'NaN']
})
To be clear, I do not want to duplicate the rows; I only want to create a new column in which values are repeated n times, where n is specified in a different column. The format of the 'repeatVal' column is such that there will never be overlap: there will always be enough NaN values between the repeat indicators in 'repeatVal'.
I have read the docs on np.repeat and np.tile, but those don't appear to solve this issue.
One option, using groupby.cumcount to build masks:
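# Convert the 'NaN' strings into real missing values
df = df.replace('NaN', float('nan'))
# m1: rows that carry a repeat count
m1 = df['repeatVal'].notna()
# m2: rows whose position within their block is below the block's repeat count
m2 = df.groupby(m1.cumsum()).cumcount().lt(df['repeatVal'].ffill())
# Keep 'value' on the m1 rows, fill it downward, then blank everything outside m2
df['result'] = df['value'].where(m1).ffill().where(m2)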
Output:
value repeatVal result
0 123 NaN NaN
1 456 2.0 456.0
2 789 NaN 456.0
3 111 NaN NaN
4 121 3.0 121.0
5 34523 NaN 121.0
6 4352 NaN 121.0
7 45343 NaN NaN
8 623 NaN NaN
Intermediates:
value repeatVal result m1 m1.cumsum() cumcount cumcount < repeatVal.ffill() value/masked/ffill
0 123 NaN NaN False 0 0 False NaN
1 456 2.0 456.0 True 1 0 True 456.0
2 789 NaN 456.0 False 1 1 True 456.0
3 111 NaN NaN False 1 2 False 456.0
4 121 3.0 121.0 True 2 0 True 121.0
5 34523 NaN 121.0 False 2 1 True 121.0
6 4352 NaN 121.0 False 2 2 True 121.0
7 45343 NaN NaN False 2 3 False 121.0
8 623 NaN NaN False 2 4 False 121.0
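The intermediate columns in the table above can be reproduced step by step. This is only a sketch of the same mask approach with each stage given a name; the variable names `group`, `pos` and `filled` are introduced here for illustration, and the frame is assumed to be built with real NaN values (np.nan) rather than 'NaN' strings, so no replace step is needed.

```python
import numpy as np
import pandas as pd

# Assumption: missing repeat counts are real NaN, not the string 'NaN'
df = pd.DataFrame({
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': [np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan],
})

m1 = df['repeatVal'].notna()            # True on rows that carry a repeat count
group = m1.cumsum()                     # block id, incremented at each repeat count
pos = df.groupby(group).cumcount()      # 0-based position of each row inside its block
m2 = pos.lt(df['repeatVal'].ffill())    # True while the position is below the block's count
filled = df['value'].where(m1).ffill()  # the block's starting value carried downward

df['result'] = filled.where(m2)         # keep the carried value only where m2 is True
print(df.assign(m1=m1, group=group, pos=pos, m2=m2, filled=filled))
```

Rows before the first repeat count fall into block 0, where the forward-filled count is still NaN, so the comparison is False and they stay NaN in 'result'.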
Here is a way using index.repeat:
((v := df.loc[df.index.repeat(df['repeatVal'].fillna(0)),'value'])
.set_axis(v.groupby(v).cumcount() + v.index))
Output:
1 456
2 456
4 121
5 121
6 121
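The expression above returns a Series holding only the repeated rows. A possible way to turn it into the 'result' column is to assign it back and let pandas align on the index, as sketched below. Two things here are assumptions rather than part of the answer: the frame is built with real NaN values, and the running count is grouped by the original row label (`groupby(level=0)`) instead of by value, which would also cope with the same value starting two different blocks.

```python
import numpy as np
import pandas as pd

# Assumption: missing repeat counts are real NaN, not the string 'NaN'
df = pd.DataFrame({
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': [np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan],
})

# Number of copies per row: 0 where no repeat count is given
counts = df['repeatVal'].fillna(0).astype(int)

# Repeat each starting row 'counts' times; the index labels are repeated too
v = df.loc[df.index.repeat(counts), 'value']

# Shift each copy's label down by its position within its block
new_index = v.index + v.groupby(level=0).cumcount().to_numpy()
v = v.set_axis(new_index)

# Assignment aligns on the index; rows without a label in v become NaN
df['result'] = v
print(df)
```

If a block ran past the last row of the frame, the extra labels would have no matching row and would be dropped on assignment.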