Pandas: create a new column with repeated values based on non-repeated values in another column

Question

I have a dataframe whose columns follow this format:

import pandas as pd

df = pd.DataFrame(data={
  'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
  'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})

I want to create a new column that takes each value from 'value' and repeats it downward the number of times given in 'repeatVal', so that the output looks like 'result':

df = pd.DataFrame(data={
  'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
  'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
  'result': ['NaN', 456, 456, 'NaN', 121, 121, 121, 'NaN', 'NaN']
})

To be clear, I do not want to duplicate the rows. I only want to create a new column in which values are repeated n times, where n is specified in a different column. The 'repeatVal' column is formatted so that the runs never overlap: there will always be enough NaN values between consecutive repeat indicators.

I have read the docs on np.repeat and np.tile, but those don't appear to solve this issue.
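
To illustrate the point about np.repeat, here is a minimal sketch. It assumes the NaN entries of 'repeatVal' are treated as a repeat count of 0, which is my reading of the data and not something the column itself states. Under that assumption np.repeat yields only the repeated values, so the result no longer lines up with the nine rows of the dataframe.

```python
import numpy as np

values = np.array([123, 456, 789, 111, 121, 34523, 4352, 45343, 623])
# Assumption for this sketch: NaN in 'repeatVal' means "repeat zero times".
counts = np.array([0, 2, 0, 0, 3, 0, 0, 0, 0])

# np.repeat emits each element counts[i] times, dropping elements with count 0.
repeated = np.repeat(values, counts)

print(repeated)       # [456 456 121 121 121]
print(len(repeated))  # 5, while the dataframe has 9 rows
```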

Answer 1

One option, using groupby.cumcount to build masks:

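# Convert the 'NaN' placeholder strings to real missing values
df = df.replace('NaN', float('nan'))

# m1: rows that start a repeat run
m1 = df['repeatVal'].notna()
# m2: rows whose position within the run is below that run's repeat count
m2 = df.groupby(m1.cumsum()).cumcount().lt(df['repeatVal'].ffill())
# Carry each run's starting value downward, then blank the rows past the count
df['result'] = df['value'].where(m1).ffill().where(m2)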

Output:

   value  repeatVal  result
0    123        NaN     NaN
1    456        2.0   456.0
2    789        NaN   456.0
3    111        NaN     NaN
4    121        3.0   121.0
5  34523        NaN   121.0
6   4352        NaN   121.0
7  45343        NaN     NaN
8    623        NaN     NaN

Intermediates:

   value  repeatVal  result     m1  m1.cumsum()  cumcount  cumcount < repeatVal.ffill()  value/masked/ffill
0    123        NaN     NaN  False            0         0                         False                 NaN
1    456        2.0   456.0   True            1         0                          True               456.0
2    789        NaN   456.0  False            1         1                          True               456.0
3    111        NaN     NaN  False            1         2                         False               456.0
4    121        3.0   121.0   True            2         0                          True               121.0
5  34523        NaN   121.0  False            2         1                          True               121.0
6   4352        NaN   121.0  False            2         2                          True               121.0
7  45343        NaN     NaN  False            2         3                         False               121.0
8    623        NaN     NaN  False            2         4                         False               121.0
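
The intermediate columns in the table can be rebuilt step by step, which may make the two masks easier to follow. This is only a sketch: the names group, pos, carried and steps are mine, and pd.to_numeric is used in place of df.replace so that 'repeatVal' ends up as a float column regardless of pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})
# Assumption: the 'NaN' strings stand for missing values; coerce them to real NaN.
df['repeatVal'] = pd.to_numeric(df['repeatVal'], errors='coerce')

m1 = df['repeatVal'].notna()          # True on rows that start a run
group = m1.cumsum()                   # run id, incremented at every start row
pos = df.groupby(group).cumcount()    # 0-based position inside each run
m2 = pos.lt(df['repeatVal'].ffill())  # position < repeat count of the run
carried = df['value'].where(m1).ffill()  # start value carried downward
df['result'] = carried.where(m2)      # keep only the first n rows of each run

steps = pd.DataFrame({'m1': m1, 'group': group, 'pos': pos,
                      'm2': m2, 'carried': carried})
print(steps)
print(df)
```

Rows before the first repeat indicator compare against NaN in m2, which evaluates to False, so they stay empty in 'result'.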

Answer 2

Here is a way using index.repeat:

((v := df.loc[df.index.repeat(df['repeatVal'].fillna(0)),'value'])
.set_axis(v.groupby(v).cumcount() + v.index))
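
For reference, a self-contained sketch of how this idea could be written back into the dataframe. It is a variation, not the exact expression above: it groups by the original row label (groupby(level=0)) instead of by value, so that two runs starting from the same value would not share a counter, and it assumes the default 0..n-1 integer index. The names counts and v are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})
# Assumption: the 'NaN' strings stand for missing values; coerce them to real NaN.
df['repeatVal'] = pd.to_numeric(df['repeatVal'], errors='coerce')

# Missing counts become 0 repeats; cast to int so they are valid repeat counts.
counts = df['repeatVal'].fillna(0).astype(int)

# Select each start row counts[i] times; the index is now [1, 1, 4, 4, 4].
v = df.loc[df.index.repeat(counts), 'value']

# Shift every copy down by its position within the run: [1, 2, 4, 5, 6].
offsets = v.groupby(level=0).cumcount().to_numpy()
v = v.set_axis(v.index.to_numpy() + offsets)

# Assignment aligns on the index; rows without a label in v become NaN.
df['result'] = v
print(df)
```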

Answer 1: 2 votes, accepted, 2023-01-12 19:02:29
Answer 2: 0 votes, 2023-01-12 20:32:24
