
Pandas: Create new column with repeating values based on non-repeating values in another column

I have a dataframe with columns in the following format:

import pandas as pd

df = pd.DataFrame(data={
  'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
  'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})

I want to create a new column that takes the value from 'value' and repeats it downward for the number of rows given in 'repeatVal', so the output looks like 'result':

df = pd.DataFrame(data={
  'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
  'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
  'result': ['NaN', 456, 456, 'NaN', 121, 121, 121, 'NaN', 'NaN']
})

To be clear, I do not want to duplicate the rows; I only want to create a new column in which values are repeated n times, where n is specified in a different column. The 'repeatVal' column is formatted so that runs never overlap: there are always enough NaN values between the repeat indicators in 'repeatVal'.

I have read the docs on np.repeat and np.tile, but those don't appear to solve this issue.
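As a reference point (not part of the original question or answers), the intended behaviour can be pinned down with a plain Python loop. This is a minimal sketch: the helper name `repeat_down_loop` is hypothetical, and it relies on the non-overlap guarantee stated above (if runs did overlap, later runs would simply overwrite earlier ones). It is slow on large frames, so it is mainly useful for checking a vectorized solution.

```python
import numpy as np
import pandas as pd

def repeat_down_loop(df: pd.DataFrame) -> pd.Series:
    """Hypothetical reference helper: copy 'value' from each row that has a
    count n into that row and the n - 1 rows below it; all other rows are NaN."""
    # Turn the 'NaN' placeholder strings into real missing values
    counts = pd.to_numeric(df['repeatVal'], errors='coerce')
    result = np.full(len(df), np.nan)
    for pos, n in enumerate(counts):
        if pd.notna(n):
            # Slicing past the end is safe: numpy just truncates the run
            result[pos:pos + int(n)] = df['value'].iloc[pos]
    return pd.Series(result, index=df.index, name='result')

df = pd.DataFrame(data={
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})
df['result'] = repeat_down_loop(df)
print(df)
```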

One option, using groupby.cumcount to build masks:

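import pandas as pd

# Turn the 'NaN' placeholder strings into real missing values
df = df.replace('NaN', float('nan'))

m1 = df['repeatVal'].notna()  # rows that start a run
# position within each run (0, 1, 2, ...) compared with the run length n
m2 = df.groupby(m1.cumsum()).cumcount().lt(df['repeatVal'].ffill())
# carry each run's start value down, then keep only the first n rows
df['result'] = df['value'].where(m1).ffill().where(m2)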

Output:

   value  repeatVal  result
0    123        NaN     NaN
1    456        2.0   456.0
2    789        NaN   456.0
3    111        NaN     NaN
4    121        3.0   121.0
5  34523        NaN   121.0
6   4352        NaN   121.0
7  45343        NaN     NaN
8    623        NaN     NaN
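The same steps can be wrapped into a self-contained function. This is a sketch, not the answerer's exact code: `repeat_down` is a hypothetical name, and it swaps `df.replace('NaN', ...)` for `pd.to_numeric(..., errors='coerce')` so the input frame is not modified; the masking logic (`m1`, `m2`, `where`/`ffill`/`where`) is the same.

```python
import pandas as pd

def repeat_down(df: pd.DataFrame) -> pd.Series:
    """groupby.cumcount mask approach, wrapped as a reusable function."""
    # pd.to_numeric turns the 'NaN' placeholder strings into real NaN
    counts = pd.to_numeric(df['repeatVal'], errors='coerce')
    m1 = counts.notna()                       # True where a run starts
    pos = df.groupby(m1.cumsum()).cumcount()  # 0, 1, 2, ... since the last start
    m2 = pos.lt(counts.ffill())               # inside the first n rows of the run?
    # Carry each run's starting value down, then blank rows past the run length
    return df['value'].where(m1).ffill().where(m2).rename('result')

df = pd.DataFrame(data={
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})
df['result'] = repeat_down(df)
print(df)
```

A run that would extend past the last row is simply cut off, because there are no further rows for the masks to cover.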

Intermediates:

   value  repeatVal  result     m1  m1.cumsum()  cumcount  cumcount < repeatVal.ffill()  value/masked/ffill
0    123        NaN     NaN  False            0         0                         False                 NaN
1    456        2.0   456.0   True            1         0                          True               456.0
2    789        NaN   456.0  False            1         1                          True               456.0
3    111        NaN     NaN  False            1         2                         False               456.0
4    121        3.0   121.0   True            2         0                          True               121.0
5  34523        NaN   121.0  False            2         1                          True               121.0
6   4352        NaN   121.0  False            2         2                          True               121.0
7  45343        NaN     NaN  False            2         3                         False               121.0
8    623        NaN     NaN  False            2         4                         False               121.0

Here is a way using index.repeat (it assumes 'repeatVal' is already numeric, e.g. after the replace above):

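# Python 3.8+ (walrus); group by index label so runs sharing a value stay separate
((v := df.loc[df.index.repeat(df['repeatVal'].fillna(0).astype(int)), 'value'])
 .set_axis(v.groupby(level=0).cumcount() + v.index))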

Output:

1    456
2    456
4    121
5    121
6    121
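The expression above returns only the covered rows. A minimal sketch for turning it into a full 'result' column, assuming a default 0..n-1 RangeIndex (the label arithmetic depends on it); the helper name `repeat_down_index` is hypothetical. Grouping by the repeated index label rather than by the values themselves keeps two runs that happen to start on the same value independent, which the second frame checks.

```python
import pandas as pd

def repeat_down_index(df: pd.DataFrame) -> pd.Series:
    """index.repeat approach; assumes a default 0..n-1 RangeIndex."""
    # 'NaN' placeholders -> 0 repetitions; real counts become ints
    counts = pd.to_numeric(df['repeatVal'], errors='coerce').fillna(0).astype(int)
    # One copy of each starting row per repetition, e.g. labels 1, 1, 4, 4, 4
    v = df.loc[df.index.repeat(counts), 'value']
    # Offset each copy by its position within its own run, e.g. 1, 2, 4, 5, 6
    offsets = v.groupby(level=0).cumcount()
    return v.set_axis(offsets + v.index).rename('result')

df = pd.DataFrame(data={
    'value': [123, 456, 789, 111, 121, 34523, 4352, 45343, 623],
    'repeatVal': ['NaN', 2, 'NaN', 'NaN', 3, 'NaN', 'NaN', 'NaN', 'NaN'],
})
# Column assignment aligns on the index: rows outside every run become NaN,
# and labels past the end of the frame are silently dropped
df['result'] = repeat_down_index(df)

# Two runs starting on the same value (7) stay independent
dup = pd.DataFrame({'value': [7, 0, 7, 0], 'repeatVal': [2, 'NaN', 2, 'NaN']})
dup['result'] = repeat_down_index(dup)
print(df)
print(dup)
```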
