How to populate rows of a pandas DataFrame column with the previous row's value based on multiple conditions?
Disclaimer: This might be a duplicate, but I cannot find the exact solution. Please feel free to mark this question as a duplicate and link to the duplicate question in the comments.
I am still learning pandas DataFrame operations, and this possibly has a very simple solution that I have not been able to figure out.
I have a pandas DataFrame with a single column. I want to change the value of each row to the value of the previous row if certain conditions are satisfied. I have written a loop that does this, but I was hoping for a more efficient solution.
Creation of initial data:
import numpy as np
import pandas as pd
data = np.random.randint(5,30,size=20)
df = pd.DataFrame(data, columns=['random_numbers'])
print(df)
random_numbers
0 6
1 24
2 29
3 18
4 22
5 17
6 12
7 7
8 6
9 27
10 29
11 13
12 23
13 6
14 25
15 24
16 16
17 15
18 25
19 19
Now let's assume the two conditions are 1) the value is less than 10 and 2) the value is greater than 20. In either case, set the row's value to the previous row's value. I have implemented this as a loop:
for index, row in df.iterrows():
    # The first row has no previous row, so skip it
    if index == 0:
        continue
    if row.random_numbers < 10:
        df.loc[index, 'random_numbers'] = df.loc[index - 1, 'random_numbers']
    if row.random_numbers > 20:
        df.loc[index, 'random_numbers'] = df.loc[index - 1, 'random_numbers']
random_numbers
0 6
1 6
2 6
3 18
4 18
5 17
6 12
7 12
8 12
9 12
10 12
11 13
12 13
13 13
14 13
15 13
16 16
17 15
18 15
19 19
Please suggest a more efficient way to implement this logic, as I am working with a large number of rows.
You can replace the values less than 10 and the values greater than 20 with NaN, then use pandas.DataFrame.ffill() to fill each NaN with the previous row's value.
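mask = (df['random_numbers'] < 10) | (df['random_numbers'] > 20)
# The loop skips the first row with `if index == 0:`, so leave it unchanged
mask[df.index[0]] = False
df.loc[mask, 'random_numbers'] = np.nan
# Assign the result back instead of calling ffill(inplace=True) on a column selection
df['random_numbers'] = df['random_numbers'].ffill()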
# Original
random_numbers
0 7
1 28
2 8
3 14
4 12
5 20
6 21
7 11
8 16
9 27
10 19
11 23
12 18
13 5
14 6
15 11
16 6
17 8
18 17
19 8
# After replacement
random_numbers
0 7.0
1 7.0
2 7.0
3 14.0
4 12.0
5 20.0
6 20.0
7 11.0
8 16.0
9 16.0
10 19.0
11 19.0
12 18.0
13 18.0
14 18.0
15 11.0
16 11.0
17 11.0
18 17.0
19 17.0
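To check that the mask-and-forward-fill approach really reproduces the loop, here is a minimal sketch that compares both on random data. The function names `fill_with_loop` and `fill_vectorized` and the `low`/`high` parameters are made up for this illustration, and the sketch assumes a non-empty integer Series. It casts to float before inserting NaN and casts back to the original dtype at the end, which works because the first row is always kept and so no NaN can remain.

```python
import numpy as np
import pandas as pd


def fill_with_loop(values, low=10, high=20):
    """Reference version: plain Python loop, same logic as the question."""
    result = list(values)
    for i in range(1, len(result)):
        # Out-of-range values take the (already updated) previous value
        if result[i] < low or result[i] > high:
            result[i] = result[i - 1]
    return result


def fill_vectorized(s, low=10, high=20):
    """Vectorized version: mark out-of-range rows as NaN, then forward fill."""
    mask = (s < low) | (s > high)
    mask.iloc[0] = False          # the first row has no predecessor, keep it
    as_float = s.astype(float)    # float dtype so the column can hold NaN
    as_float[mask] = np.nan
    # No NaN is left after ffill, so casting back to the original dtype is safe
    return as_float.ffill().astype(s.dtype)


sample = pd.Series(np.random.default_rng(0).integers(5, 30, size=20))
same = fill_vectorized(sample).tolist() == fill_with_loop(sample.tolist())
print("loop and vectorized results match:", same)
```

Forward fill propagates the last valid value across a whole run of NaN, which is why consecutive out-of-range rows all end up with the last in-range value, just like in the loop.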
We can also do it more simply by using .mask() together with .ffill() and slicing with [1:], as follows:
df['random_numbers'][1:] = df['random_numbers'][1:].mask((df['random_numbers'] < 10) | (df['random_numbers'] > 20))
df['random_numbers'] = df['random_numbers'].ffill(downcast='infer')
.mask() tests the condition and replaces values with NaN where the condition is true (NaN is the default replacement when the other= parameter is not supplied). It retains the original values for rows where the condition is not met.
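As a small illustration of that behaviour, the sketch below applies .mask() to a three-value Series, once with the default replacement and once with other=. The sample values and the replacement value -1 are arbitrary choices for the demo.

```python
import pandas as pd

s = pd.Series([6, 24, 18])
cond = (s < 10) | (s > 20)

# Default: values where cond is True become NaN (the result is a float Series)
masked_default = s.mask(cond)

# With other=: values where cond is True are replaced by the given value
masked_other = s.mask(cond, other=-1)

print(masked_default)
print(masked_other)
```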
Note that the resulting numbers are kept as integer instead of being unexpectedly converted to float, because we supply downcast='infer' in the call to .ffill().
We use [1:] on the first line to ensure that the data in row 0 is left untouched.
# Original data: (reusing your sample data)
random_numbers
0 6
1 24
2 29
3 18
4 22
5 17
6 12
7 7
8 6
9 27
10 29
11 13
12 23
13 6
14 25
15 24
16 16
17 15
18 25
19 19
# After transformation:
random_numbers
0 6
1 6
2 6
3 18
4 18
5 17
6 12
7 12
8 12
9 12
10 12
11 13
12 13
13 13
14 13
15 13
16 16
17 15
18 15
19 19
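Depending on your pandas version, assigning through a chained selection such as df['random_numbers'][1:] = ... may warn or have no effect, and as far as I know the downcast= parameter of .ffill() is deprecated in recent releases. A variant that avoids both might look like the sketch below. It is an assumption on my part rather than part of the answer above: it keeps row 0 by forcing the first entry of the condition to False instead of slicing, and restores the integer dtype with .astype().

```python
import pandas as pd

df = pd.DataFrame({'random_numbers': [6, 24, 29, 18, 22, 17, 12, 7, 6, 27,
                                      29, 13, 23, 6, 25, 24, 16, 15, 25, 19]})

col = df['random_numbers']
cond = (col < 10) | (col > 20)
cond.iloc[0] = False  # keep row 0 as it is (it has no previous row)

# mask -> NaN where cond is True, ffill -> take the previous valid value,
# astype -> back to the original integer dtype (no NaN is left at this point)
df['random_numbers'] = col.mask(cond).ffill().astype(col.dtype)

print(df['random_numbers'].tolist())
# [6, 6, 6, 18, 18, 17, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 16, 15, 15, 19]
```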