
How to populate rows of a pandas DataFrame column from the previous row based on multiple conditions?

Disclaimer: This might be a duplicate, but I cannot find the exact solution. Please feel free to mark this question as a duplicate and link to the original question in the comments.

I am still learning pandas DataFrame operations, and this may have a very simple solution that I have not been able to figure out.

I have a pandas DataFrame with a single column. I want to change the value of each row to the value of the previous row if certain conditions are satisfied. I have written a loop that does this, but I was hoping for a more efficient solution.

Creation of initial data:

import numpy as np
import pandas as pd

data = np.random.randint(5,30,size=20)
df = pd.DataFrame(data, columns=['random_numbers'])

print(df)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19
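
Since np.random.randint draws different numbers on every run, the snippet above will not regenerate this exact sample. A minimal sketch that hard-codes the printed values instead, so that the outputs further down can be checked against them (the variable name values is only illustrative):

```python
import pandas as pd

# Values copied from the printed sample above, so the frame is reproducible
values = [6, 24, 29, 18, 22, 17, 12, 7, 6, 27,
          29, 13, 23, 6, 25, 24, 16, 15, 25, 19]
df = pd.DataFrame({'random_numbers': values})

print(df.shape)  # (20, 1)
```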

Now let's assume the two conditions are 1) value less than 10 and 2) value greater than 20. In either case, set the row's value to the previous row's (possibly already replaced) value. This has been implemented as a loop:

for index, row in df.iterrows():
    if index == 0:
        continue  # the first row has no previous row
    # either condition triggers the same replacement
    if row.random_numbers < 10 or row.random_numbers > 20:
        df.loc[index, 'random_numbers'] = df.loc[index - 1, 'random_numbers']

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19

Please suggest a more efficient way to implement this logic, as I am working with a large number of rows.

You can replace the values less than 10 and the values greater than 20 with NaN, then use pandas.DataFrame.ffill() to fill each NaN with the value from the previous row.

mask = (df['random_numbers'] < 10) | (df['random_numbers'] > 20)

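# Row 0 has no previous row (your loop skips it with `if index == 0:`)
mask[df.index[0]] = False

# Cast to float first so the integer column can hold NaN
df['random_numbers'] = df['random_numbers'].astype(float)
df.loc[mask, 'random_numbers'] = np.nan

# Assign the result back rather than calling ffill(inplace=True) on a column selection
df['random_numbers'] = df['random_numbers'].ffill()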
# Original

    random_numbers
0                7
1               28
2                8
3               14
4               12
5               20
6               21
7               11
8               16
9               27
10              19
11              23
12              18
13               5
14               6
15              11
16               6
17               8
18              17
19               8

# After replacement

    random_numbers
0              7.0
1              7.0
2              7.0
3             14.0
4             12.0
5             20.0
6             20.0
7             11.0
8             16.0
9             16.0
10            19.0
11            19.0
12            18.0
13            18.0
14            18.0
15            11.0
16            11.0
17            11.0
18            17.0
19            17.0
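
As a sanity check, the sketch below wraps the question's loop and the mask-and-ffill idea in two helper functions and compares them on the sample data shown above. The names fill_with_loop and fill_with_ffill are made up for this illustration and are not pandas APIs. It also assumes a default 0..n-1 integer index, as in the question, and casts back to the original dtype at the end, which is an extra step not shown in the output above.

```python
import pandas as pd

def fill_with_loop(df):
    """Row-by-row reference version, following the loop in the question."""
    out = df.copy()
    for index, row in df.iterrows():
        if index == 0:
            continue  # the first row has no previous row
        if row.random_numbers < 10 or row.random_numbers > 20:
            # read the previous row from `out`, so replaced values propagate
            out.loc[index, 'random_numbers'] = out.loc[index - 1, 'random_numbers']
    return out

def fill_with_ffill(df):
    """Vectorized version: blank out flagged rows, then forward-fill."""
    s = df['random_numbers']
    mask = (s < 10) | (s > 20)
    mask.iloc[0] = False  # never blank the first row
    # where(~mask) keeps unflagged values and puts NaN elsewhere
    return s.where(~mask).ffill().astype(s.dtype).to_frame()

sample = pd.DataFrame({'random_numbers': [7, 28, 8, 14, 12, 20, 21, 11, 16, 27,
                                          19, 23, 18, 5, 6, 11, 6, 8, 17, 8]})
looped = fill_with_loop(sample)
vectorized = fill_with_ffill(sample)

print(vectorized['random_numbers'].tolist())
print(looped['random_numbers'].tolist() == vectorized['random_numbers'].tolist())
```

Both versions should produce the same column here; on large frames the vectorized one avoids the per-row Python overhead of iterrows().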

We can also do it more simply by using .mask() together with .ffill(), with the condition switched off for row 0:

s = df['random_numbers']
cond = (s < 10) | (s > 20)
cond.iloc[0] = False  # row 0 has no previous row, so leave it untouched

df['random_numbers'] = s.mask(cond).ffill().astype(s.dtype)

.mask() tests the condition and replaces a value with NaN where the condition is true (NaN is the default replacement when the other= parameter is not supplied). Rows where the condition is not met keep their original values.

Note that masking turns the column into float, because an integer column cannot hold NaN. Casting back with .astype(s.dtype) after .ffill() keeps the resulting numbers as integer instead of leaving them unexpectedly converted to float.

Setting cond.iloc[0] = False ensures the data in row 0 is left untouched.
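
To make the masking and dtype behaviour concrete, here is a small sketch on the first five sample values. The intermediate names (masked, filled, restored) are only illustrative, and casting back with .astype() is one possible way, assumed here, to end up with integers again.

```python
import pandas as pd

s = pd.Series([6, 24, 29, 18, 22], name='random_numbers')

cond = (s < 10) | (s > 20)
cond.iloc[0] = False               # keep the first row as it is

masked = s.mask(cond)              # flagged rows become NaN, dtype is now float64
filled = masked.ffill()            # each NaN takes the last non-NaN value above it
restored = filled.astype(s.dtype)  # back to the original integer dtype

print(masked.tolist())    # [6.0, nan, nan, 18.0, nan]
print(restored.tolist())  # [6, 6, 6, 18, 18]
```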

# Original data:  (reusing your sample data)

    random_numbers
0                6
1               24
2               29
3               18
4               22
5               17
6               12
7                7
8                6
9               27
10              29
11              13
12              23
13               6
14              25
15              24
16              16
17              15
18              25
19              19


# After transformation:

    random_numbers
0                6
1                6
2                6
3               18
4               18
5               17
6               12
7               12
8               12
9               12
10              12
11              13
12              13
13              13
14              13
15              13
16              16
17              15
18              15
19              19
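
If there are more than two conditions, or the index is not 0..n-1 (where the loop's `index - 1` lookup would no longer work), the same idea can be written as a small helper. This is only a sketch: fill_from_previous is a hypothetical name, and it assumes each condition is a function that takes the Series and returns a boolean Series, and that at least one condition is passed.

```python
from functools import reduce
import operator

import pandas as pd

def fill_from_previous(series, conditions):
    """Replace values matching any condition with the last kept value above them."""
    # OR all the boolean masks together
    flagged = reduce(operator.or_, [cond(series) for cond in conditions])
    flagged.iloc[0] = False  # positional, so it works for any index type
    return series.where(~flagged).ffill().astype(series.dtype)

s = pd.Series([6, 24, 29, 18, 22], index=list('abcde'), name='random_numbers')
result = fill_from_previous(s, [lambda x: x < 10, lambda x: x > 20])

print(result.tolist())  # [6, 6, 6, 18, 18]
```

Because the first row is selected by position with .iloc, the helper does not depend on the index labels being consecutive integers.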

