
How to apply a function on a DataFrame column using multiple rows and columns as input?

I have a sequence of events, and based on some variables (previous command, previous/current code, and previous/current status) I need to decide which command is related to each event.

I have code that works as expected, but it is rather slow. I tried to use df.apply, but I don't think it can take more than the current element as input. (The loop starts at 1 because the first row is always a "begin" command.)

def mark_commands(df):
    for i in range(1, len(df)):
        prev_command = df.loc[i-1, 'Command']
        prev_code, cur_code = df.loc[i-1, 'Code'], df.loc[i, 'Code']
        prev_status, cur_status = df.loc[i-1, 'Status'], df.loc[i, 'Status']

        if (prev_command == "end" and 
            ((cur_code == 810 and cur_status in [10, 15]) or 
            (cur_code == 830 and cur_status == 15))):

            df.loc[i, 'Command'] = "ignore"

        elif ((cur_code == 800 and cur_status in [20, 25]) or 
            (cur_code in [810, 830] and cur_status in [10, 15])):

            df.loc[i, 'Command'] = "end"

        elif ((prev_code != 800) and 
            ((cur_code == 820 and cur_status == 25) or 
            (cur_code == 820 and cur_status == 20 and 
                prev_code in [810, 820] and prev_status == 20) or 
            (cur_code == 830 and cur_status == 25 and 
                prev_code == 820 and prev_status == 20))):

            df.loc[i, 'Command'] = "continue"

        else:

            df.loc[i, 'Command'] = "begin"

    return df

And here is a correctly labeled sample in CSV format (it can also serve as input, since the only difference is that everything in the Command column is empty after the first begin):

Code,Status,Command
810,20,begin
810,10,end
810,25,begin
810,15,end
810,15,ignore
810,20,begin
810,10,end
810,25,begin
810,15,end
810,15,ignore
810,20,begin
800,20,end
810,10,ignore
810,25,begin
820,25,continue
820,25,continue
820,25,continue
820,25,continue
800,25,end

Your code is mostly fine. You could have used df.iterrows() in the for loop, which is more robust if your index is not linear (for example, not a default 0, 1, 2, ... range), but it wouldn't have changed the speed.
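As a rough illustration of the df.iterrows() point, here is a minimal sketch that walks a frame with a non-default index while remembering the previous row. The tiny frame, its index labels, and the names prev_row and pairs are made up for this example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical two-row frame whose index is NOT 0, 1, 2, ...
df = pd.DataFrame(
    {"Code": [810, 800], "Status": [20, 20], "Command": ["begin", None]},
    index=[10, 20],
)

pairs = []
prev_row = None
for label, row in df.iterrows():
    if prev_row is not None:
        # Previous and current values are available without computing label - 1
        pairs.append((label, prev_row["Code"], row["Code"]))
    prev_row = row

print(pairs)
```

With this index, df.loc[i-1, 'Code'] inside range(1, len(df)) would look up labels that do not exist, whereas the loop above never assumes the labels are consecutive integers.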

After trying extensively to use df.apply, I realized there is a fatal flaw: your "Command" column is continuously updated from one row to the next, so each row's result depends on the result computed for the previous row. The following won't work, since the df that apply sees is effectively "static":

df['Command'] = df.apply(lambda row: mark_commands(row), axis=1)
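To make the dependency concrete, here is a small sketch (the three-row frame is invented for illustration) using DataFrame.shift. The previous Code and Status can be lined up next to the current row ahead of time, but the previous Command cannot, because it has not been computed yet for most rows.

```python
import pandas as pd

# Hypothetical input: only the first Command is known up front
df = pd.DataFrame({
    "Code": [810, 810, 810],
    "Status": [20, 10, 15],
    "Command": ["begin", None, None],
})

# shift(1) moves every column down one row, so row i holds row i-1's values
prev = df.shift(1)
prev_codes = prev["Code"].tolist()
prev_commands = prev["Command"].tolist()

print(prev_codes)     # previous codes are known for every row after the first
print(prev_commands)  # only row 1 has a real previous command; row 2's is still missing
```

Row 2 needs the label that would be assigned to row 1, and a row-wise apply has no way to feed that result forward, which is why the sequential loop is hard to avoid here.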

Finally, to save a little computation, you could insert a continue statement in each branch of your if/elif chain to go directly to the next iteration once a condition is met, although an if/elif chain already skips its remaining branches after a match, so expect little or no gain:

if (prev_command == "end" and ....) :
    df.loc[i, 'Command'] = "ignore"
    continue

That being said, your code works great.
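If speed is still a concern, one option that is not part of the answer above is to keep the same sequential logic but avoid the repeated df.loc scalar lookups, which are likely a large share of the cost. The sketch below copies the columns into plain Python lists, runs the question's rules on them, and writes the result back once. The name mark_commands_fast is hypothetical, and unlike the original function it returns a new frame instead of modifying df in place; any speedup is an expectation to measure, not a guarantee.

```python
import pandas as pd

def mark_commands_fast(df):
    # Plain lists make per-element access much cheaper than df.loc[i, col]
    codes = df["Code"].tolist()
    statuses = df["Status"].tolist()
    commands = df["Command"].tolist()

    for i in range(1, len(codes)):
        prev_command = commands[i - 1]
        prev_code, cur_code = codes[i - 1], codes[i]
        prev_status, cur_status = statuses[i - 1], statuses[i]

        if (prev_command == "end" and
                ((cur_code == 810 and cur_status in (10, 15)) or
                 (cur_code == 830 and cur_status == 15))):
            commands[i] = "ignore"
        elif ((cur_code == 800 and cur_status in (20, 25)) or
                (cur_code in (810, 830) and cur_status in (10, 15))):
            commands[i] = "end"
        elif (prev_code != 800 and
                ((cur_code == 820 and cur_status == 25) or
                 (cur_code == 820 and cur_status == 20 and
                  prev_code in (810, 820) and prev_status == 20) or
                 (cur_code == 830 and cur_status == 25 and
                  prev_code == 820 and prev_status == 20))):
            commands[i] = "continue"
        else:
            commands[i] = "begin"

    # Write the labels back in one assignment, on a copy of the input
    out = df.copy()
    out["Command"] = commands
    return out
```

Because the loop works by position rather than by index label, it also does not depend on the index being 0, 1, 2, ...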


Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address.
