
How to apply a UDF to a dataframe?

I am trying to create a function that cleans up any dataframe I pass to it. But I noticed that the df it returns is cleaned up, while the original df is not changed in place.

How can I run a UDF on a dataframe and keep the updated dataframe saved in place?

P.S. I know I can combine these rules into one line, but the function I am creating is a lot more complex, so I don't want to combine them for this example.

import pandas as pd

df = pd.DataFrame({'Key': ['3', '9', '9', '9', '9', '34', '34', '34'],
                   'LastFour': ['2290', '0087', 'M433', 'M433', '25', '25', '25', '25'],
                   'NUM': [20120528, 20120507, 20120615, 20120629, 20120621, 20120305, 20120506, 20120506]})

def cleaner(x):
    x = x[x['Key'] == '9']
    x = x[x['LastFour'] == 'M433']
    x = x[x['NUM'] == 20120615]
    return x

cleaner(df)

Result from the UDF:

    Key LastFour    NUM
2   9   M433        20120615

But if I inspect df after calling the function, I still get the original dataset:

    Key LastFour   NUM
0   3   2290       20120528
1   9   0087       20120507
2   9   M433       20120615
3   9   M433       20120629
4   9   25         20120621
5   34  25         20120305
6   34  25         20120506
7   34  25         20120506

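Inside cleaner, each filter creates a new dataframe and only rebinds the local name x, so the df you passed in is never modified. You need to assign the result of cleaner(df) back to df, like so: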

df = cleaner(df)
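
To see why the reassignment is needed, here is a minimal sketch built on the question's data. It checks that df is untouched after a bare cleaner(df) call and only changes once the result is assigned back. The mutating_cleaner function is a hypothetical variant, not part of the original answer, that assumes you really do need the caller's object modified and uses drop with inplace=True to get there.

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['3', '9', '9', '9', '9', '34', '34', '34'],
    'LastFour': ['2290', '0087', 'M433', 'M433', '25', '25', '25', '25'],
    'NUM': [20120528, 20120507, 20120615, 20120629,
            20120621, 20120305, 20120506, 20120506],
})

def cleaner(x):
    # Each filter returns a new DataFrame; `x` is rebound locally only.
    x = x[x['Key'] == '9']
    x = x[x['LastFour'] == 'M433']
    x = x[x['NUM'] == 20120615]
    return x

result = cleaner(df)
rows_after_call = len(df)      # the caller's df still has every row
rows_in_result = len(result)   # only the returned object is filtered

def mutating_cleaner(x):
    # Hypothetical variant: remove the rows that fail the rules from the
    # very object the caller passed in, instead of returning a new one.
    keep = (x['Key'] == '9') & (x['LastFour'] == 'M433') & (x['NUM'] == 20120615)
    x.drop(index=x[~keep].index, inplace=True)

df_inplace = df.copy()
mutating_cleaner(df_inplace)   # df_inplace itself is now filtered

df = cleaner(df)               # the fix from the answer: rebind df
```

Returning a new dataframe and reassigning is usually the easier pattern to reason about; the in-place variant changes the caller's data as a side effect, which can be surprising in larger code.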

An alternative method is to use pd.DataFrame.pipe to pass your dataframe through a function:

df = df.pipe(cleaner)
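
Since the real function is said to be more complex, pipe may be most useful when the cleanup is split into several steps. The sketch below assumes a hypothetical helper, keep_rows, that is not in the original post; pipe passes the dataframe as the first argument and forwards any extra positional or keyword arguments to the function.

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['3', '9', '9', '9', '9', '34', '34', '34'],
    'LastFour': ['2290', '0087', 'M433', 'M433', '25', '25', '25', '25'],
    'NUM': [20120528, 20120507, 20120615, 20120629,
            20120621, 20120305, 20120506, 20120506],
})

def keep_rows(frame, column, value):
    # Hypothetical helper: keep only the rows where `column` equals `value`.
    return frame[frame[column] == value]

# Each pipe call hands the previous result to the next step; the final
# result is assigned back to df, just as with df = cleaner(df).
df = (
    df.pipe(keep_rows, 'Key', '9')
      .pipe(keep_rows, 'LastFour', 'M433')
      .pipe(keep_rows, column='NUM', value=20120615)
)
```

Like the plain function call, pipe returns a new dataframe, so the assignment back to df is still required.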

