
Python: use a function in pandas lambda expression

I have the following code, which tries to extract the hour from the 'Dates' column of a data frame:

print(df['Dates'].head(3))
df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)

def find_hour(self, input):
    return input[11:13].astype(float)

where the output of print(df['Dates'].head(3)) looks like:

0    2015-05-13 23:53:00
1    2015-05-13 23:53:00
2    2015-05-13 23:33:00

However, I got the following error:

    df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)
NameError: ("global name 'find_hour' is not defined", u'occurred at index 0')

Does anyone know what I missed? Thanks!


Note that if I put the slicing directly in the lambda line, like below, everything works fine:

df['hour'] = df.apply(lambda x: x['Dates'][11:13], axis=1).astype(float)

You are trying to use find_hour before it has been defined. df.apply runs the lambda immediately, so find_hour must already exist when that line executes. You just need to switch things around:

def find_hour(self, input):
    return input[11:13].astype(float)

print(df['Dates'].head(3))
df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)
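The ordering issue can be reproduced without pandas. The following is a minimal sketch: the name find_hour_demo and the sample string are illustrative assumptions, not part of the original code. Defining a lambda that mentions an undefined name is fine; the NameError only appears when the lambda is actually called before the def statement has run.

```python
# Defining the lambda does not look up find_hour_demo yet.
get_hour = lambda s: find_hour_demo(s)

try:
    # Calling it now fails: the def below has not executed yet.
    get_hour("2015-05-13 23:53:00")
    early_result = "no error"
except NameError:
    early_result = "NameError"

# Hypothetical helper for illustration: hour of a 'YYYY-MM-DD HH:MM:SS' string.
def find_hour_demo(date_string):
    return float(date_string[11:13])

# After the def has run, the very same lambda works.
late_result = get_hour("2015-05-13 23:53:00")
```

This is why moving the def above the df.apply call is enough to get rid of the NameError.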

Edit: Padraic has pointed out a very important point: find_hour() is defined as taking two arguments, self and input, but you are giving it only one. You should define find_hour() as def find_hour(input): instead, although naming the argument input shadows the built-in input() function, so you might consider renaming it to something a little more descriptive.
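Putting both points together, a corrected version might look like the sketch below. It assumes the 'Dates' column holds strings in the format shown in the question; the parameter name date_string is my own choice. Note that each value passed to the function is a plain Python str, which has no .astype method, so the conversion is done with float() instead.

```python
import pandas as pd

def find_hour(date_string):
    """Return the hour of a 'YYYY-MM-DD HH:MM:SS' string as a float."""
    # Characters 11 and 12 hold the two-digit hour.
    return float(date_string[11:13])

# Sample data copied from the question's printed output.
df = pd.DataFrame({
    "Dates": [
        "2015-05-13 23:53:00",
        "2015-05-13 23:53:00",
        "2015-05-13 23:33:00",
    ]
})

# Applying to the single column avoids the row-wise lambda entirely.
df["hour"] = df["Dates"].apply(find_hour)
```

The original row-wise form, df.apply(lambda x: find_hour(x['Dates']), axis=1), should give the same result with this definition.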

What is wrong with the good old .dt.hour?

In [202]: df
Out[202]:
                 Date
0 2015-05-13 23:53:00
1 2015-05-13 23:53:00
2 2015-05-13 23:33:00

In [217]: df['hour'] = df.Date.dt.hour

In [218]: df
Out[218]:
                 Date  hour
0 2015-05-13 23:53:00    23
1 2015-05-13 23:53:00    23
2 2015-05-13 23:33:00    23

If your Date column is of string type, you need to convert it to datetime first:

df.Date = pd.to_datetime(df.Date)

or just slice the strings directly:

df['hour'] = df.Date.str[11:13].astype(int)
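As a quick check, here is a small self-contained sketch comparing the two approaches on the sample data above. The variable names raw, via_datetime and via_slice are illustrative assumptions, and the column is assumed to start out as strings.

```python
import pandas as pd

# String-typed Date column, matching the sample rows shown above.
raw = pd.DataFrame({
    "Date": [
        "2015-05-13 23:53:00",
        "2015-05-13 23:53:00",
        "2015-05-13 23:33:00",
    ]
})

# Approach 1: parse to datetime, then use the .dt accessor.
via_datetime = pd.to_datetime(raw["Date"]).dt.hour

# Approach 2: slice the strings and convert the whole Series at once.
via_slice = raw["Date"].str[11:13].astype(int)
```

Both yield the same hours here. The datetime route is the more robust one if the string format ever varies, since the slice depends on fixed character positions.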
