
Pandas - Apply a function to a dataframe with several arguments from different columns

I would like to use apply() on a dataframe to generate a date range with the pandas date_range() function.

The following code works and does what I expect.

import pandas as pd

def my_date_range(start, end, freq):
    return pd.date_range(start = start, end = end, freq = freq)

df = pd.DataFrame({'Start':[pd.Timestamp('1970-01-02 00:00:00')], 'End':[pd.Timestamp('1970-01-02 00:30:00')], 'Freq':[pd.Timedelta(5,'m')]})

df1 = df.apply(lambda x: my_date_range(x.Start, x.End, x.Freq), axis=1)

The result:

In [28]: df
Out[28]: 
       Start                 End     Freq
0 1970-01-02 1970-01-02 00:30:00 00:05:00

In [29]: df1[0]
Out[29]: 
DatetimeIndex(['1970-01-02 00:00:00', '1970-01-02 00:05:00',
               '1970-01-02 00:10:00', '1970-01-02 00:15:00',
               '1970-01-02 00:20:00', '1970-01-02 00:25:00',
               '1970-01-02 00:30:00'],
              dtype='datetime64[ns]', freq='5T')

Now for my questions. I read that apply() can be used without a lambda, which I understood to mean this:

df2 = df[['Start', 'End', 'Freq']].apply(my_date_range, axis=1)

But the code above raises the following error:

TypeError: ("my_date_range() missing 2 required positional arguments: 'end' and 'freq'", 'occurred at index 0')

What am I doing wrong?

Is there any benefit to avoiding the lambda (better performance, for example)?

Finally, is there a way to use pd.date_range directly?

If I try the code below, I get the following error:

df1 = df.apply(lambda x: pd.date_range(x.Start, x.End, x.Freq), axis=1)

"periods must be a number, got {periods}".format(periods=periods)

TypeError: ('periods must be a number, got 0 days 00:05:00', 'occurred at index 0')

Thanks in advance for your help! Have a good day!

1. As the error message shows, when you pass a function by name to pandas.DataFrame.apply with axis=1, the function is called with a single argument: a pandas.Series holding one row. That is why 'end' and 'freq' are reported as missing. So the function should take a Series and read the columns from it, like this:

def my_date_range(x):
    return pd.date_range(start = x.Start, end = x.End, freq = x.Freq)
df2 = df.apply(my_date_range, axis=1)
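To make the point above concrete, here is a minimal sketch that records what apply() actually hands to the function when axis=1. The helper inspect_row is a made-up name for illustration, and pd.Timedelta(minutes=5) is used in place of the question's pd.Timedelta(5,'m'). It also shows one possible way to keep the original three-argument my_date_range: unpack the row with *, which relies on the selected columns being in the same order as the function's parameters.

```python
import pandas as pd

def my_date_range(start, end, freq):
    return pd.date_range(start=start, end=end, freq=freq)

df = pd.DataFrame({
    "Start": [pd.Timestamp("1970-01-02 00:00:00")],
    "End": [pd.Timestamp("1970-01-02 00:30:00")],
    "Freq": [pd.Timedelta(minutes=5)],
})

# Hypothetical helper: remember the type of the object apply() passes in.
seen = []
def inspect_row(row):
    seen.append(type(row).__name__)
    return len(row)

df.apply(inspect_row, axis=1)
print(seen)  # each call receives one row as a single Series

# Unpack the row into positional arguments; the column order
# (Start, End, Freq) must match the order (start, end, freq).
df2 = df[["Start", "End", "Freq"]].apply(lambda row: my_date_range(*row), axis=1)
print(df2.iloc[0])
```

The unpacking version still needs a small lambda, so rewriting the function to accept the row, as in the answer, is probably the cleaner choice if you want to pass a bare function name.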

2. Personally, I think lambda makes things a lot more convenient. In your case, though, defining a function and then wrapping it in another lambda is not convenient at all, since the point of lambda is not having to use def. You can use a lambda on its own, as you tried in the last part of the question.

3. The error occurs because the signature of pd.date_range begins pandas.date_range(start=None, end=None, periods=None, ...). If you pass the frequency as the third positional argument, as you did, it is taken as periods=. Pass it as a keyword argument instead (as you did in my_date_range above):

df1 = df.apply(lambda x: pd.date_range(start = x.Start, end = x.End, freq = x.Freq), axis=1)
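If you want to check the parameter order yourself, a short sketch along these lines should work. It uses the standard-library inspect module to list the parameter names, then builds the same range twice: once with a number in the third positional slot (which is periods) and once with freq given by keyword. The variable names are only for illustration.

```python
import inspect
import pandas as pd

# The third positional parameter is periods, not freq.
params = list(inspect.signature(pd.date_range).parameters)
print(params[:3])

start = pd.Timestamp("1970-01-02 00:00:00")
end = pd.Timestamp("1970-01-02 00:30:00")

# A number in the third position is read as periods (7 evenly spaced points).
by_periods = pd.date_range(start, end, 7)

# A Timedelta has to be passed by keyword so it is read as freq.
by_freq = pd.date_range(start, end, freq=pd.Timedelta(minutes=5))

print(list(by_periods) == list(by_freq))
```

For this particular start and end, 7 evenly spaced points and a 5-minute frequency describe the same timestamps, which is why the two results can be compared directly.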

What about something like this:这样的事情怎么样:

import pandas as pd
start = pd.Timestamp('1970-01-02 00:00:00')
end = pd.Timestamp('1970-01-02 00:30:00')
pd.date_range(start, end, freq='5Min')

