“.apply()” in Pandas to k-th argument

I want to apply my own multi-argument function to a Pandas data frame (or a series within it), using the data frame entries as the k-th argument of my N-argument function.

It only seems to work if I pass the data frame as the first argument. I want to be able to pass the data frame through one of the other arguments.


import pandas as pd

# A simple 3-argument function:
def my_func(a, b, c):
    return (a/b)**c

# Data-Frame:
d1 = {
    'column1': [1.1, 1.2, 1.3, ],
    'column2': [2.1, 2.2, 2.3, ]
}
df = pd.DataFrame(d1, index = [1, 2, 3])


# I can apply it fine if I pass the columns as the first argument i.e. "a":
df.apply(my_func, b=7, c=9)

# However, I am unable to pass the columns through arguments "b" or "c":
df.apply(my_func, a = 7, c = 9)

This raises a TypeError: ("my_func() got multiple values for argument 'a'", 'occurred at index column1')

I want to be able to pass the columns of the data frame (or series) through any of the arguments of my own multi-argument function. Is there a simple, intuitive (non-hack-like) way of doing this?

If I understand you correctly, all you need is:

my_func(df['column1'], b=7, c=5)

A Pandas series can be multiplied by, divided by, or raised to the power of a constant, and the result is a series of the same size.
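Because my_func only uses arithmetic operators, the same direct call should also work with the series in the second or third position. A minimal sketch, reusing my_func and the data frame from the question (the names as_b and as_c are only for illustration):

```python
import pandas as pd


def my_func(a, b, c):
    return (a / b) ** c


df = pd.DataFrame(
    {"column1": [1.1, 1.2, 1.3], "column2": [2.1, 2.2, 2.3]},
    index=[1, 2, 3],
)

# The series fills argument "b"; "a" and "c" are plain scalars.
as_b = my_func(7, df["column1"], 9)

# The series fills argument "c" instead.
as_c = my_func(7, 2, df["column1"])

# Both results are Series that keep the original index.
print(as_b)
print(as_c)
```

This relies on the function body being built from operations that Pandas overloads; a function that branches on its inputs with plain if statements would not work this way.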

In a more sophisticated scenario, where scalar operations aren't enough, it could also be written as something like:

df.apply(lambda row: my_func(row['column1'], b=7, c=5), axis=1)

Here axis=1 tells Pandas to apply the function to each row instead of each column (the default).
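The error in the question comes from the way df.apply calls the function: each column is passed as the first positional argument, so it always lands in "a", and a=7 then supplies "a" a second time. A small wrapper can route the column into another slot. The sketch below reuses my_func and the data frame from the question; the names via_lambda and via_partial are only for illustration:

```python
from functools import partial

import pandas as pd


def my_func(a, b, c):
    return (a / b) ** c


df = pd.DataFrame(
    {"column1": [1.1, 1.2, 1.3], "column2": [2.1, 2.2, 2.3]},
    index=[1, 2, 3],
)

# Lambda wrapper: each column arrives as `col` and is forwarded as "b".
via_lambda = df.apply(lambda col: my_func(7, col, 9))

# partial binds "a" positionally, so the column that apply passes in
# becomes the next positional argument "b"; c=9 is forwarded by apply.
via_partial = df.apply(partial(my_func, 7), c=9)

print(via_lambda)
print(via_partial)
```

Note that partial(my_func, a=7) would not help, since binding "a" by keyword still leaves the column competing for the first positional slot. The lambda form is the more general of the two because it can place the column at any position.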

To apply this function elementwise, you can use df.applymap instead:

df.applymap(lambda value: my_func(7, value, c=5))
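In pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map. A version-tolerant sketch of the elementwise call, reusing my_func and the data frame from the question (apply_elementwise and result are illustrative names):

```python
import pandas as pd


def my_func(a, b, c):
    return (a / b) ** c


df = pd.DataFrame(
    {"column1": [1.1, 1.2, 1.3], "column2": [2.1, 2.2, 2.3]},
    index=[1, 2, 3],
)

# Prefer DataFrame.map where it exists; fall back to applymap on older pandas.
apply_elementwise = df.map if hasattr(df, "map") else df.applymap

# Each cell value is passed to my_func as argument "b".
result = apply_elementwise(lambda value: my_func(7, value, c=5))

print(result)
```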

However, it will be much faster if my_func or its input can be adjusted to work on vectors instead:

import numpy as np
my_func(np.ones(df.shape) * 7, df, c=5)
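For this particular function the array of sevens is probably not needed, since Pandas broadcasts a scalar against the whole data frame. The sketch below compares both forms, reusing my_func and the data frame from the question (explicit and broadcast are illustrative names):

```python
import numpy as np
import pandas as pd


def my_func(a, b, c):
    return (a / b) ** c


df = pd.DataFrame(
    {"column1": [1.1, 1.2, 1.3], "column2": [2.1, 2.2, 2.3]},
    index=[1, 2, 3],
)

# Explicit array of sevens with the same shape as the data frame.
explicit = my_func(np.ones(df.shape) * 7, df, c=5)

# A scalar 7 is broadcast across every cell.
broadcast = my_func(7, df, c=5)

# Compare the two results numerically.
print(np.allclose(np.asarray(explicit), np.asarray(broadcast)))
```

The explicit array form remains useful when "a" really does differ per cell.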
