python pandas: apply a function with arguments to a series

I want to apply a function with arguments to a series in python pandas:

x = my_series.apply(my_function, more_arguments_1)
y = my_series.apply(my_function, more_arguments_2)
...

The documentation describes support for an apply method, but it doesn't accept any arguments. Is there a different method that accepts arguments? Alternatively, am I missing a simple workaround?

Update (October 2017): Note that since this question was originally asked, pandas apply() has been updated to handle positional and keyword arguments, and the documentation link above now reflects that and shows how to include either type of argument.

Newer versions of pandas do allow you to pass extra arguments (see the new documentation). So now you can do:

my_series.apply(your_function, args=(2,3,4), extra_kw=1)

The positional arguments are added after the element of the series.
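
A minimal sketch of how the extra arguments line up, assuming a hypothetical function `scale_and_shift` (not part of the original answer): each element of the series arrives as the first parameter, the entries of `args` follow positionally, and any other keyword is forwarded by name.

```python
import pandas as pd

def scale_and_shift(x, factor, offset, extra_kw=0):
    # x is one element of the series; factor and offset come from args,
    # extra_kw is forwarded as a keyword argument.
    return x * factor + offset + extra_kw

s = pd.Series([1, 2, 3])
result = s.apply(scale_and_shift, args=(10, 5), extra_kw=1)
print(result.tolist())
```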


For older versions of pandas:

The documentation explains this clearly. The apply method accepts a python function which should have a single parameter. If you want to pass more parameters you should use functools.partial as suggested by Joel Cornett in his comment.

An example:

>>> import functools
>>> import operator
>>> add_3 = functools.partial(operator.add,3)
>>> add_3(2)
5
>>> add_3(7)
10

You can also pass keyword arguments using partial.
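
As a rough illustration of the keyword form, here is a sketch that fixes a keyword argument with `functools.partial` and hands the resulting one-parameter callable to `apply`. The names `power` and `square` are made up for this example.

```python
import functools
import pandas as pd

def power(base, exponent=1):
    # base is the series element; exponent is fixed ahead of time by partial.
    return base ** exponent

# partial binds exponent=2, leaving a callable that takes a single argument.
square = functools.partial(power, exponent=2)

s = pd.Series([1, 2, 3])
squared = s.apply(square)
print(squared.tolist())
```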

Another way would be to create a lambda:

my_series.apply((lambda x: your_func(a,b,c,d,...,x)))

But I think using partial is better.

Steps:

  1. Create a dataframe
  2. Create a function
  3. Use the named arguments of the function in the apply statement.

Example

import pandas as pd

x = pd.DataFrame([1, 2, 3, 4])

def add(i1, i2):
    # i1 is the data from the frame; i2 is supplied through apply
    return i1 + i2

x.apply(add, i2=9)

The outcome of this example is that 9 is added to each number in the dataframe.

    0
0  10
1  11
2  12
3  13

Explanation:

The "add" function has two parameters: i1, i2. The first parameter receives the data from the data frame (each column in turn, since apply is called on a DataFrame) and the second is whatever we pass to the "apply" function. In this case, we are passing "9" to the apply function using the keyword argument "i2".
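
One detail worth checking: with `DataFrame.apply` and the default `axis=0`, the function is called once per column and receives that column as a Series, not a single cell. The sketch below records the type of the first argument in a hypothetical `seen` list to make this visible. The two-column frame is an assumption for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
seen = []  # hypothetical helper: records the type of each first argument

def add(i1, i2):
    seen.append(type(i1).__name__)
    return i1 + i2  # Series + scalar adds 9 to every value in the column

out = df.apply(add, i2=9)
print(set(seen))
print(out)
```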

Series.apply(func, convert_dtype=True, args=(), **kwds)

args : tuple

x = my_series.apply(my_function, args = (arg1,))

You can pass any number of arguments to the function that apply is calling, either as unnamed arguments passed as a tuple to the args parameter, or as other keyword arguments, which are captured internally as a dictionary by the kwds parameter.

For instance, let's build a function that returns True for values between 3 and 6, and False otherwise.

s = pd.Series(np.random.randint(0,10, 10))
s

0    5
1    3
2    1
3    1
4    6
5    0
6    3
7    4
8    9
9    6
dtype: int64

s.apply(lambda x: x >= 3 and x <= 6)

0     True
1     True
2    False
3    False
4     True
5    False
6     True
7     True
8    False
9     True
dtype: bool

This anonymous function isn't very flexible. Let's create a normal function with two arguments to control the min and max values we want in our Series.

def between(x, low, high):
    return x >= low and x <= high

We can replicate the output of the first function by passing unnamed arguments to args:

s.apply(between, args=(3,6))

Or we can use the named arguments:

s.apply(between, low=3, high=6)

Or even a combination of both:

s.apply(between, args=(3,), high=6)
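
To confirm that the three calling styles above are interchangeable, here is a self-contained sketch. It assumes the same ten values shown in the sample output earlier (hard-coded instead of random) and uses the hypothetical names `by_position`, `by_keyword` and `mixed`.

```python
import pandas as pd

def between(x, low, high):
    return low <= x <= high

# Same values as the sample series printed above, fixed for reproducibility.
s = pd.Series([5, 3, 1, 1, 6, 0, 3, 4, 9, 6])

by_position = s.apply(between, args=(3, 6))
by_keyword = s.apply(between, low=3, high=6)
mixed = s.apply(between, args=(3,), high=6)

# All three forms forward low=3 and high=6 to between().
print(by_position.equals(by_keyword) and by_keyword.equals(mixed))
```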
#sample dataframe

import pandas as pd

df1=pd.DataFrame({'a':[3,4,7],'b':[4,2,2]})

#my function

def add_some(p, q, r):
    return p + q + r

df2=df1[["a","b"]].apply(add_some, args=(3,2))

print(df2)

    a  b
0   8  9
1   9  7
2  12  7

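Most of this is covered in the other answers, but I want to repeat something you may have missed: when passing a single argument, you need to add a comma after it in the args tuple. See the following example: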

df['some_column'].apply(function_name, args=(arg1,))  # The comma is necessary here.
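
The reason for the comma is plain Python syntax: parentheses alone only group an expression, while the trailing comma is what creates a one-element tuple, and `args` expects a tuple. A short sketch, assuming a hypothetical function `add_offset`:

```python
import pandas as pd

def add_offset(x, offset):
    return x + offset

s = pd.Series([1, 2, 3])

print(type((10)).__name__)   # int: the parentheses only group
print(type((10,)).__name__)  # tuple: the comma makes the tuple

shifted = s.apply(add_offset, args=(10,))
print(shifted.tolist())
```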

You just need to add a comma after the argument; then you will be able to run the function on the whole list. An example is given below. The same procedure works with a set.

df = {"name" : [2,3,4,6],
      
      "age" : [4,10, 30, 20]
      }

print("Before")
df = pd.DataFrame(df)

print(df)

def fun(a, b):
    for c in b:
        a +=c
    return a

listt = set([3,4,5])

print("After")
new = df.apply(fun, args = (listt,))
print(new)

Result: every value in both columns is increased by 12 (3 + 4 + 5).
