python pandas: passing a DataFrame to df.apply

Question

Long-time user of this site, but first time asking a question! Thanks to all of the benevolent users who have been answering questions for ages :)

I have been using df.apply lately and ideally want to pass a dataframe into the args parameter, something like this: df.apply(testFunc, args=(dfOther), axis = 1)

My ultimate goal is to iterate over the dataframe I am passing in the args parameter, check some logic against each row of the original dataframe, say df, and return some value from dfOther. So say I have a function like this:

def testFunc(row, dfOther):
    for index, rowOther in dfOther.iterrows():
        if row['A'] == rowOther[0] and row['B'] == rowOther[1]:
            return dfOther.at[index, 'C']

df['OTHER'] = df.apply(testFunc, args=(dfOther), axis = 1)

My current understanding is that args expects a Series object, so if I actually run this I get the following error:

ValueError: The truth value of a DataFrame is ambiguous. 
Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, before I wrote testFunc, which takes only a single dataframe, I had actually written priorTestFunc, which looks like this... And it works!

def priorTestFunc(row, dfOne, dfTwo):
    for index, rowOne in dfOne.iterrows():
        if row['A'] == rowOne[0] and row['B'] == rowOne[1]:
            return dfTwo.at[index, 'C']

df['OTHER'] = df.apply(priorTestFunc, args=(dfOne, dfTwo), axis = 1)

So, to my dismay, I have gotten into the habit of writing testFunc like so, and it has been working as intended:

def testFunc(row, dfOther, _):
    for index, rowOther in dfOther.iterrows():
        if row['A'] == rowOther[0] and row['B'] == rowOther[1]:
            return dfOther.at[index, 'C']

df['OTHER'] = df.apply(testFunc, args=(dfOther, _), axis = 1)

I would really appreciate it if someone could let me know why this is the case, what errors I may be prone to, or maybe an alternative way of solving this kind of problem!

EDIT: As requested in the comments: my dfs generally look like the below. They will have two matching columns, and I will be returning a value from dfOther.at[index, column]. I have considered pd.concat([dfOther, df]); however, I will be running an algorithm that tests conditions on df and then updates it from specific values in dfOther (which will also be updating), and I would like df to stay relatively neat, as opposed to making a MultiIndex and throwing just about everything in it. Also, I am aware that df.iterrows is slow in general, but these dataframes will be about 500 rows at most, so scalability isn't really a massive concern for me at the moment.

df
Out[10]: 
    A    B      C
0  foo  bur   6000
1  foo  bur   7000
2  foo  bur   8000
3  bar  kek   9000
4  bar  kek  10000
5  bar  kek  11000

dfOther
Out[12]: 
    A    B      C
0  foo  bur   1000
1  foo  bur   2000
2  foo  bur   3000
3  bar  kek   4000
4  bar  kek   5000
5  bar  kek   6000

Answer 1

The error is raised on this line:

  File "C:\Anaconda3\envs\p2\lib\site-packages\pandas\core\frame.py", line 4017, in apply
    if kwds or args and not isinstance(func, np.ufunc):

Here, if kwds or args tests the truth value of kwds and args, which for a tuple amounts to checking whether its length is greater than 0. Testing truthiness is a common way to check whether a container is empty:

l = []

if l:
    print("l is not empty!")
else:
    print("l is empty!")

l is empty!

l = [1]

if l:
    print("l is not empty!")
else:
    print("l is empty!")

l is not empty!

If you had passed a non-empty tuple to df.apply as args, the check would evaluate to True and there wouldn't be a problem. However, Python does not interpret (df) as a tuple:

type((df))
Out[39]: pandas.core.frame.DataFrame

It is just a DataFrame/variable inside parentheses. When you type if df:

if df:
    print("df is not empty")

Traceback (most recent call last):

  File "<ipython-input-40-c86da5a5f1ee>", line 1, in <module>
    if df:

  File "C:\Anaconda3\envs\p2\lib\site-packages\pandas\core\generic.py", line 887, in __nonzero__
    .format(self.__class__.__name__))

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

You get the same error message. However, if you use a comma to indicate that it's a tuple, it works fine:

if (df, ):
    print("tuple is not empty")

tuple is not empty
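The difference between the two spellings can be checked directly. The following is a minimal sketch; the one-row frame and the names parenthesized and one_tuple are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical one-row frame, used only to illustrate the point.
df = pd.DataFrame({"A": ["foo"]})

parenthesized = (df)   # parentheses alone: still the same DataFrame
one_tuple = (df,)      # trailing comma: a tuple holding one DataFrame

print(type(parenthesized).__name__)  # DataFrame
print(type(one_tuple).__name__)      # tuple
print(bool(one_tuple))               # True, since the tuple is non-empty

# Asking for the truth value of the DataFrame itself raises ValueError.
try:
    bool(parenthesized)
except ValueError as exc:
    print("ValueError:", exc)
```

The truth value of the tuple depends only on its length, so pandas never has to decide whether the DataFrame inside it is "true".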

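As a result, adding a trailing comma so that args=(dfOther) becomes a one-element tuple should solve the problem. This is also why the two-argument versions in the question worked: (dfOne, dfTwo) and (dfOther, _) are already genuine tuples.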

df['OTHER'] = df.apply(testFunc, args=(dfOther, ), axis = 1)

Answer 1
3 accepted 2016-06-04 13:26:47
