python pandas: passing in dataframe to df.apply
Long-time user of this site, but first time asking a question! Thanks to all of the benevolent users who have been answering questions for ages :)
I have been using df.apply lately and ideally want to pass a dataframe into the args parameter, something like so: df.apply(testFunc, args=(dfOther), axis = 1)
My ultimate goal is to iterate over the dataframe I am passing in the args parameter, check some logic against each row of the original dataframe, say df, and return some value from dfOther. So say I have a function like this:
def testFunc(row, dfOther):
    for index, rowOther in dfOther.iterrows():
        if row['A'] == rowOther[0] and row['B'] == rowOther[1]:
            return dfOther.at[index, 'C']
df['OTHER'] = df.apply(testFunc, args=(dfOther), axis = 1)
My current understanding is that args expects a Series object, and so if I actually run this I get the following error:
ValueError: The truth value of a DataFrame is ambiguous.
Use a.empty, a.bool(), a.item(), a.any() or a.all().
However, before I wrote testFunc, which only takes a single dataframe, I had actually written priorTestFunc, which looks like this... And it works!
def priorTestFunc(row, dfOne, dfTwo):
    for index, rowOne in dfOne.iterrows():
        if row['A'] == rowOne[0] and row['B'] == rowOne[1]:
            return dfTwo.at[index, 'C']
df['OTHER'] = df.apply(priorTestFunc, args=(dfOne, dfTwo), axis = 1)
So, to my dismay, I have gotten into the habit of writing testFunc like so, and it has been working as intended:
def testFunc(row, dfOther, _):
    for index, rowOther in dfOther.iterrows():
        if row['A'] == rowOther[0] and row['B'] == rowOther[1]:
            return dfOther.at[index, 'C']
df['OTHER'] = df.apply(testFunc, args=(dfOther, _), axis = 1)
I would really appreciate it if someone could let me know why this is the case, what errors I might be prone to, or whether there is another way of solving this kind of problem!
EDIT: As requested in the comments: my dfs generally look like the ones below. They will have two matching columns, and I will be returning a value from dfOther.at[index, column]. I have considered pd.concat([dfOther, df]); however, I will be running an algorithm that tests conditions on df and then updates it accordingly from specific values in dfOther (which will also be updating), and I would like df to stay relatively neat, as opposed to making a MultiIndex and throwing just about everything in it. Also, I am aware df.iterrows is in general slow, but these dataframes will be about 500 rows at most, so scalability isn't really a massive concern for me at the moment.
df
Out[10]:
     A    B      C
0  foo  bur   6000
1  foo  bur   7000
2  foo  bur   8000
3  bar  kek   9000
4  bar  kek  10000
5  bar  kek  11000
dfOther
Out[12]:
     A    B     C
0  foo  bur  1000
1  foo  bur  2000
2  foo  bur  3000
3  bar  kek  4000
4  bar  kek  5000
5  bar  kek  6000
The error is in this line:
  File "C:\Anaconda3\envs\p2\lib\site-packages\pandas\core\frame.py", line 4017, in apply
    if kwds or args and not isinstance(func, np.ufunc):
Here, if kwds or args tests the truth value of the args passed to apply, which for a tuple means checking whether its length is greater than 0. It is a common way to check whether a sequence is empty:
l = []
if l:
    print("l is not empty!")
else:
    print("l is empty!")
l is empty!
l = [1]
if l:
    print("l is not empty!")
else:
    print("l is empty!")
l is not empty!
If you had passed a tuple to df.apply as args, it would evaluate to True and there wouldn't be a problem. However, Python does not interpret (df) as a tuple:
type((df))
Out[39]: pandas.core.frame.DataFrame
It is just a DataFrame/variable inside parentheses. When you type if df:
if df:
    print("df is not empty")
Traceback (most recent call last):
  File "<ipython-input-40-c86da5a5f1ee>", line 1, in <module>
    if df:
  File "C:\Anaconda3\envs\p2\lib\site-packages\pandas\core\generic.py", line 887, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You get the same error message. However, if you use a comma to indicate that it's a tuple, it works fine:
if (df, ):
    print("tuple is not empty")
tuple is not empty
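
The difference between (x) and (x, ) is plain Python and can be checked without pandas. The sketch below uses a list as a stand-in for the DataFrame; the variable names are only illustrative. It also suggests why the two-argument versions in the question worked: any comma inside the parentheses already makes a tuple.

```python
value = [1, 2, 3]  # stand-in for a DataFrame; any object behaves the same way

not_a_tuple = (value)        # parentheses alone only group an expression
one_tuple = (value,)         # the trailing comma is what creates the tuple
two_tuple = (value, None)    # two items separated by a comma: also a tuple

print(type(not_a_tuple).__name__)  # list
print(type(one_tuple).__name__)    # tuple
print(len(one_tuple), len(two_tuple))
```

So args=(dfOne, dfTwo) and args=(dfOther, _) are real tuples, and their truth value depends only on their length, never on the DataFrames inside them.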
As a result, adding a comma to args=(dfOther), which turns it into a one-element tuple, should solve the problem.
df['OTHER'] = df.apply(testFunc, args=(dfOther, ), axis = 1)