Pandas DataFrame apply function, multiple arguments

Question

I have a Pandas dataframe, and one of its columns holds strings. I imported a function from an external module to do some RegEx checking and reduce each string to a short classification.

This works:

df['PageCLass'] = df['PageClass'].apply(lambda x: PageClassify.page_classify(x))

However, what I would really like to do is bring another column of the dataframe, 'Rev', into the checking. Its values are either a float or NaN.

When I did this:

df['PageCLass'] = df['PageClass'].apply(lambda x: PageClassify.page_classify(x,df['Rev']))

and did logical checks on the 2nd argument inside the classification function, I got this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What I am looking for is a way to capture the 2nd argument value by value, just as lambda x: captures the first argument value by value.

Answer 1

The concatenation approach in the other answer is OK, I guess, if it works... but in my opinion it does not answer the question, because it concatenates the two arguments into one.

Here is a way to pass two arguments through apply:

df['PageCLass'] = df[['PageClass','Rev']].apply(lambda x: PageClassify.page_classify(*x), axis=1)

I don't know what the page_classify method looks like, but if it takes two arguments the above should work. Does this work for you?
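For illustration, here is a minimal runnable sketch of that row-wise call. The body of page_classify below is a hypothetical stand-in (the real function lives in the asker's PageClassify module and is not shown in the question), and the sample data is made up.

```python
import numpy as np
import pandas as pd


def page_classify(page, rev):
    """Hypothetical stand-in for PageClassify.page_classify."""
    # rev arrives as a single scalar (float or NaN), so ordinary
    # truth-value checks work without the "ambiguous Series" error.
    if pd.isna(rev):
        return "unknown"
    return "home" if "home" in page else "other"


# Made-up sample data with the two columns named in the question.
df = pd.DataFrame({
    "PageClass": ["/home/index", "/about", "/home/news"],
    "Rev": [1.0, 2.5, np.nan],
})

# axis=1 hands each row to the lambda; *x unpacks it into (page, rev).
df["PageCLass"] = df[["PageClass", "Rev"]].apply(
    lambda x: page_classify(*x), axis=1
)
```

The order of the columns in df[['PageClass', 'Rev']] determines the order of the unpacked arguments, so it has to match the function's parameter order.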

Answer 2

Assuming you just want to do this row by row, the following should work:

df['PageCLass'] = (df['PageClass'] + df['Rev'].apply(str)).apply(lambda x: PageClassify.page_classify(x))

Here, you simply concatenate the two dataframe columns and then apply the function to each value in the new column. If you need to check the values of PageClass and Rev as separate arguments, you could also add a delimiter (e.g. '\t') to the concatenation and then split on it inside the function:

df['PageCLass'] = (df['PageClass'] + '\t' + df['Rev'].apply(str)).apply(lambda x: PageClassify.page_classify(x))
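As a rough sketch of what the splitting side could look like, assuming the classifier is rewritten to take the joined string: page_classify_joined below is a hypothetical function, not the asker's real one, and the sample data is made up.

```python
import numpy as np
import pandas as pd


def page_classify_joined(joined):
    """Hypothetical single-argument classifier receiving 'page<TAB>rev'."""
    # Split on the last tab only, in case the page string itself contains tabs.
    page, rev_text = joined.rsplit("\t", 1)
    # str(NaN) is 'nan', and float('nan') parses back to NaN.
    rev = float(rev_text)
    if np.isnan(rev):
        return "unknown"
    return "home" if "home" in page else "other"


# Made-up sample data with the two columns named in the question.
df = pd.DataFrame({
    "PageClass": ["/home/index", "/about", "/home/news"],
    "Rev": [1.0, 2.5, np.nan],
})

# Join the columns with a tab, then classify each joined string.
joined = df["PageClass"] + "\t" + df["Rev"].apply(str)
df["PageCLass"] = joined.apply(page_classify_joined)
```

Note that the second value makes a round trip through text here, so the function has to convert it back to a float before doing numeric checks.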

Hope this helps!

Answer 1
2, accepted, 2017-03-28 20:30:51

Answer 2
1, 2017-03-28 19:05:27
