python - Passing a dataframe column as an argument in the apply function

Question

I have the following dataframe:

In[1]: df = DataFrame({"A": ['I love cooking','I love rowing'], "B": [['cooking','rowing'],['cooking','rowing']]})

The output I get is:

In[2]: df
Out[1]: 
                A                  B
0  I love cooking  [cooking, rowing]
1   I love rowing  [cooking, rowing]

I want to create a 'C' column that counts, for each row, how many elements of 'B' occur in 'A'.

The function I wrote is:

def count_keywords(x,y):
    a = 0
    for element in y:
        if element in x:
            a += 1
    return a

and then I do:

df['A'].apply(count_keywords,args=(df['B'],))

In this case, I am passing the entire pandas Series as the argument, so each element of df['B'] that the loop sees is a list, not a string (the strings are the elements of those lists).

So I get:

TypeError: 'in <string>' requires string as left operand, not list
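The error can be reproduced without apply at all. The following is a minimal sketch, assuming only the dataframe from the question: iterating over df['B'] yields one whole list per row, and a list on the left of `in <string>` is what Python rejects.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["I love cooking", "I love rowing"],
    "B": [["cooking", "rowing"], ["cooking", "rowing"]],
})

# Iterating over the Series yields one list per row, not one keyword.
first_element = next(iter(df["B"]))
print(type(first_element).__name__)  # list

# Testing a list against a string with `in` raises the TypeError.
try:
    first_element in "I love cooking"
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```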

However, if I adjust the function to:

def count_keywords(x,y): 
    a = 0
    for element in y:
        for new_element in element:
            if new_element in x:
                a += 1
    return a

and then do:

In[3]: df['A'].apply(count_keywords,args=(df['B'],))

the output is:

Out[2]: 
0    2
1    2

This is because, for each sentence in 'A', the function loops through every element of the whole df['B'] Series and then through every element of each list, so a keyword is counted once per row of 'B'.
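To make the double counting concrete, here is a small trace, assuming the same dataframe and the adjusted function above. The `hits` list is only an illustration added here; it records which row of 'B' each match for the first sentence came from.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["I love cooking", "I love rowing"],
    "B": [["cooking", "rowing"], ["cooking", "rowing"]],
})

def count_keywords(x, y):
    a = 0
    for element in y:                # every row's list in the whole column
        for new_element in element:  # every keyword in that list
            if new_element in x:
                a += 1
    return a

# For the first sentence, "cooking" is found once in each of the two lists.
hits = [
    (row_label, keyword)
    for row_label, keywords in df["B"].items()
    for keyword in keywords
    if keyword in df["A"].iloc[0]
]
print(hits)  # [(0, 'cooking'), (1, 'cooking')]

counts = df["A"].apply(count_keywords, args=(df["B"],)).tolist()
print(counts)  # [2, 2]
```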

How can I get the function to check, for each dataframe row, only that row's element of df['B'] against that row's element of df['A'], so that the output is:

Out[2]: 
0    1
1    1

Thanks a lot!

Answer 1

You have to apply over the other axis (axis=1), so that the function receives one row at a time and can read both columns from it.

def count_keywords(row): 
    counter = 0
    for e in row['B']:
        if e in row['A']:
            counter += 1
    row['C'] = counter
    return row

df2 = df.apply(count_keywords,axis=1)

This gives you:

                A                  B  C
0  I love cooking  [cooking, rowing]  1
1   I love rowing  [cooking, rowing]  1

Then df2['C'] should give you the 1, 1 series you mention.
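As a variation on the same idea, the row function can return just the count instead of modifying and returning the row, and the result can be assigned to the new column directly. This is a sketch rather than the answer's own code; the list-comprehension version with zip is an alternative added here that avoids apply altogether.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["I love cooking", "I love rowing"],
    "B": [["cooking", "rowing"], ["cooking", "rowing"]],
})

def count_keywords(row):
    # Substring test, as in the question: count keywords of B found in A.
    return sum(keyword in row["A"] for keyword in row["B"])

# axis=1 passes each row to the function as a Series indexed by column name.
df["C"] = df.apply(count_keywords, axis=1)
print(df["C"].tolist())  # [1, 1]

# Same counts without apply: pair the two columns row by row with zip.
via_zip = [sum(kw in text for kw in kws) for text, kws in zip(df["A"], df["B"])]
print(via_zip)  # [1, 1]
```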

Answer 2

Another way to do this is to take a set intersection and use its size. In theory this may be faster than iterating over the elements, since sets are designed for this kind of operation:

df['C'] = df.apply(lambda x: len(set(x.B).intersection(set(x.A.split()))), axis = 1)
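One thing to keep in mind is that the set version matches whole whitespace-separated words, while the loop in the question tests substrings, so the two can disagree. A small sketch of the difference, assuming a made-up second sentence with trailing punctuation (not part of the original data):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["I love cooking", "I love rowing!"],  # "!" added for illustration
    "B": [["cooking", "rowing"], ["cooking", "rowing"]],
})

# Whole-word matching: intersect the keyword set with the words of the sentence.
by_words = df.apply(lambda x: len(set(x["B"]) & set(x["A"].split())), axis=1)

# Substring matching, as in the question's loop.
by_substring = df.apply(lambda x: sum(kw in x["A"] for kw in x["B"]), axis=1)

print(by_words.tolist())      # [1, 0]  ("rowing!" is not the word "rowing")
print(by_substring.tolist())  # [1, 1]
```

Which behaviour is wanted depends on the data; for clean, space-separated text the two give the same counts.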

2 answers

Answer 1
2 2015-11-02 00:10:44

Answer 2
2 Accepted 2015-11-02 00:13:02
