Apply a function to each row in a dataframe

Question

I am trying to apply the following function to each row in a dataframe. The dataframe looks as follows:

vote_1 vote_2 vote_3 vote_4
a      a       a      b           
b      b       a      b          
b      a       a      b

I am trying to generate an additional column that tallies the 'votes' in the other columns and gives the winner, as follows:

vote_1 vote_2 vote_3 vote_4 winner_columns
a      a       a      b           a
b      b       a      b           b 
b      a       a      b           draw

So far I have tried:

def winner(x):
    a = new_df.iloc[x].value_counts()['a']
    b = new_df.iloc[x].value_counts()['b']
    if a > b:
        y = 'a'
    elif a < b:
        y = 'b'
    else:
        y = 'draw'
    return y

df['winner_columns'].apply(winner)

However, the whole column gets filled with draws. I assume it is something to do with the way I have built the function, but I can't figure out what.

Answer 1

You can use DataFrame.mode to get the most frequent value(s) in each row and DataFrame.count to count the non-missing values. If a row has only one mode, take the first column; otherwise, set 'draw' with numpy.where:

import numpy as np

df1 = df.mode(axis=1)
print (df1)
   0    1
0  a  NaN
1  b  NaN
2  a    b

df['winner_columns'] = np.where(df1.count(axis=1).eq(1), df1[0], 'draw')
print (df)
  vote_1 vote_2 vote_3 vote_4 winner_columns
0      a      a      a      b              a
1      b      b      a      b              b
2      b      a      a      b           draw
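For anyone who wants to run the mode-based approach end to end, here is a minimal self-contained sketch. It assumes the sample frame from the question; the variable names `modes` and `has_single_mode` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample votes from the question.
df = pd.DataFrame({
    "vote_1": ["a", "b", "b"],
    "vote_2": ["a", "b", "a"],
    "vote_3": ["a", "a", "a"],
    "vote_4": ["b", "b", "b"],
})

# Row-wise mode: one output column per tied value, padded with NaN.
modes = df.mode(axis=1)

# Exactly one non-missing mode means a clear winner; more than one is a tie.
has_single_mode = modes.count(axis=1).eq(1)
df["winner_columns"] = np.where(has_single_mode, modes[0], "draw")

print(df)
```

Because the tie check only counts how many modes a row has, this sketch is not tied to the labels 'a' and 'b'; it should behave the same way for any set of vote values.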

Your own solution can also be made to work with a small change:

def winner(x):
    s = x.value_counts()
    a = s['a']
    b = s['b']
    if a > b:
        y = 'a'
    elif a < b:
        y = 'b'
    else:
        y = 'draw'
    return y

df['winner_columns'] = df.apply(winner,axis=1)
print (df)
  vote_1 vote_2 vote_3 vote_4 winner_columns
0      a      a      a      b              a
1      b      b      a      b              b
2      b      a      a      b           draw
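The important difference from the code in the question is what the function receives. The sketch below is only an illustration of that point: `describe` is a hypothetical helper that reports the type of its argument, first when applied to a single column and then when applied to the frame with axis=1.

```python
import pandas as pd

df = pd.DataFrame({
    "vote_1": ["a", "b"],
    "vote_2": ["a", "b"],
    "vote_3": ["a", "a"],
    "vote_4": ["b", "b"],
})

def describe(x):
    # Hypothetical helper: name the type of whatever apply() passes in.
    return type(x).__name__

# Series.apply is element-wise: the function sees one cell at a time.
per_cell = df["vote_1"].apply(describe)

# DataFrame.apply with axis=1 is row-wise: the function sees a whole row.
per_row = df.apply(describe, axis=1)

print(per_cell.tolist())
print(per_row.tolist())
```

With axis=1 each call gets the full row as a Series, so calling value_counts() inside the function counts the votes of that row.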

Answer 2

The first problem is that your DataFrame sometimes contains a letter followed by a dot.

So, to look for just 'a' or 'b', you have to replace these dots with an empty string, something like:

df.replace(r'\.', '', regex=True)

Another problem, which didn't surface in your case, is that a row can contain only 'a' or only 'b', so your code should be robust to the absence of a particular value in such a row.

To make your function robust to such cases, change it to:

def winner(row):
    vc = row.value_counts()
    a = vc.get('a', 0)
    b = vc.get('b', 0)
    if a > b: return 'a'
    elif a < b: return 'b'
    else: return 'draw'
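To see why .get matters here, consider a unanimous row. The following is a small sketch with a made-up row (`unanimous` is not from the question) that compares plain indexing with .get on the result of value_counts():

```python
import pandas as pd

# Hypothetical row in which every vote is 'a'.
unanimous = pd.Series(["a", "a", "a", "a"])
counts = unanimous.value_counts()

# Plain indexing fails because 'b' never appears in the counts.
try:
    counts["b"]
    indexing_failed = False
except KeyError:
    indexing_failed = True

# .get falls back to the supplied default instead of raising.
a_votes = counts.get("a", 0)
b_votes = counts.get("b", 0)

print(indexing_failed, a_votes, b_votes)
```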

Then you can apply your function, but since you want to apply it to each row (not each column), you should pass axis=1.

So, to sum up, change your code to:

df['winner_columns'] = df.replace(r'\.', '', regex=True).apply(winner, axis=1)

The result, for your sample data, is:

  vote_1 vote_2 vote_3 vote_4 winner_columns
0     a.     a.     a.      b              a
1     b.     b.      a      b              b
2     b.     a.      a      b           draw
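The dots are still present in the vote columns above because replace returns a new frame rather than modifying df. Here is a self-contained sketch of the whole pipeline; the dotted sample values are reconstructed from the output shown above, and `cleaned` is just an illustrative name.

```python
import pandas as pd

# Sample data with the stray dots, as in the output above.
df = pd.DataFrame({
    "vote_1": ["a.", "b.", "b."],
    "vote_2": ["a.", "b.", "a."],
    "vote_3": ["a.", "a", "a"],
    "vote_4": ["b", "b", "b"],
})

def winner(row):
    # Count votes in one row, defaulting to 0 for a missing label.
    vc = row.value_counts()
    a = vc.get("a", 0)
    b = vc.get("b", 0)
    if a > b:
        return "a"
    if a < b:
        return "b"
    return "draw"

# replace() returns a cleaned copy; df itself keeps its dots.
cleaned = df.replace(r"\.", "", regex=True)
df["winner_columns"] = cleaned.apply(winner, axis=1)

print(df)
```

If the dots are not wanted in the stored data either, assigning the cleaned frame back to df before adding the new column would be one way to drop them.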

Answer 3

You can use .sum() to count the votes, collect the winners in a list, and finally add that list to the dataframe as a new column.

import numpy as np

numpy_votes = dataframe_votes.to_numpy()
winner_columns = []
for row in numpy_votes:
    # Count the votes for each option in the current row
    a_votes = np.sum(row == 'a')
    b_votes = np.sum(row == 'b')
    if a_votes < b_votes:
        winner_columns.append('b')
    elif a_votes > b_votes:
        winner_columns.append('a')
    else:
        winner_columns.append('draw')

dataframe_votes['winner_columns'] = winner_columns

According to this answer, using the .sum() method is the fastest way to count elements inside arrays.

Output:

  vote_1 vote_2 vote_3 vote_4 winner_columns
0      a      a      a      b              a
1      b      b      a      b              b
2      b      a      a      b           draw
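The same counting idea can also be written without the Python loop. This is only one possible variant, not part of the original answer: it sums the boolean comparisons along axis=1 and picks the label with np.select. The names `a_counts` and `b_counts` are illustrative.

```python
import numpy as np
import pandas as pd

dataframe_votes = pd.DataFrame({
    "vote_1": ["a", "b", "b"],
    "vote_2": ["a", "b", "a"],
    "vote_3": ["a", "a", "a"],
    "vote_4": ["b", "b", "b"],
})

votes = dataframe_votes.to_numpy()

# Count each label per row in one step instead of looping over rows.
a_counts = (votes == "a").sum(axis=1)
b_counts = (votes == "b").sum(axis=1)

# Pick 'a' or 'b' where one side leads; everything else is a draw.
dataframe_votes["winner_columns"] = np.select(
    [a_counts > b_counts, a_counts < b_counts],
    ["a", "b"],
    default="draw",
)

print(dataframe_votes)
```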

Answer 1
1 2020-10-30 11:33:56

Answer 2
1 2020-10-30 12:19:34

Answer 3
0 2020-10-31 18:51:35
