
Apply function to each row in a dataframe

I am trying to apply the following function to each row in a dataframe. The dataframe looks as follows:

vote_1 vote_2 vote_3 vote_4
a      a       a      b           
b      b       a      b          
b      a       a      b           

I am trying to generate a fifth column that counts the 'votes' in the other columns and reports the winner, as follows:

vote_1 vote_2 vote_3 vote_4 winner_columns
a      a       a      b           a
b      b       a      b           b 
b      a       a      b           draw

I have currently tried:

def winner(x):
    a = new_df.iloc[x].value_counts()['a']
    b = new_df.iloc[x].value_counts()['b']
    if a > b:
        y = 'a'
    elif a < b:
        y = 'b'
    else:
        y = 'draw'
    return y

df['winner_columns'].apply(winner)

However, the whole column gets filled with draws. I assume it has something to do with the way I have built the function, but I can't figure out what.

You can use DataFrame.mode to get the most frequent value(s) in each row, then count the non-missing values with DataFrame.count. If a row has only one mode, numpy.where takes it from the first column; otherwise it writes 'draw':

df1 = df.mode(axis=1)
print (df1)
   0    1
0  a  NaN
1  b  NaN
2  a    b

df['winner_columns'] = np.where(df1.count(axis=1).eq(1), df1[0], 'draw')
print (df)
  vote_1 vote_2 vote_3 vote_4 winner_columns
0      a      a      a      b              a
1      b      b      a      b              b
2      b      a      a      b           draw
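
For reference, the steps above can be put together into one self-contained script. This is only a sketch: it rebuilds the sample data from the question, and the names modes and has_single_mode are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "vote_1": ["a", "b", "b"],
    "vote_2": ["a", "b", "a"],
    "vote_3": ["a", "a", "a"],
    "vote_4": ["b", "b", "b"],
})

# Row-wise modes: a tie yields more than one non-missing value in the row.
modes = df.mode(axis=1)

# True where the row has exactly one mode, i.e. a clear winner.
has_single_mode = modes.count(axis=1).eq(1)

# Take the single mode where there is one, otherwise mark the row as a draw.
df["winner_columns"] = np.where(has_single_mode, modes[0], "draw")
print(df)
```

Unlike the a/b comparison in the question, this approach does not hard-code the candidate labels, so it should also handle more than two distinct vote values.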

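Your own solution also works with a few changes: call value_counts on the row passed to the function, and apply the function to the whole dataframe with axis=1: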

def winner(x):
    s = x.value_counts()
    a = s['a']
    b = s['b']
    if a > b:
        y = 'a'
    elif a < b:
        y = 'b'
    else:
        y = 'draw'
    return y

df['winner_columns'] = df.apply(winner,axis=1)
print (df)
  vote_1 vote_2 vote_3 vote_4 winner_columns
0      a      a      a      b              a
1      b      b      a      b              b
2      b      a      a      b           draw

The first problem is that your DataFrame sometimes contains a letter followed by a dot.

So to look for solely 'a' or 'b', you have to replace these dots with an empty string, something like:

df.replace(r'\.', '', regex=True)

Another problem, which didn't surface in your case, is that a row can contain only 'a' or only 'b', and your code should be robust to the absence of a particular value in such a row.

To make your function robust to such cases, change it to:

def winner(row):
    vc = row.value_counts()
    a = vc.get('a', 0)
    b = vc.get('b', 0)
    if a > b: return 'a'
    elif a < b: return 'b'
    else: return 'draw'

Then you can apply your function, but if you want to apply it to each row (not column), you should pass axis=1.

So, to sum up, change your code to:

df['winner_columns'] = df.replace(r'\.', '', regex=True).apply(winner, axis=1)

The result, for your sample data, is:

  vote_1 vote_2 vote_3 vote_4 winner_columns
0     a.     a.     a.      b              a
1     b.     b.      a      b              b
2     b.     a.      a      b           draw
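
A self-contained sketch of the whole approach is given here for convenience. The first three rows follow the result shown above; the fourth row (all 'a') is an assumption added only to exercise the case where one value is missing from a row, and the name cleaned is introduced for illustration.

```python
import pandas as pd

# First three rows match the result shown above; the last row is hypothetical
# and contains no 'b' at all.
df = pd.DataFrame({
    "vote_1": ["a.", "b.", "b.", "a"],
    "vote_2": ["a.", "b.", "a.", "a"],
    "vote_3": ["a.", "a", "a", "a"],
    "vote_4": ["b", "b", "b", "a"],
})

def winner(row):
    vc = row.value_counts()
    # .get returns 0 when the label is absent, where vc['b'] would raise KeyError.
    a = vc.get("a", 0)
    b = vc.get("b", 0)
    if a > b:
        return "a"
    if a < b:
        return "b"
    return "draw"

# Strip the dots on a copy so the original vote columns stay untouched.
cleaned = df.replace(r"\.", "", regex=True)
df["winner_columns"] = cleaned.apply(winner, axis=1)
print(df)
```

With vc['b'] in place of vc.get('b', 0), the last row would fail because value_counts only lists values that actually occur in the row.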

You can use .sum() to count the votes, collect the winners in a list, and finally add the list to the dataframe as a new column.

import numpy as np

# dataframe_votes is the dataframe holding the vote columns
numpy_votes = dataframe_votes.to_numpy()
winner_columns = []
for row in numpy_votes:
    # count the votes for each option in this row
    a_count = np.sum(row == 'a')
    b_count = np.sum(row == 'b')
    if a_count < b_count:
        winner_columns.append('b')
    elif a_count > b_count:
        winner_columns.append('a')
    else:
        winner_columns.append('draw')

dataframe_votes['winner_columns'] = winner_columns

According to this answer, using the .sum() method is the fastest way to count elements inside arrays.

Output:

    vote_1  vote_2  vote_3  vote_4  winner_columns
0   a        a         a        b       a
1   b        b         a        b       b
2   b        a         a        b       draw
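
The same counting idea can also be written without the Python loop. The following is a sketch of a variation, not the code from the answer above: it sums the boolean comparisons along axis=1 and picks the label with np.select. The names votes, a_counts and b_counts are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
dataframe_votes = pd.DataFrame({
    "vote_1": ["a", "b", "b"],
    "vote_2": ["a", "b", "a"],
    "vote_3": ["a", "a", "a"],
    "vote_4": ["b", "b", "b"],
})

votes = dataframe_votes.to_numpy()

# Comparing the whole array gives a boolean matrix; summing along axis=1
# counts the matches in each row.
a_counts = (votes == "a").sum(axis=1)
b_counts = (votes == "b").sum(axis=1)

# Pick 'a' or 'b' where one side has more votes, otherwise 'draw'.
dataframe_votes["winner_columns"] = np.select(
    [a_counts > b_counts, a_counts < b_counts],
    ["a", "b"],
    default="draw",
)
print(dataframe_votes)
```

Whether this is measurably faster than the loop depends on the size of the data; for a handful of rows the difference is unlikely to matter.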

