
For each row, get the frequency of the most frequent value

I have a DataFrame that looks like this:

var1  var2   var3  var4
a      a      a    b
c      c      b    d
e      e      f    g 
g      a      a    z
g      a      a    g
w      w      w    w

What I want to do is identify the most frequent value in each row and count how many times it appears. In this case I'd get:

var1  var2   var3  var4  frq
a      a      a    b      3
c      c      b    d      2
e      e      f    g      2
g      a      a    z      2
g      a      a    g      2
w      w      w    w      4

I was thinking of using something like pd.get_dummies, but there would be too many dummies, since each of var1, var2, etc. can take quite a few different values.

Try Series.value_counts and max() on the transposed DataFrame:

df["frq"] = df.T.apply(pd.Series.value_counts).max()
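A sketch of what the transposed approach computes, on the question's sample data (rebuilt so the snippet runs on its own). It uses pd.Series.value_counts because the top-level pd.value_counts is deprecated in recent pandas. The intermediate table has one row per distinct value and one column per original row, with NaN wherever a value does not occur in that row.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({"var1": ["a", "c", "e", "g", "g", "w"],
                   "var2": ["a", "c", "e", "a", "a", "w"],
                   "var3": ["a", "b", "f", "a", "a", "w"],
                   "var4": ["b", "d", "g", "z", "g", "w"]})

# after transposing, each column holds one original row;
# value_counts per column builds a (distinct values x original rows) table, NaN where absent
counts = df.T.apply(pd.Series.value_counts)
print(counts)

# the largest count in each column is that row's top frequency
df["frq"] = counts.max().astype(int)
print(df["frq"].tolist())  # [3, 2, 2, 2, 2, 4]
```

Because the intermediate table contains NaN, the maxima come back as floats; .astype(int) restores integer counts.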

Another way is to use apply with axis=1:

df['frq'] = df.apply(lambda x: x.value_counts().iloc[0], axis=1)
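Why .iloc[0] works: Series.value_counts sorts the counts in descending order by default, so the first entry is the largest. A small sketch on the first row of the question's data:

```python
import pandas as pd

# the first row of the question's data
row = pd.Series(["a", "a", "a", "b"], index=["var1", "var2", "var3", "var4"])

counts = row.value_counts()   # sorted by count, highest first
print(counts.iloc[0])         # 3, the frequency of "a"
print(counts.index[0])        # a, the most frequent value itself
```

x.value_counts().max() gives the same number without relying on the sort order.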

Or use stack and groupby:

df['frq'] = df.stack().groupby(level=0).value_counts().groupby(level=0).max()
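The stack/groupby chain, unpacked step by step on the question's data. This sketch finishes with groupby(level=0).max(), since the level= argument of Series.max was removed in pandas 2.0.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({"var1": ["a", "c", "e", "g", "g", "w"],
                   "var2": ["a", "c", "e", "a", "a", "w"],
                   "var3": ["a", "b", "f", "a", "a", "w"],
                   "var4": ["b", "d", "g", "z", "g", "w"]})

# 1. stack: one long Series indexed by (row label, column name)
stacked = df.stack()

# 2. count each value within each row: the index becomes (row label, value)
per_row_counts = stacked.groupby(level=0).value_counts()

# 3. keep the largest count per row label; assignment aligns on the row labels
df["frq"] = per_row_counts.groupby(level=0).max()
print(df["frq"].tolist())  # [3, 2, 2, 2, 2, 4]
```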

You can use df.mode here. One thing to note is that mode can return more than one value per row when there is a tie.

From the pandas documentation for df.mode:

The mode of a set of values is the value that appears most often. It can be multiple values.

df['frq'] = df.eq(df.mode(axis=1)[0], axis=0).sum(1)

 var1 var2 var3 var4  frq
0    a    a    a    b    3
1    c    c    b    d    2
2    e    e    f    g    2
3    g    a    a    z    2
4    g    a    a    g    2
5    w    w    w    w    4
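A sketch of the tie case on the question's data: row 4 (g a a g) has two modes, a and g, so df.mode(axis=1) returns a second column that is NaN for every other row. Using column [0] is still safe here, because all modes of a row share the same frequency.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({"var1": ["a", "c", "e", "g", "g", "w"],
                   "var2": ["a", "c", "e", "a", "a", "w"],
                   "var3": ["a", "b", "f", "a", "a", "w"],
                   "var4": ["b", "d", "g", "z", "g", "w"]})

modes = df.mode(axis=1)
print(modes)  # column 1 is NaN except for the tied row 4

# compare each cell with its row's first mode and count the matches
df["frq"] = df.eq(modes[0], axis=0).sum(axis=1)
print(df["frq"].tolist())  # [3, 2, 2, 2, 2, 4]
```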

Here is a sample. I used value_counts and applied it to every row. That gives the count of each value, and then I just take the maximum to get the exact result you're looking for:

import pandas as pd

df = pd.DataFrame({'var1': ["a","c","e","g","g","w"],
                   'var2': ["a","c","e","a","a","w"],
                   'var3': ["a","b","f","a","a","w"],
                   'var4': ["b","d","g","z","g","w"]})

# count each value within every row, then keep the row's largest count
frequency = df.apply(pd.Series.value_counts, axis=1).max(axis=1)

df["frq"] = frequency

print(df)

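Let us try scipy.stats.mode. Recent SciPy releases (1.11 and later) no longer accept non-numeric arrays in stats.mode, so encode the strings as integer codes first:

import pandas as pd
from scipy import stats

# encode every cell as an integer code, keeping the (rows, columns) shape
codes = pd.factorize(df.values.ravel())[0].reshape(df.shape)
# per-row count of the most frequent code
df["frq"] = stats.mode(codes, axis=1, keepdims=False).count
# df["frq"] is now 3, 2, 2, 2, 2, 4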
