Finding most frequent value from dataframe rows in Pandas

In a data frame, I want to create another column that outputs the most frequent value across the columns of each row.

A      B      C    D
foo    bar    baz  foo
egg    bacon  egg  egg
bacon  egg    foo  baz

Column "E" must output the most frequent value of each row, like:

E
foo
egg

How can I do this in Python?

Recreating your problem with:

import pandas as pd

df = pd.DataFrame(
    {
        'A' : ['foo', 'egg', 'bacon'], 
        'B' : ['bar', 'bacon', 'egg'],
        'C' : ['baz', 'egg', 'foo'],
        'D' : ['foo', 'egg', 'baz']
    }
)

And solving the problem with:

df['E'] = df.mode(axis=1)[0]

Output:

    A      B       C       D       E
0   foo    bar     baz     foo     foo
1   egg    bacon   egg     egg     egg
2   bacon  egg     foo     baz     bacon
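A self-contained sketch of the same approach, assuming only that pandas is installed. It also shows why row 2 gets bacon: that row has no repeated value, so all four values are modes, mode() returns tied modes in sorted order, and [0] picks the alphabetically first one.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'egg', 'bacon'],
    'B': ['bar', 'bacon', 'egg'],
    'C': ['baz', 'egg', 'foo'],
    'D': ['foo', 'egg', 'baz'],
})

# mode(axis=1) returns one column per mode; tied modes are sorted,
# and rows with fewer modes are padded with NaN.
modes = df.mode(axis=1)

# Column 0 holds the first (smallest after sorting) mode of each row.
df['E'] = modes[0]

print(modes.loc[2].tolist())  # ['bacon', 'baz', 'egg', 'foo']
print(df['E'].tolist())       # ['foo', 'egg', 'bacon']
```

So in a tie, column E silently receives the alphabetically smallest tied value rather than signalling that the row has no single most frequent element.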

What happens if there is no single most frequent element?

df.mode(axis=1)
    0      1       2       3
0   foo    NaN     NaN     NaN
1   egg    NaN     NaN     NaN
2   bacon  baz     egg     foo

As you can see, when several values tie for most frequent, mode() returns all of them in sorted order, one per column, and pads rows with fewer modes with NaN. If, in row 2, I replace foo with egg in column C and baz with bacon in column D, we get the following result:

    0      1
0   foo    NaN
1   egg    NaN
2   bacon  egg

As you can see, the result now has only two columns, which means the tie is between bacon and egg.
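To reproduce the tie case, here is a minimal sketch that builds the modified frame directly (variable names are only illustrative):

```python
import pandas as pd

# The original data with two substitutions in row 2:
# column C: 'foo' -> 'egg', column D: 'baz' -> 'bacon'.
df = pd.DataFrame({
    'A': ['foo', 'egg', 'bacon'],
    'B': ['bar', 'bacon', 'egg'],
    'C': ['baz', 'egg', 'egg'],
    'D': ['foo', 'egg', 'bacon'],
})

# Row 2 now contains bacon twice and egg twice, so it has two modes;
# rows 0 and 1 have a single mode, padded with NaN in column 1.
modes = df.mode(axis=1)
print(modes)
```

The width of the mode table is set by the row with the most tied values, which is why it shrinks from four columns to two here.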

How do I detect ties?

Let us work with the dataset without column D.

df
    A      B       C
0   foo    bar     baz
1   egg    bacon   egg
2   bacon  egg     foo

df_m = df.mode(axis=1)
df_m
    0      1    2
0   bar    baz  foo
1   egg    NaN  NaN
2   bacon  egg  foo

df['D'] = df_m[0]
df
    A      B       C    D
0   foo    bar     baz  bar
1   egg    bacon   egg  egg
2   bacon  egg     foo  bacon

We can use the notna() method that pandas provides to build a mask of the rows that have more than one mode, i.e. the rows that are in a tie.

First, we drop the first column, which always has a value:

df_m = df_m.drop(columns=0)

Then we transpose the DataFrame with the .T attribute and use any() to check which of the original rows contain at least one non-NaN value:

df_mask = df_m.T.notna().any()
df_mask
0     True
1    False
2     True
dtype: bool
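A minimal sketch of the same mask without the transpose, assuming a reasonably recent pandas: notna().any(axis=1) works row-wise directly, and counting the non-NaN cells per row with count(axis=1) gives the same result. Row 0 (bar, baz, foo) is a three-way tie, so it is flagged alongside row 2.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'egg', 'bacon'],
    'B': ['bar', 'bacon', 'egg'],
    'C': ['baz', 'egg', 'foo'],
})

df_m = df.mode(axis=1)

# Row-wise check with axis=1 instead of transposing first.
tied = df_m.drop(columns=0).notna().any(axis=1)

# Equivalent: a row is tied when it has more than one mode,
# i.e. more than one non-NaN cell in the mode table.
tied_by_count = df_m.count(axis=1) > 1

print(tied.tolist())  # [True, False, True]
```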

Now we have a pandas Series of booleans. We can use this mask to overwrite the tied rows of the column from before, falling back to the value in column A:

# .loc avoids chained assignment, which may not write back to df
df.loc[df_mask, 'D'] = df.loc[df_mask, 'A']
df
    A      B       C    D
0   foo    bar     baz  foo
1   egg    bacon   egg  egg
2   bacon  egg     foo  bacon
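A different technique, not the approach above: count each row with collections.Counter and break ties by first (leftmost) occurrence. The helper name most_common_first_seen is hypothetical. It relies on Counter.most_common ordering equal counts in the order they were first encountered.

```python
from collections import Counter

import pandas as pd


def most_common_first_seen(row: pd.Series):
    """Hypothetical helper: return the most frequent value in `row`;
    ties go to the value that appears first (leftmost) in the row."""
    # Iterating a Series yields its values; most_common keeps
    # first-encountered order among equal counts.
    return Counter(row).most_common(1)[0][0]


df = pd.DataFrame({
    'A': ['foo', 'egg', 'bacon'],
    'B': ['bar', 'bacon', 'egg'],
    'C': ['baz', 'egg', 'foo'],
})

df['D'] = df.apply(most_common_first_seen, axis=1)
print(df['D'].tolist())  # ['foo', 'egg', 'bacon']
```

Unlike falling back to column A, this always returns one of the tied values. With three or four columns, A is always part of any tie, but with five or more (e.g. x, y, y, z, z) it need not be.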
