
How to determine highest occurrence of categorical labels across multiple columns per row

I am trying to determine the label that occurs most often across multiple columns in each row, and store that label in another pandas column.

For example, given this dataframe:

    Class_1     Class_2     Class_3
0   versicolor  setosa      setosa
1   virginica   versicolor  virginica
2   virginica   setosa      setosa
3   versicolor  setosa      setosa
4   versicolor  versicolor  virginica

I want to add a column called Predictions that holds the most frequent label in each row:

    Class_1     Class_2     Class_3    Predictions
0   versicolor  setosa      setosa     setosa
1   virginica   versicolor  virginica  virginica
2   virginica   setosa      setosa     setosa
3   versicolor  setosa      setosa     setosa
4   versicolor  versicolor  virginica  versicolor

Use value_counts with apply and axis=1 to count the labels in each row, then take the first entry of the resulting index, which is the most common value:

# For each row, count the labels and keep the most frequent one
df['Predictions'] = df.apply(lambda x: x.value_counts().index[0], axis=1)
print(df)
      Class_1     Class_2    Class_3 Predictions
0  versicolor      setosa     setosa      setosa
1   virginica  versicolor  virginica   virginica
2   virginica      setosa     setosa      setosa
3  versicolor      setosa     setosa      setosa
4  versicolor  versicolor  virginica  versicolor
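
If the snippet above is run again after Predictions already exists, df.apply will count that column as well. A minimal, self-contained sketch that votes only over the class columns is shown here; the class_cols list and the mode_predictions name are my own additions for illustration, not part of the original answer. It also shows DataFrame.mode(axis=1) as another way to get the same result, which is a swapped-in technique rather than the one used above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Class_1": ["versicolor", "virginica", "virginica", "versicolor", "versicolor"],
    "Class_2": ["setosa", "versicolor", "setosa", "setosa", "versicolor"],
    "Class_3": ["setosa", "virginica", "setosa", "setosa", "virginica"],
})

# Assumption: only these columns should take part in the vote
class_cols = ["Class_1", "Class_2", "Class_3"]

# Row-wise vote: value_counts sorts by frequency, so index[0] is the top label
df["Predictions"] = df[class_cols].apply(
    lambda row: row.value_counts().index[0], axis=1
)

# Alternative: mode(axis=1) returns the most frequent value(s) per row;
# column 0 holds the first mode of each row
mode_predictions = df[class_cols].mode(axis=1)[0]

print(df)
```

Neither variant defines a meaningful winner when every label in a row is different, so rows with ties may need an explicit rule of their own.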

Alternative with Counter.most_common:

from collections import Counter

# index=False keeps the row index out of the count
df['Predictions'] = [Counter(x).most_common(1)[0][0] for x in df.itertuples(index=False)]
print(df)
      Class_1     Class_2    Class_3 Predictions
0  versicolor      setosa     setosa      setosa
1   virginica  versicolor  virginica   virginica
2   virginica      setosa     setosa      setosa
3  versicolor      setosa     setosa      setosa
4  versicolor  versicolor  virginica  versicolor
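
One thing to watch with the Counter approach: itertuples() yields the row index as the first element by default, and Counter.most_common returns equal-count elements in the order they were first seen. For a row whose labels are all different, the index itself could therefore come back as the "prediction". A hedged sketch that avoids this is shown here, assuming the vote should run over the three class columns only (class_cols and tie_winner are names I made up for illustration):

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Class_1": ["versicolor", "virginica", "virginica", "versicolor", "versicolor"],
    "Class_2": ["setosa", "versicolor", "setosa", "setosa", "versicolor"],
    "Class_3": ["setosa", "virginica", "setosa", "setosa", "virginica"],
})

# Assumption: only these columns should take part in the vote
class_cols = ["Class_1", "Class_2", "Class_3"]

# index=False drops the row index; name=None yields plain tuples
df["Predictions"] = [
    Counter(row).most_common(1)[0][0]
    for row in df[class_cols].itertuples(index=False, name=None)
]

# With a three-way tie, most_common keeps first-seen order,
# so the label from the first column wins
tie_winner = Counter(("setosa", "versicolor", "virginica")).most_common(1)[0][0]

print(df)
print(tie_winner)
```

So with this version a full tie falls back to the label in Class_1, which may or may not be the rule you want.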

