How to determine highest occurrence of categorical labels across multiple columns per row
I am trying to determine the label that occurs most often across multiple columns in each row, and to store that label in another pandas column.
For example, given this DataFrame:
      Class_1     Class_2    Class_3
0  versicolor      setosa     setosa
1   virginica  versicolor  virginica
2   virginica      setosa     setosa
3  versicolor      setosa     setosa
4  versicolor  versicolor  virginica
I want to add a column called Predictions that holds the most frequent label in each row:
      Class_1     Class_2    Class_3  Predictions
0  versicolor      setosa     setosa       setosa
1   virginica  versicolor  virginica    virginica
2   virginica      setosa     setosa       setosa
3  versicolor      setosa     setosa       setosa
4  versicolor  versicolor  virginica   versicolor
Use value_counts with apply and axis=1. value_counts sorts each row's labels by frequency, so the first index entry is the most common value in that row:
df['Predictions'] = df.apply(lambda x: x.value_counts().index[0], axis=1)
print (df)
      Class_1     Class_2    Class_3  Predictions
0  versicolor      setosa     setosa       setosa
1   virginica  versicolor  virginica    virginica
2   virginica      setosa     setosa       setosa
3  versicolor      setosa     setosa       setosa
4  versicolor  versicolor  virginica   versicolor
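For anyone who wants to run the value_counts approach end to end, here is a minimal sketch. It rebuilds the sample frame from the question and, as an assumption on my part, restricts the vote to a class_cols list (a name I made up for illustration) so that an existing Predictions column is never counted if the line is re-run.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class_1": ["versicolor", "virginica", "virginica", "versicolor", "versicolor"],
    "Class_2": ["setosa", "versicolor", "setosa", "setosa", "versicolor"],
    "Class_3": ["setosa", "virginica", "setosa", "setosa", "virginica"],
})

# Hypothetical helper list: only these columns take part in the vote.
class_cols = ["Class_1", "Class_2", "Class_3"]

# value_counts() sorts labels by descending frequency, so index[0]
# is the most common label within each row.
df["Predictions"] = df[class_cols].apply(
    lambda row: row.value_counts().index[0], axis=1
)
print(df)
```

Every row in the sample has a clear two-out-of-three majority. If a row could hold three different labels, I would not rely on which one value_counts happens to list first.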
Alternative with Counter.most_common:
from collections import Counter
# index=False keeps the row index out of the counts
df['Predictions'] = [Counter(x).most_common(1)[0][0] for x in df.itertuples(index=False)]
print(df)
      Class_1     Class_2    Class_3  Predictions
0  versicolor      setosa     setosa       setosa
1   virginica  versicolor  virginica    virginica
2   virginica      setosa     setosa       setosa
3  versicolor      setosa     setosa       setosa
4  versicolor  versicolor  virginica   versicolor
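A self-contained sketch of the Counter variant, in case it helps. The names class_cols and majority_label are my own and not part of the original answer. I am also assuming that only the three class columns should vote, and that the row index should be excluded with index=False.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class_1": ["versicolor", "virginica", "virginica", "versicolor", "versicolor"],
    "Class_2": ["setosa", "versicolor", "setosa", "setosa", "versicolor"],
    "Class_3": ["setosa", "virginica", "setosa", "setosa", "virginica"],
})

# Hypothetical helper list: only these columns take part in the vote.
class_cols = ["Class_1", "Class_2", "Class_3"]


def majority_label(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# index=False yields plain row values, so the index is never counted.
df["Predictions"] = [
    majority_label(row) for row in df[class_cols].itertuples(index=False)
]
print(df)
```

On Python 3.7 and later, Counter.most_common lists equal counts in the order they were first encountered, so a row with three different labels would resolve to the leftmost column's value.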