Compare elements in dataframe columns for each row - Python
I have a really huge dataframe (thousands of rows), but let's assume it looks like this:
A B C D E F
0 2 5 2 2 2 2
1 5 2 5 5 5 5
2 5 2 5 2 5 5
3 2 2 2 2 2 2
4 5 5 5 5 5 5
I need to see which value appears most frequently in a group of columns for each row. For instance, I want the value that appears most frequently in columns A, B, C and the one that appears most frequently in columns D, E, F for each row, and to put them in new columns. In this example, my expected output is:
ABC DEF
2 2
5 5
5 5
2 2
5 5
How can I do this in Python? Thanks!
Here is one way, using a groupby over the columns:
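# Map each column to its group label, then take the row-wise mode of each group.
# Note: groupby(..., axis=1) is deprecated in recent pandas releases.
mapperd = {'A': 'ABC', 'B': 'ABC', 'C': 'ABC', 'D': 'DEF', 'E': 'DEF', 'F': 'DEF'}
df.groupby(mapperd, axis=1).agg(lambda x: x.mode()[0])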
Out[826]:
ABC DEF
0 2 2
1 5 5
2 5 5
3 2 2
4 5 5
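Newer pandas releases deprecate the `axis=1` argument of `DataFrame.groupby`, so the same idea can be sketched by transposing first. This is a minimal sketch rather than the answer's original code; the frame is rebuilt from the question's sample data, and `result` is a name introduced here for illustration.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "A": [2, 5, 5, 2, 5],
        "B": [5, 2, 2, 2, 5],
        "C": [2, 5, 5, 2, 5],
        "D": [2, 5, 2, 2, 5],
        "E": [2, 5, 5, 2, 5],
        "F": [2, 5, 5, 2, 5],
    }
)

# Same column-to-group mapping as in the answer above.
mapperd = {"A": "ABC", "B": "ABC", "C": "ABC", "D": "DEF", "E": "DEF", "F": "DEF"}

# Transpose so the original columns become index labels, group those labels
# with the mapping, take the mode within each group, then transpose back.
result = df.T.groupby(mapperd).agg(lambda s: s.mode().iloc[0]).T
print(result)
```

Transposing copies the data, so on a very wide or very long frame this may be slower than the array-based approaches.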
For good performance you can work with the underlying numpy arrays, and use scipy.stats.mode to compute the mode:
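from scipy import stats
import pandas as pd

cols = ['ABC', 'DEF']
# Stack each row's two 3-column groups as separate rows: shape (n_rows * 2, 3).
a = df.values.reshape(-1, df.shape[1] // 2)
# Row-wise mode of each group, folded back to one row per original row.
pd.DataFrame(stats.mode(a, axis=1).mode.reshape(-1, 2), columns=cols)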
ABC DEF
0 2 2
1 5 5
2 5 5
3 2 2
4 5 5
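The reshape trick relies on the groups being contiguous and of equal width. When that does not hold, one option is to call `scipy.stats.mode` once per group on an explicit column list. The sketch below assumes a hypothetical `groups` mapping (not part of the original answer) and uses `np.ravel` because the shape of the returned `.mode` array differs between SciPy versions.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Sample frame from the question.
df = pd.DataFrame(
    {
        "A": [2, 5, 5, 2, 5],
        "B": [5, 2, 2, 2, 5],
        "C": [2, 5, 5, 2, 5],
        "D": [2, 5, 2, 2, 5],
        "E": [2, 5, 5, 2, 5],
        "F": [2, 5, 5, 2, 5],
    }
)

# Hypothetical mapping: output column name -> source columns (any widths).
groups = {"ABC": ["A", "B", "C"], "DEF": ["D", "E", "F"]}

# Compute the row-wise mode of each group on the raw NumPy array.
# np.ravel flattens the result whether SciPy returns shape (n,) or (n, 1).
result = pd.DataFrame(
    {
        name: np.ravel(stats.mode(df[cols].to_numpy(), axis=1).mode)
        for name, cols in groups.items()
    },
    index=df.index,
)
print(result)
```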
You can try using column header index filtering:
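grp = ['ABC', 'DEF']
# [*g] unpacks a label such as 'ABC' into ['A', 'B', 'C'].
# set_axis no longer accepts inplace in pandas 2.x; it returns a new object by default.
pd.concat([df.loc[:, [*g]].mode(axis=1).set_axis([g], axis=1) for g in grp], axis=1)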
Output:
ABC DEF
0 2 2
1 5 5
2 5 5
3 2 2
4 5 5
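One caveat with `DataFrame.mode(axis=1)`: when a row has tied values, the result has more than one column, and relabelling it with a single name fails. A small sketch of one way to handle this, assuming it is acceptable to keep the smallest of the tied values (the tiny frame below is made up for illustration and is not the question's data):

```python
import pandas as pd

# Illustrative data: row 0 has a unique mode (2), row 1 is a three-way tie.
df = pd.DataFrame({"A": [2, 1], "B": [5, 2], "C": [2, 3]})

# mode(axis=1) returns one column per tied value, padded with NaN.
modes = df[["A", "B", "C"]].mode(axis=1)

# Column 0 always exists and holds the smallest mode of each row.
# Cast back to the input dtype, since NaN padding can turn the frame into floats.
abc = modes[0].astype(df["A"].dtype).rename("ABC")
print(abc)
```

Whether the smallest tied value is the right choice depends on the data; this only shows where the tie surfaces.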