Compare elements in dataframe columns for each row - Python

I have a very large dataframe (thousands of rows), but let's assume it looks like this:

   A  B  C  D  E  F
0  2  5  2  2  2  2
1  5  2  5  5  5  5
2  5  2  5  2  5  5
3  2  2  2  2  2  2
4  5  5  5  5  5  5

I need to find which value appears most frequently within a group of columns for each row. For instance, the most frequent value in columns A, B, C and the most frequent value in columns D, E, F for each row, each stored in a new column. In this example, my expected output is:

ABC  DEF  
 2    2     
 5    5     
 5    5     
 2    2     
 5    5     

How can I do this in Python? Thanks!

Here is one way, using a groupby over the columns:

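# Map each column to the name of the group it belongs to
mapperd = {'A': 'ABC', 'B': 'ABC', 'C': 'ABC', 'D': 'DEF', 'E': 'DEF', 'F': 'DEF'}
# Group the columns by that mapping and take the most frequent value per row
df.groupby(mapperd, axis=1).agg(lambda x: x.mode()[0])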
Out[826]: 
   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
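
In recent pandas releases, passing axis=1 to groupby is deprecated, so the call above may warn or fail depending on the installed version. A minimal sketch of an equivalent, assuming the same frame and mapping: transpose, group the rows, then transpose back. The name `result` is only illustrative.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    'A': [2, 5, 5, 2, 5],
    'B': [5, 2, 2, 2, 5],
    'C': [2, 5, 5, 2, 5],
    'D': [2, 5, 2, 2, 5],
    'E': [2, 5, 5, 2, 5],
    'F': [2, 5, 5, 2, 5],
})

mapperd = {'A': 'ABC', 'B': 'ABC', 'C': 'ABC', 'D': 'DEF', 'E': 'DEF', 'F': 'DEF'}

# After transposing, the original columns become the index, so an ordinary
# row-wise groupby can use the mapping. Each lambda call receives the values
# of one original row within one group; mode()[0] picks the smallest mode.
result = df.T.groupby(mapperd).agg(lambda s: s.mode()[0]).T
print(result)
```

Transposing a frame with thousands of rows produces thousands of columns, so this is likely slower than an array-based approach; it is shown only as a drop-in for the groupby idea.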

For good performance, you can work with the underlying NumPy arrays and use scipy.stats.mode to compute the mode:

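import pandas as pd
from scipy import stats
cols = ['ABC', 'DEF']
# Split every row into consecutive blocks of df.shape[1] // 2 columns (3 here)
a = df.values.reshape(-1, df.shape[1] // 2)
# Mode of each block, reshaped back to one row per original row
pd.DataFrame(stats.mode(a, axis=1).mode.reshape(-1, 2), columns=cols)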

    ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
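
To make the reshape step easier to follow, here is a self-contained sketch of the same idea that exposes the intermediate shapes. It assumes the groups are contiguous and equally wide; `width`, `blocks`, `modes`, and `result` are illustrative names, not part of the answer above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Example frame from the question
df = pd.DataFrame({
    'A': [2, 5, 5, 2, 5],
    'B': [5, 2, 2, 2, 5],
    'C': [2, 5, 5, 2, 5],
    'D': [2, 5, 2, 2, 5],
    'E': [2, 5, 5, 2, 5],
    'F': [2, 5, 5, 2, 5],
})

width = 3  # assumed number of columns per group (A-C and D-F)

# Row-major reshape: row 0 of df becomes blocks[0] (A, B, C) and blocks[1] (D, E, F)
blocks = df.to_numpy().reshape(-1, width)
print(blocks.shape)  # (10, 3)

# np.asarray(...).reshape copes with both the 1-D and the 2-D mode array
# that different SciPy versions return
modes = np.asarray(stats.mode(blocks, axis=1).mode).reshape(len(df), -1)
result = pd.DataFrame(modes, columns=['ABC', 'DEF'], index=df.index)
print(result)
```

If the groups were not adjacent or had different widths, the reshape trick would no longer apply and the columns would need to be selected per group instead.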

You can try filtering on the column headers:

grp = ['ABC', 'DEF']
# [*g] unpacks a group name such as 'ABC' into the column labels ['A', 'B', 'C']
pd.concat([df.loc[:, [*g]].mode(1).set_axis([g], axis=1) for g in grp], axis=1)

Output:

   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
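
One caveat with this approach: when a row has a tie inside a group, mode(axis=1) returns more than one column, and relabeling with a single name then fails. A hedged variant, assuming it is acceptable to keep the smallest of the tied values, selects column 0 of the mode result. The name `result` is illustrative.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    'A': [2, 5, 5, 2, 5],
    'B': [5, 2, 2, 2, 5],
    'C': [2, 5, 5, 2, 5],
    'D': [2, 5, 2, 2, 5],
    'E': [2, 5, 5, 2, 5],
    'F': [2, 5, 5, 2, 5],
})

grp = ['ABC', 'DEF']

# list(g) turns 'ABC' into ['A', 'B', 'C']; column 0 of the row-wise mode
# holds the first (smallest) mode of each row, so ties cannot add columns.
# The dict keys become the column names of the concatenated result.
result = pd.concat({g: df[list(g)].mode(axis=1)[0] for g in grp}, axis=1)
print(result)
```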
