Compare elements in dataframe columns for each row - Python

Question

I have a really huge dataframe (thousands of rows), but let's assume it looks like this:

   A  B  C  D  E  F
0  2  5  2  2  2  2
1  5  2  5  5  5  5
2  5  2  5  2  5  5
3  2  2  2  2  2  2
4  5  5  5  5  5  5

I need to see which value appears most frequently in a group of columns for each row. For instance, I want the value that appears most frequently in columns ABC and the one that appears most frequently in columns DEF in each row, and to put them in another column. In this example, my expected output is

How can I do it in Python? Thanks!

Answer 1

Here is one way, using a groupby over the columns:

mapperd={'A':'ABC','B':'ABC','C':'ABC','D':'DEF','E':'DEF','F':'DEF'}
df.groupby(mapperd,axis=1).agg(lambda x : x.mode()[0])
Out[826]: 
   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
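
For readers on recent pandas releases, where `groupby(..., axis=1)` is deprecated (from 2.1 onward), the same idea can be sketched by transposing first. This is a minimal sketch, not the answer's original code: the frame is rebuilt from the question's sample, and the names `mapper` and `result` are chosen here for illustration.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'A': [2, 5, 5, 2, 5],
    'B': [5, 2, 2, 2, 5],
    'C': [2, 5, 5, 2, 5],
    'D': [2, 5, 2, 2, 5],
    'E': [2, 5, 5, 2, 5],
    'F': [2, 5, 5, 2, 5],
})

# Map each source column to the name of its group.
mapper = {'A': 'ABC', 'B': 'ABC', 'C': 'ABC',
          'D': 'DEF', 'E': 'DEF', 'F': 'DEF'}

# Transpose so column labels become the index, group those labels by the
# mapping, take the first mode of each group, then transpose back.
result = df.T.groupby(mapper).agg(lambda s: s.mode()[0]).T
print(result)
```

When a row has a tie, `Series.mode()` returns the tied values sorted, so `[0]` picks the smallest of them.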

Answer 2

For good performance, you can work with the underlying NumPy array and use scipy.stats.mode to compute the mode:

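import pandas as pd
from scipy import stats

cols = ['ABC', 'DEF']
# Stack each row's two 3-column groups as consecutive rows of a (2 * n_rows, 3) array;
# this relies on the groups being contiguous and equally wide.
a = df.values.reshape(-1, df.shape[1] // 2)
pd.DataFrame(stats.mode(a, axis=1).mode.reshape(-1, 2), columns=cols)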

   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
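
The reshape above only works when every group has the same number of adjacent columns. If the groups differ in size or are not contiguous, one possible variation is to call `scipy.stats.mode` once per group. The sketch below assumes a hypothetical `groups` dictionary (not part of the original answer) and uses `np.ravel` so that it does not depend on whether the installed SciPy keeps the reduced axis.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Sample frame from the question.
df = pd.DataFrame({
    'A': [2, 5, 5, 2, 5],
    'B': [5, 2, 2, 2, 5],
    'C': [2, 5, 5, 2, 5],
    'D': [2, 5, 2, 2, 5],
    'E': [2, 5, 5, 2, 5],
    'F': [2, 5, 5, 2, 5],
})

# Hypothetical grouping: output column name -> source columns.
# The groups may have different sizes and need not be adjacent.
groups = {'ABC': ['A', 'B', 'C'], 'DEF': ['D', 'E', 'F']}

# Compute the row-wise mode of each group's sub-array; np.ravel flattens
# the result to one value per row regardless of the SciPy version.
result = pd.DataFrame(
    {
        name: np.ravel(stats.mode(df[cols].to_numpy(), axis=1).mode)
        for name, cols in groups.items()
    },
    index=df.index,
)
print(result)
```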

Answer 3

You can try filtering on the column header index:

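grp = ['ABC', 'DEF']
# Each group name doubles as its column labels: [*'ABC'] -> ['A', 'B', 'C'].
# inplace=False is dropped: set_axis no longer accepts it in pandas 2.0+ and already returns a new object.
pd.concat([df.loc[:, [*g]].mode(axis=1).set_axis([g], axis=1) for g in grp], axis=1)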

Output:

   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
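
Since the question asks to put the results in another column, the per-group mode can also be attached to a copy of the original frame. This is a sketch under stated assumptions: the frame is rebuilt from the question's sample, `out` is a name chosen here, and taking column `0` of `mode(axis=1)` means the smallest value wins when a row has a tie.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'A': [2, 5, 5, 2, 5],
    'B': [5, 2, 2, 2, 5],
    'C': [2, 5, 5, 2, 5],
    'D': [2, 5, 2, 2, 5],
    'E': [2, 5, 5, 2, 5],
    'F': [2, 5, 5, 2, 5],
})

out = df.copy()
for g in ['ABC', 'DEF']:
    # list('ABC') -> ['A', 'B', 'C']; mode(axis=1) may return several
    # columns when rows have ties, so keep only the first one.
    out[g] = df[list(g)].mode(axis=1)[0]
print(out)
```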

3 answers

Answer 1
8 2019-04-30 17:45:03

Answer 2
4 2019-04-30 17:46:23

Answer 3
3 2019-04-30 17:58:30
