I have a really huge dataframe (thousends of rows), but let's assume it is like this:
A B C D E F
0 2 5 2 2 2 2
1 5 2 5 5 5 5
2 5 2 5 2 5 5
3 2 2 2 2 2 2
4 5 5 5 5 5 5
I need to see which value appears most frequently in a group of columns for each row. For instance, the value that appears most frequently in columns ABC and in columns DEF in each row, and put them in another column. In this example, my expected output is
ABC DEF
2 2
5 5
5 5
2 2
5 5
How can I do it in Python??? Thanks!!
Here is one way using columns groupby
mapperd={'A':'ABC','B':'ABC','C':'ABC','D':'DEF','E':'DEF','F':'DEF'}
df.groupby(mapperd,axis=1).agg(lambda x : x.mode()[0])
Out[826]:
ABC DEF
0 2 2
1 5 5
2 5 5
3 2 2
4 5 5
For a good performance you can work with the underlying numpy arrays, and use scipy.stats.mode
to compute the mode :
from scipy import stats
cols = ['ABC','DEF']
a = df.values.reshape(-1, df.shape[1]//2)
pd.DataFrame(stats.mode(a, axis=1).mode.reshape(-1,2), columns=cols)
ABC DEF
0 2 2
1 5 5
2 5 5
3 2 2
4 5 5
You try using column header index filtering:
grp = ['ABC','DEF']
pd.concat([df.loc[:,[*g]].mode(1).set_axis([g], axis=1, inplace=False) for g in grp], axis=1)
Output:
ABC DEF
0 2 2
1 5 5
2 5 5
3 2 2
4 5 5
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.