How to get the n most frequent or highest values per column in python pandas?

Question

My dataframe looks like:

df:
    A   B
0   a   g
1   f   g
2   a   g
3   a   d
4   h   d
5   f   a

For the 2 most frequent values in each column (n=2), the output should be:

top_df:
    A   B
0   a   g
1   f   d

Thanks

Answer 1

This should work:

n = 2
df.apply(lambda x: pd.Series(x.value_counts().index[:n]))
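For anyone who wants to run this as-is, here is a minimal self-contained sketch using the sample data from the question. The helper name `top_n_frequent` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["a", "f", "a", "a", "h", "f"],
    "B": ["g", "g", "g", "d", "d", "a"],
})


def top_n_frequent(frame, n):
    """Return the n most frequent values of each column (hypothetical helper)."""
    # value_counts() sorts by count in descending order, so the first n
    # index labels are the n most frequent values of that column.
    return frame.apply(lambda col: pd.Series(col.value_counts().index[:n]))


top_df = top_n_frequent(df, 2)
print(top_df)
```

If a column has fewer than n distinct values, its remaining rows should come out as NaN, because the per-column results are aligned on a shared 0..n-1 index.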

Answer 2

Something like this could help:

maxes = {}
for col in df.columns:
    frequencies = df[col].value_counts()
    # value_counts() sorts by count in descending order, so just take the first 2
    maxes[col] = frequencies[:2]
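Unlike Answer 1, this approach keeps the counts alongside the values: each dictionary entry is a Series whose index holds the values and whose data holds how often they occur. A rough self-contained sketch of the same idea with the question's data follows. The name `top_counts` is illustrative, and `.head(n)` is used here in place of slicing.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["a", "f", "a", "a", "h", "f"],
    "B": ["g", "g", "g", "d", "d", "a"],
})

n = 2
top_counts = {}
for col in df.columns:
    # head(n) keeps the n most frequent values together with their counts.
    top_counts[col] = df[col].value_counts().head(n)

for col, counts in top_counts.items():
    print(col, counts.to_dict())
```

To get just the values, as the question asks, take `.index.tolist()` of each entry.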

Answer 3

Solution:
To get the n most frequent values, just take a subset of .value_counts() and grab the index:

import pandas as pd

df = pd.read_csv('test.csv')

# METHOD 1: a bit lengthy and inefficient (one line per column)
top_dict = {}
n_freq_items = 2
top_dict['A'] = df.A.value_counts()[:n_freq_items].index.tolist()
top_dict['B'] = df.B.value_counts()[:n_freq_items].index.tolist()
top_df = pd.DataFrame(top_dict)

print(top_df)

# METHOD 2: shorter and better; taken from @myccha's answer, which I found better
top_df = df.apply(lambda x: pd.Series(x.value_counts()[:n_freq_items].index))
print(top_df)

Input data:

# test.csv
A,B
a,g
f,g
a,g
a,d
h,d
f,a

Output:

   A  B
0  a  g
1  f  d

Note: I took the solution from @myccha, who posted another answer to this question; I found that answer more helpful, so I added it as Method 2.
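The title also mentions the n highest values, which only makes sense for columns that can be ordered. A possible sketch for numeric columns uses `Series.nlargest`. The frame `num_df` and its values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical numeric data, not taken from the question.
num_df = pd.DataFrame({
    "x": [3, 9, 1, 7],
    "y": [10, 2, 8, 4],
})

n = 2
# nlargest(n) returns the n largest values in descending order; converting to
# a plain array drops the original row labels so every column shares 0..n-1.
top_values = num_df.apply(lambda col: pd.Series(col.nlargest(n).to_numpy()))
print(top_values)
```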

3 answers

Answer 1
1 accepted 2020-10-20 12:53:23

Answer 2
0 2020-10-20 12:53:47

Answer 3
0 2020-10-20 12:57:45
