Pandas: how to get the most frequent item in a pandas Series?

Question

How can I get the most frequent item in a pandas Series?

Consider the series s:

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

The returned value should be 3.

Answer 1

You can use pd.Series.mode and extract the first value:

res = s.mode().iloc[0]
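mode returns a Series rather than a scalar because several values can tie for the highest count. The tied values come back sorted, so .iloc[0] picks the smallest of them. A minimal sketch of that behaviour; the `tied` series is made-up illustration data, not part of the question:

```python
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)
res = s.mode().iloc[0]      # single mode in the question's data

# Hypothetical data with a tie: 1 and 2 both appear twice.
tied = pd.Series([2, 2, 1, 1, 3])
modes = tied.mode()         # Series holding both tied values, sorted ascending
first = modes.iloc[0]       # the smallest of the tied values

print(res, modes.tolist(), first)
```

If ties matter for your use case, inspect the whole result of mode() instead of taking only the first element.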

Using mode this way is not necessarily inefficient. As always, test with your own data to see what suits.

import numpy as np, pandas as pd
from scipy.stats.mstats import mode
from collections import Counter

np.random.seed(0)

s = pd.Series(np.random.randint(0, 100, 100000))

def jez_np(s):
    _, idx, counts = np.unique(s, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    val = s[index]
    return val

def pir(s):
    i, r = s.factorize()
    return r[np.bincount(i).argmax()]

%timeit s.mode().iloc[0]                 # 1.82 ms
%timeit pir(s)                           # 2.21 ms
%timeit s.value_counts().index[0]        # 2.52 ms
%timeit mode(s).mode[0]                  # 5.64 ms
%timeit jez_np(s)                        # 8.26 ms
%timeit Counter(s).most_common(1)[0][0]  # 8.27 ms
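%timeit is an IPython magic, so the lines above only run in IPython or Jupyter. In a plain Python script, a comparison along the same lines could be sketched with the standard timeit module. The `candidates` dictionary and the choice of number=3 are assumptions made for brevity, and the absolute timings will differ from the figures quoted above:

```python
import timeit
from collections import Counter

import numpy as np
import pandas as pd

np.random.seed(0)
s = pd.Series(np.random.randint(0, 100, 100000))

# Candidate implementations, keyed by a label used in the report.
candidates = {
    "mode": lambda: s.mode().iloc[0],
    "value_counts": lambda: s.value_counts().index[0],
    "Counter": lambda: Counter(s).most_common(1)[0][0],
}

# number=3 is an arbitrary choice that keeps the run short.
runs = 3
timings = {
    name: timeit.timeit(fn, number=runs) / runs
    for name, fn in candidates.items()
}

# Report the fastest candidate first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: {seconds * 1e3:.2f} ms per call")
```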

Answer 2

Use value_counts and select the first value through the index:

val = s.value_counts().index[0]
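This works because value_counts sorts by count in descending order by default, so the most frequent value sits first in the index. A small sketch on the question's data that also reads off the count; `idxmax` is shown as an alternative lookup that does not rely on the sort order:

```python
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

counts = s.value_counts()   # index: distinct values, data: occurrences
val = counts.index[0]       # most frequent value
n = counts.iloc[0]          # how many times it occurs
same = counts.idxmax()      # label of the largest count

print(val, n, same)
```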

Or Counter.most_common:

from collections import Counter

val = Counter(s).most_common(1)[0][0]
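most_common(1) returns a list with a single (value, count) pair, which is why the expression ends in [0][0]. A sketch that unpacks the pair instead, so the count is available as well:

```python
from collections import Counter

import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

counter = Counter(s)                 # iterating a Series yields its values
pairs = counter.most_common(1)       # list of (value, count) pairs
val, n = pairs[0]                    # unpack the top pair

print(val, n)
```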

Or a numpy solution:

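# idx holds the position of the first occurrence of each unique value
_, idx, counts = np.unique(s, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
# use positional access; s[index] is a label lookup and only works with the default index
val = s.iloc[index]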
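Since np.unique already returns the distinct values, the round trip through the original series can be skipped. A variant sketch that indexes the array of unique values directly, so the result does not depend on the index of s:

```python
import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

# np.unique returns the sorted distinct values and, with return_counts=True,
# the number of occurrences of each.
values, counts = np.unique(s.to_numpy(), return_counts=True)
val = values[np.argmax(counts)]

print(values.tolist())   # [1, 2, 3, 5, 8, 10]
print(counts.tolist())   # [2, 2, 6, 2, 1, 1]
print(val)               # 3
```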

Answer 3

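`pandas.factorize` and `numpy.bincount`

This is very similar to @jezrael's Numpy answer. The difference is that it uses factorize instead of numpy.unique.

factorize returns an integer encoding of the values and the array of unique values
bincount counts how many times each unique value occurs
argmax identifies which bin (factor) is the most frequent
The bin position returned by argmax is then used to look up the most frequent value in the array of unique values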

i, r = s.factorize()
r[np.bincount(i).argmax()]

3
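To make the steps concrete, here is a sketch that prints each intermediate result for the series from the question. The variable names `codes`, `uniques`, `bin_counts` and `top` are chosen here for readability and are not from the answer:

```python
import numpy as np
import pandas as pd

s = pd.Series("1 5 3 3 3 5 2 1 8 10 2 3 3 3".split()).astype(int)

codes, uniques = s.factorize()     # uniques are in order of first appearance
bin_counts = np.bincount(codes)    # occurrences of each code
top = bin_counts.argmax()          # position of the most frequent code
val = uniques[top]

print(uniques.tolist())      # [1, 5, 3, 2, 8, 10]
print(codes.tolist())        # [0, 1, 2, 2, 2, 1, 3, 0, 4, 5, 3, 2, 2, 2]
print(bin_counts.tolist())   # [2, 2, 6, 2, 1, 1]
print(top, val)              # 2 3
```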

Answer 4

from scipy import stats
import pandas as pd

x = [1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3]
data = pd.DataFrame({"values": x})

print(stats.mode(data["values"]))

Output:
ModeResult(mode=array([3], dtype=int64), count=array([6]))
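The shape of the ModeResult fields appears to depend on the SciPy version: older releases return one-element arrays as in the output above, while more recent releases return scalars by default for 1-D input. A hedged sketch that extracts the plain value either way by passing the fields through np.atleast_1d:

```python
import numpy as np
import pandas as pd
from scipy import stats

x = [1, 5, 3, 3, 3, 5, 2, 1, 8, 10, 2, 3, 3, 3]
data = pd.DataFrame({"values": x})

result = stats.mode(data["values"].to_numpy())

# np.atleast_1d turns a scalar into a one-element array and leaves
# arrays unchanged, so [0] works for both result shapes.
val = np.atleast_1d(result.mode)[0]
count = np.atleast_1d(result.count)[0]

print(val, count)
```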

4 solutions

Solution 1: 7, accepted, 2018-08-27 12:12:18
Solution 2: 5, 2018-08-27 12:01:44
Solution 3: 3, 2018-08-27 12:58:23 (`pandas.factorize` and `numpy.bincount`)
Solution 4: 1, 2018-08-27 12:21:07
