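pandas count values in each column of a dataframe
I am looking for a way to count the number of occurrences of each value in a column, and it is proving trickier than I originally imagined.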
Percentile Percentile1 Percentile2 Percentile3
0 mediocre contender contender mediocre
69 mediocre bad mediocre mediocre
117 mediocre mediocre mediocre mediocre
144 mediocre none mediocre contender
171 mediocre mediocre contender mediocre
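I am trying to create something like the output below. It takes the four options and counts them per column. It is essentially value_counts applied to each column. Any help would definitely be appreciated.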
Percentile Percentile1 Percentile2 Percentile3
mediocre: 5 2 3 4
contender: 0 1 2 1
bad: 0 1 0 0
none: 0 1 0 0
It helps to make your data "tidy" first. This means that columns should represent variables and rows should represent observations.
In [98]: df
Out[98]:
Percentile Percentile1 Percentile2 Percentile3
0 mediocre contender contender mediocre
69 mediocre bad mediocre mediocre
117 mediocre mediocre mediocre mediocre
144 mediocre none mediocre contender
171 mediocre mediocre contender mediocre
[5 rows x 4 columns]
In this case, melting the DataFrame makes it tidy:
In [125]: melted = pd.melt(df); melted
Out[125]:
variable value
0 Percentile mediocre
1 Percentile mediocre
2 Percentile mediocre
3 Percentile mediocre
4 Percentile mediocre
5 Percentile1 contender
6 Percentile1 bad
7 Percentile1 mediocre
8 Percentile1 none
9 Percentile1 mediocre
10 Percentile2 contender
11 Percentile2 mediocre
12 Percentile2 mediocre
13 Percentile2 mediocre
14 Percentile2 contender
15 Percentile3 mediocre
16 Percentile3 mediocre
17 Percentile3 mediocre
18 Percentile3 contender
19 Percentile3 mediocre
[20 rows x 2 columns]
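As a self-contained sketch of the melt step, the snippet below rebuilds the frame shown above and checks what `pd.melt` returns with its default arguments. It assumes that `none` in the data is the literal string "none" rather than a missing value; the names `df` and `melted` simply follow the session above.

```python
import pandas as pd

# Rebuild the example frame. Assumption: "none" is a plain string,
# not a missing value (None/NaN).
df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# With no id_vars, every column is unpivoted: each cell becomes one row
# holding the source column name ("variable") and the cell content ("value").
melted = pd.melt(df)

print(melted.shape)          # 5 rows x 4 columns -> 20 rows x 2 columns
print(list(melted.columns))  # default names: 'variable' and 'value'
```

Note that the original index labels (0, 69, 117, ...) do not survive the melt; the result is numbered from 0 to 19.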
Then use crosstab to make the frequency table:
In [127]: pd.crosstab(index=[melted['value']], columns=[melted['variable']])
Out[127]:
variable Percentile Percentile1 Percentile2 Percentile3
value
bad 0 1 0 0
contender 0 1 2 1
mediocre 5 2 3 4
none 0 1 0 0
[4 rows x 4 columns]
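For comparison, the question's idea of running value_counts on each column can also be written directly, without melting first. This is a different technique from the crosstab approach above, not part of the original answer, and is sketched here under the assumption that `none` is the literal string "none". The names `counts` and `by_crosstab` are illustrative only.

```python
import pandas as pd

# Rebuild the example frame. Assumption: "none" is a plain string,
# not a missing value (None/NaN).
df = pd.DataFrame(
    {
        "Percentile": ["mediocre"] * 5,
        "Percentile1": ["contender", "bad", "mediocre", "none", "mediocre"],
        "Percentile2": ["contender", "mediocre", "mediocre", "mediocre", "contender"],
        "Percentile3": ["mediocre", "mediocre", "mediocre", "contender", "mediocre"],
    },
    index=[0, 69, 117, 144, 171],
)

# Count the values of every column separately. Values that never occur
# in a given column come back as NaN, so fill those with 0 and restore ints.
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)

# The melt + crosstab route from the answer, for comparison.
melted = pd.melt(df)
by_crosstab = pd.crosstab(index=melted["value"], columns=melted["variable"])

print(counts.sort_index())
print(by_crosstab)
```

Both routes should give the same counts for this data. The crosstab version generalizes more naturally once the data is kept in tidy form, while the `apply` version is shorter when a one-off table is all that is needed.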