Grouping pandas series into bins
I have the following pandas Series:
Asia China 19.7549
Japan 10.2328
India 14.9691
South Korea 2.27935
Iran 5.70772
North America United States 11.571
Canada 61.9454
Europe United Kingdom 10.6005
Russian Federation 17.2887
Germany 17.9015
France 17.0203
Italy 33.6672
Spain 37.9686
Australia Australia 11.8108
South America Brazil 69.648
Name: % Renewable, dtype: object
I have split this data into 5 bins:
binning = pd.cut(Reducedset['% Renewable'],5)
Then I want to count the number of countries in each bin:
df.groupby(binning)['% Renewable'].agg(['count'])
So the final DataFrame should have only the continent as its index, not the country.
However, this formula does not work.
My current output looks like this:
count
binning
(2.212, 15.753] 7
(15.753, 29.227] 4
(29.227, 42.701] 2
(56.174, 69.648] 2
I would like the continent index to be shown here...
Can anyone give me a hand?
Make sure you are not making a simple mistake, such as using the wrong name for the DataFrame:
Reducedset.groupby(binning)['% Renewable'].agg(['count'])
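As far as I understand, you have a DataFrame named Reducedset, indexed by continent and country, with a % Renewable column.
Since the bin of each individual row will be needed later, even after the index has been changed, it is better to save the binning as another column: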
Reducedset['binning'] = pd.cut(Reducedset['% Renewable'], 5)
The result is:
% Renewable binning
continents countries
Asia China 19.75490 (15.753, 29.227]
Japan 10.23280 (2.212, 15.753]
India 14.96910 (2.212, 15.753]
South Korea 2.27935 (2.212, 15.753]
Iran 5.70772 (2.212, 15.753]
North America United States 11.57100 (2.212, 15.753]
Canada 61.94540 (56.174, 69.648]
Europe United Kingdom 10.60050 (2.212, 15.753]
Russian Federation 17.28870 (15.753, 29.227]
Germany 17.90150 (15.753, 29.227]
France 17.02030 (15.753, 29.227]
Italy 33.66720 (29.227, 42.701]
Spain 37.96860 (29.227, 42.701]
Australia Australia 11.81080 (2.212, 15.753]
South America Brazil 69.64800 (56.174, 69.648]
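To reproduce the table above, here is a minimal sketch. It assumes the data from the question is held in a DataFrame with a two-level index; the level names continents and countries are taken from the printed output, and the way the frame is built here is only an illustration, not necessarily how the asker created it.

```python
import pandas as pd

# Values copied from the question; the construction itself is an assumption.
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772), ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454), ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887), ("Europe", "Germany", 17.9015),
    ("Europe", "France", 17.0203), ("Europe", "Italy", 33.6672),
    ("Europe", "Spain", 37.9686), ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
Reducedset = pd.DataFrame(
    rows, columns=["continents", "countries", "% Renewable"]
).set_index(["continents", "countries"])

# pd.cut with an integer splits the value range into that many equal-width bins.
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)
print(Reducedset)
```

Because the bins are stored as a column, each row keeps its bin no matter how the index is changed afterwards.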
If you want only the continents in the index, you can run:
Reducedset.reset_index('countries', inplace=True)
You can then print it sorted by binning, and the result is:
countries % Renewable binning
continents
Asia Japan 10.23280 (2.212, 15.753]
Asia India 14.96910 (2.212, 15.753]
Asia South Korea 2.27935 (2.212, 15.753]
Asia Iran 5.70772 (2.212, 15.753]
North America United States 11.57100 (2.212, 15.753]
Europe United Kingdom 10.60050 (2.212, 15.753]
Australia Australia 11.81080 (2.212, 15.753]
Asia China 19.75490 (15.753, 29.227]
Europe Russian Federation 17.28870 (15.753, 29.227]
Europe Germany 17.90150 (15.753, 29.227]
Europe France 17.02030 (15.753, 29.227]
Europe Italy 33.66720 (29.227, 42.701]
Europe Spain 37.96860 (29.227, 42.701]
North America Canada 61.94540 (56.174, 69.648]
South America Brazil 69.64800 (56.174, 69.648]
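The sorted listing above could be produced roughly as follows. This is a sketch under the same assumptions as before (a frame built by hand from the question's values); by_continent and sorted_df are hypothetical names, and the non-mutating reset_index is used here in place of inplace=True only so that the original frame is left untouched.

```python
import pandas as pd

# Values copied from the question; the construction itself is an assumption.
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772), ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454), ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887), ("Europe", "Germany", 17.9015),
    ("Europe", "France", 17.0203), ("Europe", "Italy", 33.6672),
    ("Europe", "Spain", 37.9686), ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
Reducedset = pd.DataFrame(
    rows, columns=["continents", "countries", "% Renewable"]
).set_index(["continents", "countries"])
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)

# Move only the 'countries' level out of the index into a regular column.
by_continent = Reducedset.reset_index("countries")

# The bins are an ordered categorical, so sorting follows bin order;
# mergesort is a stable sort, which keeps the original row order within a bin.
sorted_df = by_continent.sort_values("binning", kind="mergesort")
print(sorted_df)
```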
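As you can see, the (2.212, 15.753] bin contains countries from 4 continents, so the country information is still needed (although you can keep it as a "regular" column).
Now you can also perform the aggregation, with a slight change: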
Reducedset.groupby('binning')['% Renewable'].agg(['count'])
(Note Reducedset instead of df, and the quotes around binning, since it is now a column in the DataFrame.)
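As a sketch of that aggregation, and of one possible way to bring the continents back into the result, the example below counts countries per bin and then per (continent, bin) pair. The second grouping is an illustration that goes beyond the original answer, and counts and per_continent are hypothetical names. observed=True is passed explicitly because the default for categorical groupers has differed between pandas versions; it leaves out bins that contain no rows.

```python
import pandas as pd

# Values copied from the question; the construction itself is an assumption.
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772), ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454), ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887), ("Europe", "Germany", 17.9015),
    ("Europe", "France", 17.0203), ("Europe", "Italy", 33.6672),
    ("Europe", "Spain", 37.9686), ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
Reducedset = pd.DataFrame(
    rows, columns=["continents", "countries", "% Renewable"]
).set_index(["continents", "countries"])
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)

# Countries per bin, grouping by the column name rather than an external Series.
counts = Reducedset.groupby("binning", observed=True)["% Renewable"].agg(["count"])
print(counts)

# Countries per continent and bin; the result is indexed by (continents, binning).
per_continent = (
    Reducedset.reset_index()
    .groupby(["continents", "binning"], observed=True)
    .size()
)
print(per_continent)
```

A bin can span several continents, so a result indexed by continent alone cannot carry one count per bin; the two-level index keeps both pieces of information.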