
Grouping pandas series into bins

I have the following Pandas series:

Asia           China                 19.7549
               Japan                 10.2328
               India                 14.9691
               South Korea           2.27935
               Iran                  5.70772
North America  United States          11.571
               Canada                61.9454
Europe         United Kingdom        10.6005
               Russian Federation    17.2887
               Germany               17.9015
               France                17.0203
               Italy                 33.6672
               Spain                 37.9686
Australia      Australia             11.8108
South America  Brazil                 69.648
Name: % Renewable, dtype: object

I have binned this data into 5 bins:

binning = pd.cut(Reducedset['% Renewable'],5)

Then I want to count the number of countries in each bin:

df.groupby(binning)['% Renewable'].agg(['count'])

So the final dataframe should have only the continents as the index, not the countries.

However, this doesn't work.

My current output looks like this:

                     count
binning                
(2.212, 15.753]       7
(15.753, 29.227]      4
(29.227, 42.701]      2
(56.174, 69.648]      2

I would like the continents to be shown as the index here...

Can anyone give me a hand?

Make sure you are not making a silly mistake, such as using the wrong name for the dataframe:

Reducedset.groupby(binning)['% Renewable'].agg(['count'])

As I understand it, you have (a minimal reconstruction is sketched right after this list):

  • a DataFrame called Reducedset (not a Series),
  • with a column named % Renewable,
  • with a 2-level MultiIndex (continents / countries).
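
To make the snippets below easier to reproduce, here is a minimal sketch of such a DataFrame, rebuilt from the values quoted in the question; the level names continents and countries are assumptions based on the printouts that follow, and the values are stored as plain floats.

import pandas as pd

# Hypothetical reconstruction of Reducedset: a 2-level MultiIndex
# (continents, countries) and a numeric '% Renewable' column.
rows = [
    ('Asia', 'China', 19.7549),
    ('Asia', 'Japan', 10.2328),
    ('Asia', 'India', 14.9691),
    ('Asia', 'South Korea', 2.27935),
    ('Asia', 'Iran', 5.70772),
    ('North America', 'United States', 11.571),
    ('North America', 'Canada', 61.9454),
    ('Europe', 'United Kingdom', 10.6005),
    ('Europe', 'Russian Federation', 17.2887),
    ('Europe', 'Germany', 17.9015),
    ('Europe', 'France', 17.0203),
    ('Europe', 'Italy', 33.6672),
    ('Europe', 'Spain', 37.9686),
    ('Australia', 'Australia', 11.8108),
    ('South America', 'Brazil', 69.648),
]
Reducedset = (pd.DataFrame(rows, columns=['continents', 'countries', '% Renewable'])
                .set_index(['continents', 'countries']))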

Since the binning of individual rows will be needed later, even after some changes to the index, it is better to save it as another column:

Reducedset['binning'] = pd.cut(Reducedset['% Renewable'], 5)

The result is:

                                  % Renewable           binning
continents    countries                                        
Asia          China                  19.75490  (15.753, 29.227]
              Japan                  10.23280   (2.212, 15.753]
              India                  14.96910   (2.212, 15.753]
              South Korea             2.27935   (2.212, 15.753]
              Iran                    5.70772   (2.212, 15.753]
North America United States          11.57100   (2.212, 15.753]
              Canada                 61.94540  (56.174, 69.648]
Europe        United Kingdom         10.60050   (2.212, 15.753]
              Russian Federation     17.28870  (15.753, 29.227]
              Germany                17.90150  (15.753, 29.227]
              France                 17.02030  (15.753, 29.227]
              Italy                  33.66720  (29.227, 42.701]
              Spain                  37.96860  (29.227, 42.701]
Australia     Australia              11.81080   (2.212, 15.753]
South America Brazil                 69.64800  (56.174, 69.648]
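
As a side note, pd.cut stores the bins as an ordered categorical, so you can inspect the 5 intervals directly:

print(Reducedset['binning'].cat.categories)  # IntervalIndex with the 5 bin edges
print(Reducedset['binning'].cat.ordered)     # True: intervals sort from low to high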

If you want only the continents in the index, you can run:

Reducedset.reset_index('countries', inplace=True)

When you print it, sorted by binning, the result is:

                        countries  % Renewable           binning
continents                                                      
Asia                        Japan     10.23280   (2.212, 15.753]
Asia                        India     14.96910   (2.212, 15.753]
Asia                  South Korea      2.27935   (2.212, 15.753]
Asia                         Iran      5.70772   (2.212, 15.753]
North America       United States     11.57100   (2.212, 15.753]
Europe             United Kingdom     10.60050   (2.212, 15.753]
Australia               Australia     11.81080   (2.212, 15.753]
Asia                        China     19.75490  (15.753, 29.227]
Europe         Russian Federation     17.28870  (15.753, 29.227]
Europe                    Germany     17.90150  (15.753, 29.227]
Europe                     France     17.02030  (15.753, 29.227]
Europe                      Italy     33.66720  (29.227, 42.701]
Europe                      Spain     37.96860  (29.227, 42.701]
North America              Canada     61.94540  (56.174, 69.648]
South America              Brazil     69.64800  (56.174, 69.648]
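
For completeness, the sorted view above can be produced with something like:

print(Reducedset.sort_values('binning'))

(sorting by this column works because pd.cut returns an ordered categorical, so the intervals sort from lowest to highest).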

As you can see, the (2.212, 15.753] bin contains countries from 4 continents, so the information about the countries is still needed (although you can keep it as a "regular" column).

Now you can also perform the aggregation, but with a slight change:

Reducedset.groupby('binning')['% Renewable'].agg(['count'])

(Note Reducedset instead of df, and the quotation marks around binning, since it is now a column in the DataFrame.)
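
If you also want to see how the bins split across continents, one possible variation is to group by both the continents index level and the binning column (groupby accepts a mix of index level names and column names):

Reducedset.groupby(['continents', 'binning'], observed=True)['% Renewable'].count()

(observed=True drops continent/bin combinations that contain no countries).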
