
Convert a Pandas DataFrame to bin frequencies

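Using pandas, I know how to bin a single column, but I'm struggling to figure out how to bin multiple columns and then find the counts (frequencies) of the bins, as my dataframe has 20 columns. I know I could repeat the single-column approach 20 times, but I'm interested in learning a better way. Here is the head of the dataframe, with 4 of the columns shown: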

      Percentile1 Percentile2 Percentile3   Percentile4
395     0.166667    0.266667    0.266667    0.133333
424     0.266667    0.266667    0.133333    0.032258
511     0.032258    0.129032    0.129032    0.387097
540     0.129032    0.129032    0.387097    0.612903
570     0.129032    0.387097    0.612903    0.741935

I created the following array of bin labels:

output = ['0-10','10-20','20-30','30-40','40-50','50-60','60-70','70-80','80-90','90-100']

This is the output I want:

      Percentile1 Percentile2 Percentile3   Percentile4
395     10-20        20-30      20-30           10-20
424     20-30        20-30      10-20           0-10
511     0-10         10-20      10-20           30-40
540     10-20        10-20      30-40           60-70
570     10-20        30-40      60-70           70-80

After this, ideally, I would count the frequencies/values to get something like the following:

      Percentile1 Percentile2 Percentile3   Percentile4
0-10    frequency #'s        
10-20   
20-30   
30-40   
40-50   
etc...

Any help would be greatly appreciated.

I would probably do the following:

print(df)

   Percentile1  Percentile2  Percentile3  Percentile4
0     0.166667     0.266667     0.266667     0.133333
1     0.266667     0.266667     0.133333     0.032258
2     0.032258     0.129032     0.129032     0.387097
3     0.129032     0.129032     0.387097     0.612903
4     0.129032     0.387097     0.612903     0.741935

Now use apply and cut to create a new dataframe in which each percentile is replaced by the decile bin it falls in (apply iterates over each column):

import pandas as pd

bins = list(range(0, 110, 10))  # bin edges 0, 10, ..., 100
new = df.apply(lambda x: pd.cut(x * 100, bins))  # scale 0-1 values to 0-100 first
print(new)

  Percentile1 Percentile2 Percentile3 Percentile4
0    (10, 20]    (20, 30]    (20, 30]    (10, 20]
1    (20, 30]    (20, 30]    (10, 20]     (0, 10]
2     (0, 10]    (10, 20]    (10, 20]    (30, 40]
3    (10, 20]    (10, 20]    (30, 40]    (60, 70]
4    (10, 20]    (30, 40]    (60, 70]    (70, 80]
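
For anyone who wants to reproduce this step, here is a minimal, self-contained sketch. The frame is rebuilt by hand from the five rows shown in the question, and `col` is just an illustrative name for the lambda argument. Note that pd.cut uses right-closed intervals by default, so a value of exactly 0 would fall outside `(0, 10]` and become NaN unless `include_lowest=True` is passed.

```python
import pandas as pd

# Sample data copied from the question (first five rows, four columns).
df = pd.DataFrame(
    {
        "Percentile1": [0.166667, 0.266667, 0.032258, 0.129032, 0.129032],
        "Percentile2": [0.266667, 0.266667, 0.129032, 0.129032, 0.387097],
        "Percentile3": [0.266667, 0.133333, 0.129032, 0.387097, 0.612903],
        "Percentile4": [0.133333, 0.032258, 0.387097, 0.612903, 0.741935],
    },
    index=[395, 424, 511, 540, 570],
)

bins = list(range(0, 110, 10))  # edges 0, 10, ..., 100

# apply passes each column to the lambda; pd.cut returns the interval per value.
new = df.apply(lambda col: pd.cut(col * 100, bins))
print(new)
```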

Use apply again to get the relative frequencies (counts divided by the number of values in each column):

print(new.apply(lambda x: x.value_counts() / x.count()))

         Percentile1  Percentile2  Percentile3  Percentile4
(0, 10]           0.2          NaN          NaN          0.2
(10, 20]          0.6          0.4          0.4          0.2
(20, 30]          0.2          0.4          0.2          NaN
(30, 40]          NaN          0.2          0.2          0.2
(60, 70]          NaN          NaN          0.2          0.2
(70, 80]          NaN          NaN          NaN          0.2

Or the raw value counts:

print(new.apply(lambda x: x.value_counts()))

          Percentile1  Percentile2  Percentile3  Percentile4
(0, 10]             1          NaN          NaN            1
(10, 20]            3            2            2            1
(20, 30]            1            2            1          NaN
(30, 40]          NaN            1            1            1
(60, 70]          NaN          NaN            1            1
(70, 80]          NaN          NaN          NaN            1
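
If zeros are preferable to NaN for bins that a column never hits, something along these lines may help. It is only a sketch under the same assumptions as above (sample frame rebuilt by hand, `col` an illustrative name). Depending on the pandas version, value_counts on a categorical column may already list empty bins with a count of 0, so the exact set of rows printed can differ from the tables shown here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Percentile1": [0.166667, 0.266667, 0.032258, 0.129032, 0.129032],
        "Percentile2": [0.266667, 0.266667, 0.129032, 0.129032, 0.387097],
        "Percentile3": [0.266667, 0.133333, 0.129032, 0.387097, 0.612903],
        "Percentile4": [0.133333, 0.032258, 0.387097, 0.612903, 0.741935],
    },
    index=[395, 424, 511, 540, 570],
)
bins = list(range(0, 110, 10))

# Count per bin and per column, then replace missing combinations with 0.
counts = df.apply(lambda col: pd.cut(col * 100, bins).value_counts())
counts = counts.fillna(0).astype(int).sort_index()

# Relative frequencies: divide each column by its own total.
freq = counts / counts.sum()

print(counts)
print(freq)
```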

Another approach is to skip the intermediate dataframe (which I called new) and do the value counts directly in a single command:

print(df.apply(lambda x: pd.cut(x * 100, bins).value_counts()))

          Percentile1  Percentile2  Percentile3  Percentile4 
(0, 10]             1          NaN          NaN            1
(10, 20]            3            2            2            1
(20, 30]            1            2            1          NaN
(30, 40]          NaN            1            1            1
(60, 70]          NaN          NaN            1            1
(70, 80]          NaN          NaN          NaN            1

If you want '0-10' etc. instead of the (20, 30] that pd.cut provides, here is an alternative:

In [52]:

output = ['0-10','10-20','20-30','30-40','40-50','50-60','60-70','70-80','80-90','90-100']
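df2 = (df * 10).astype(int)  # truncate to the decile index, e.g. 0.166667 -> 1
df2 = df2.applymap(lambda x: output[x])  # on pandas >= 2.1, DataFrame.map is the newer name
print(df2)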
    Percentile1 Percentile2 Percentile3 Percentile4
395       10-20       20-30       20-30       10-20
424       20-30       20-30       10-20        0-10
511        0-10       10-20       10-20       30-40
540       10-20       10-20       30-40       60-70
570       10-20       30-40       60-70       70-80

[5 rows x 4 columns]
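
One caveat with the integer-index lookup: a value of exactly 1.0 maps to index 10, which is past the end of the ten-element list. A possible guard, sketched here with made-up edge values rather than the question's data, is to clip the index before the lookup. Also note that truncation puts an exact 0.2 in '20-30', whereas pd.cut with its default right-closed bins would put 20 in (10, 20].

```python
import pandas as pd

output = ['0-10', '10-20', '20-30', '30-40', '40-50',
          '50-60', '60-70', '70-80', '80-90', '90-100']

# Hypothetical values chosen to exercise the edges; not from the question.
df = pd.DataFrame({"Percentile1": [0.166667, 0.2, 1.0]})

# Truncate to a decile index, then keep it inside 0..9 so that 1.0
# lands in the last bin instead of raising IndexError.
idx = (df * 10).astype(int).clip(0, len(output) - 1)

# Series.map looks up the label for every index value in a column.
df2 = idx.apply(lambda col: col.map(lambda i: output[i]))
print(df2)
```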

In [53]:
print(df2.apply(lambda x: x.value_counts()))  # or divide by x.count() for relative frequencies
level_1  Percentile1  Percentile2  Percentile3  Percentile4
class                                                      
0-10               1          NaN          NaN            1
10-20              3            2            2            1
20-30              1            2            1          NaN
30-40            NaN            1            1            1
60-70            NaN          NaN            1            1
70-80            NaN          NaN          NaN            1

[6 rows x 4 columns]
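
A further option, not part of the answer above, is to pass the label list to pd.cut through its `labels` parameter, which gives the '0-10' style names while keeping pd.cut's bin logic. The sketch below assumes the sample data from the question and reindexes the counts on the full label list so that empty bins show up as 0; the names `labelled` and `counts` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Percentile1": [0.166667, 0.266667, 0.032258, 0.129032, 0.129032],
        "Percentile2": [0.266667, 0.266667, 0.129032, 0.129032, 0.387097],
        "Percentile3": [0.266667, 0.133333, 0.129032, 0.387097, 0.612903],
        "Percentile4": [0.133333, 0.032258, 0.387097, 0.612903, 0.741935],
    },
    index=[395, 424, 511, 540, 570],
)
output = ['0-10', '10-20', '20-30', '30-40', '40-50',
          '50-60', '60-70', '70-80', '80-90', '90-100']
bins = list(range(0, 110, 10))  # 11 edges for the 10 labels

# labels= replaces the interval notation with the names in `output`.
labelled = df.apply(lambda col: pd.cut(col * 100, bins, labels=output))
print(labelled)

# Count plain strings per column, then list every label, with 0 for empty bins.
counts = (
    labelled.astype(str)
    .apply(lambda col: col.value_counts())
    .reindex(output)
    .fillna(0)
    .astype(int)
)
print(counts)
```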
