
Pandas - Pivot Multiple Categorical Columns

I have a DataFrame like this:

import pandas as pd

name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']

test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})

    actfig       car                name    pet
0   superman     lamborghini        fred    cat
1   batman       ferrari            fred    dog
2   flash        bugatti            fred    bird
3   greenlantern ferrari            james   cat
4   flash        corvette           james   dog
5   batman       bugatti            rick    dog
6   joker        bmw                rick    fish
7   superman     bmw                jeff    marmet

Forgive me if my terminology is incorrect, but I want to pivot the data so that, for each name, I get the count of each value in the ['actfig','car','pet'] columns.

       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
name
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0

I thought test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') would do it, but it gives me some strange multi-level columns.

I figure I could call get_dummies on each column, concatenate the results, and then group by name and sum, but I feel like pandas probably has a better way.

How can I do this?

melt + pivot

test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]: 
value  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  \
name                                                                          
fred      1.0   1.0  0.0      1.0  1.0       0.0  1.0      1.0   0.0    1.0   
james     0.0   0.0  0.0      0.0  1.0       1.0  1.0      1.0   0.0    1.0   
jeff      0.0   0.0  1.0      0.0  0.0       0.0  0.0      0.0   0.0    0.0   
rick      1.0   0.0  1.0      1.0  0.0       0.0  1.0      0.0   1.0    0.0   
value  greenlantern  joker  lamborghini  marmet  superman  
name                                                       
fred            0.0    0.0          1.0     0.0       1.0  
james           1.0    0.0          0.0     0.0       0.0  
jeff            0.0    0.0          0.0     1.0       1.0  
rick            0.0    1.0          0.0     0.0       0.0  
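
Two things to keep in mind with the melt + pivot approach: pivot raises an error if the same name/value pair occurs more than once, and fillna(0) leaves the columns as floats. As a rough sketch of one possible alternative (not part of the original answer), the melted frame can be handed to pd.crosstab, which counts occurrences and returns integers. The names melted and counts are only illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Long format: one row per (name, variable, value) triple.
melted = test.melt('name')

# Count how often each value occurs for each name.
counts = pd.crosstab(melted['name'], melted['value'])
print(counts)
```

Because crosstab counts rather than reshapes, a repeated name/value pair would simply show up as 2 instead of raising.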

get_dummies

pd.get_dummies(test.set_index('name')).groupby(level=0).sum()
Out[248]: 
       actfig_batman  actfig_flash  actfig_greenlantern  actfig_joker  \
name                                                                    
fred               1             1                    0             0   
james              0             1                    1             0   
jeff               0             0                    0             0   
rick               1             0                    0             1   
       actfig_superman  car_bmw  car_bugatti  car_corvette  car_ferrari  \
name                                                                      
fred                 1        0            1             0            1   
james                0        0            0             1            1   
jeff                 1        1            0             0            0   
rick                 0        1            1             0            0   
       car_lamborghini  pet_bird  pet_cat  pet_dog  pet_fish  pet_marmet  
name                                                                      
fred                 1         1        1        1         0           0  
james                0         0        1        1         0           0  
jeff                 0         0        0        0         0           1  
rick                 0         0        0        1         1           0

Edit: following PiR's suggestion

pd.get_dummies(test.set_index('name'), prefix_sep='|').groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])
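
To make the rename step easier to follow, here is a small sketch that spells it out in separate steps. It assumes that none of the values themselves contain the '|' character. The intermediate names dummies and counts are illustrative only.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Columns come out as 'actfig|batman', 'car|bmw', 'pet|cat', ...
dummies = pd.get_dummies(test.set_index('name'), prefix_sep='|', dtype=int)

# Sum the indicator rows that share the same name.
counts = dummies.groupby(level=0).sum()

# Keep only the part after the last '|', i.e. drop the column prefix.
counts = counts.rename(columns=lambda c: c.rsplit('|', 1)[1])
print(counts.columns.tolist())
```

Using a separator such as '|' rather than the default '_' keeps the split unambiguous even if a column name or value contains an underscore.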

Option 1
pd.get_dummies per column + dot

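# One 0/1 indicator matrix per categorical column; rows follow test's index.
a = pd.get_dummies(test['actfig'], dtype=int)
c = pd.get_dummies(test['car'], dtype=int)
p = pd.get_dummies(test['pet'], dtype=int)
# Transposed name indicators: one row per name, one column per row of test.
n = pd.get_dummies(test['name'], dtype=int).T

# Each dot product counts, per name, how often every value occurs.
pd.concat([n.dot(d) for d in [a, c, p]], axis=1)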

       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
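
Why the dot product yields counts may be easier to see on a toy example. The sketch below uses made-up data (the names and pets Series are assumptions for illustration, not the question's frame): the transposed name indicators have one row per name, the pet indicators have one column per pet, and each cell of their product adds up the rows where both indicators are 1.

```python
import pandas as pd

# Hypothetical toy data: fred owns a cat and two dogs, rick owns one dog.
names = pd.Series(['fred', 'fred', 'rick', 'fred'])
pets = pd.Series(['cat', 'dog', 'dog', 'dog'])

n = pd.get_dummies(names, dtype=int).T  # shape: (names, rows)
p = pd.get_dummies(pets, dtype=int)     # shape: (rows, pets)

# (names x rows) . (rows x pets) -> (names x pets) co-occurrence counts
counts = n.dot(p)
print(counts)
```

The integer dtype matters here: with boolean indicators the product would only record whether a pair occurs at all, not how many times.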

Option 2
stack + pd.crosstab

test.set_index('name').stack().pipe(
    lambda x: pd.crosstab(x.index.get_level_values(0), x.values))

col_0  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  greenlantern  joker  lamborghini  marmet  superman
row_0                                                                                                                          
fred        1     1    0        1    1         0    1        1     0      1             0      0            1       0         1
james       0     0    0        0    1         1    1        1     0      1             1      0            0       0         0
jeff        0     0    1        0    0         0    0        0     0      0             0      0            0       1         1
rick        1     0    1        1    0         0    1        0     1      0             0      1            0       0         0
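
The row_0 and col_0 labels appear because crosstab receives plain arrays without names. As a small sketch (a variation, not the original answer's code), the rownames and colnames parameters of pd.crosstab can supply labels; the label 'value' and the variable names stacked and counts are arbitrary choices here.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Series indexed by (name, original column) holding every value.
stacked = test.set_index('name').stack()

counts = pd.crosstab(
    stacked.index.get_level_values(0),  # the name for each value
    stacked.values,                     # the values themselves
    rownames=['name'],
    colnames=['value'],
)
print(counts)
```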
