
Pandas - Pivot Multiple Categorical Columns

I have a dataframe like this:

import pandas as pd

name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']

test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})

    actfig       car                name    pet
0   superman     lamborghini        fred    cat
1   batman       ferrari            fred    dog
2   flash        bugatti            fred    bird
3   greenlantern ferrari            james   cat
4   flash        corvette           james   dog
5   batman       bugatti            rick    dog
6   joker        bmw                rick    fish
7   superman     bmw                jeff    marmet

Forgive me if my terminology is incorrect, but I want to pivot the data so that, for each name, I get a count of each value in the ['actfig','car','pet'] columns.

       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
name
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0

I would have thought that test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') would do it, but it gives me some weird multi-level columns.

I was thinking I could concatenate the get_dummies output for each column, then group by name and sum, but I suspect pandas has a better way.

How would this be done?

melt and pivot

test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]: 
value  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  \
name                                                                          
fred      1.0   1.0  0.0      1.0  1.0       0.0  1.0      1.0   0.0    1.0   
james     0.0   0.0  0.0      0.0  1.0       1.0  1.0      1.0   0.0    1.0   
jeff      0.0   0.0  1.0      0.0  0.0       0.0  0.0      0.0   0.0    0.0   
rick      1.0   0.0  1.0      1.0  0.0       0.0  1.0      0.0   1.0    0.0   
value  greenlantern  joker  lamborghini  marmet  superman  
name                                                       
fred            0.0    0.0          1.0     0.0       1.0  
james           1.0    0.0          0.0     0.0       0.0  
jeff            0.0    0.0          0.0     1.0       1.0  
rick            0.0    1.0          0.0     0.0       0.0  
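
One caveat with the melt and pivot route: pivot raises a ValueError when a name has the same value more than once, and the fillna step leaves float columns. A minimal sketch of a variant that counts instead of reshaping, assuming the same sample data as the question (the variable names long_form and counts are only illustrative):

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Long format: one row per (name, original column, value)
long_form = test.melt(id_vars='name')

# Count each (name, value) pair, then spread the values into columns.
# Missing pairs are filled with 0, so the result stays integer-typed.
counts = long_form.groupby(['name', 'value']).size().unstack(fill_value=0)

print(counts)
```

Because size counts rows rather than requiring unique pairs, repeated values for one name simply show up as counts greater than 1.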

Or get_dummies

pd.get_dummies(test.set_index('name')).groupby(level=0).sum()
Out[248]: 
       actfig_batman  actfig_flash  actfig_greenlantern  actfig_joker  \
name                                                                    
fred               1             1                    0             0   
james              0             1                    1             0   
jeff               0             0                    0             0   
rick               1             0                    0             1   
       actfig_superman  car_bmw  car_bugatti  car_corvette  car_ferrari  \
name                                                                      
fred                 1        0            1             0            1   
james                0        0            0             1            1   
jeff                 1        1            0             0            0   
rick                 0        1            1             0            0   
       car_lamborghini  pet_bird  pet_cat  pet_dog  pet_fish  pet_marmet  
name                                                                      
fred                 1         1        1        1         0           0  
james                0         0        1        1         0           0  
jeff                 0         0        0        0         0           1  
rick                 0         0        0        1         1           0

Edit: As per PiR

pd.get_dummies(test.set_index('name'), prefix_sep='|').groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])

Option 1
pd.get_dummies by parts

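# dtype=int keeps the indicators numeric, so the dot product yields counts
a = pd.get_dummies(test.actfig, dtype=int)
c = pd.get_dummies(test.car, dtype=int)
p = pd.get_dummies(test.pet, dtype=int)
# transpose so that rows are names and columns are the original row labels
n = pd.get_dummies(test.name, dtype=int).T

pd.concat([n.dot(d) for d in [a, c, p]], axis=1)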

       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
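
Why the dot product gives counts: the transposed name indicators form a (names x rows) matrix and each value indicator frame is (rows x values), so each cell of the product sums, over all rows, whether that row belongs to the name and carries the value. A small sketch for the actfig column alone, assuming the question's sample data and comparing against pd.crosstab as a sanity check (dot_counts and expected are illustrative names):

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
})

# Indicator matrices as integers: one row per original row
n = pd.get_dummies(test['name'], dtype=int)     # rows x names
a = pd.get_dummies(test['actfig'], dtype=int)   # rows x action figures

# (names x rows) . (rows x figures) -> names x figures co-occurrence counts
dot_counts = n.T.dot(a)

# The same table computed directly, for comparison
expected = pd.crosstab(test['name'], test['actfig'])

print((dot_counts.to_numpy() == expected.to_numpy()).all())
```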

Option 2
stack + pd.crosstab

test.set_index('name').stack().pipe(
    lambda x: pd.crosstab(x.index.get_level_values(0), x.values))

col_0  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  greenlantern  joker  lamborghini  marmet  superman
row_0                                                                                                                          
fred        1     1    0        1    1         0    1        1     0      1             0      0            1       0         1
james       0     0    0        0    1         1    1        1     0      1             1      0            0       0         0
jeff        0     0    1        0    0         0    0        0     0      0             0      0            0       1         1
rick        1     0    1        1    0         0    1        0     1      0             0      1            0       0         0
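
The row_0 and col_0 labels in that output are placeholders pd.crosstab uses when its inputs carry no usable names. A hedged sketch of the same stack + crosstab idea with explicit axis names passed through the rownames and colnames parameters, assuming the question's sample data (the labels 'name' and 'value' are my choice, not part of the original answer):

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Stack every non-name column into one long Series indexed by (name, column)
s = test.set_index('name').stack()

# Cross-tabulate names against the stacked values, labelling both axes
counts = pd.crosstab(
    s.index.get_level_values('name'),
    s.to_numpy(),
    rownames=['name'],
    colnames=['value'],
)

print(counts)
```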
