Pandas - Pivot Multiple Categorical Columns
I have a DataFrame like this:
import pandas as pd
name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']
test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})
         actfig          car   name     pet
0      superman  lamborghini   fred     cat
1        batman      ferrari   fred     dog
2         flash      bugatti   fred    bird
3  greenlantern      ferrari  james     cat
4         flash     corvette  james     dog
5        batman      bugatti   rick     dog
6         joker          bmw   rick    fish
7      superman          bmw   jeff  marmet
Forgive me if my terminology is off, but I want to pivot the data so that, for each name, I get a count of every value that appears in the ['actfig','car','pet'] columns.
       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
name
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
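I thought test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') would do it, but it gives me some strange multi-level columns.
I also thought I could call get_dummies on each column, concatenate the results, then group by name and sum, but I feel like pandas probably has a better way.
How can this be done?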
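For reference, the get_dummies idea described in the question could look roughly like the sketch below. It rebuilds the test frame from the question; the names dummies and counts are only illustrative, and dtype=int is an assumption made so that the sums come out as integer counts.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# One block of indicator columns per categorical column, placed side by side.
dummies = pd.concat(
    [pd.get_dummies(test[col], dtype=int) for col in ['actfig', 'car', 'pet']],
    axis=1,
)

# Add up the indicator rows that belong to each name.
counts = dummies.groupby(test['name']).sum()
print(counts)
```

This keeps the columns grouped by their source column, which is the layout asked for in the question.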
Using melt and pivot:
test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]:
value  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  \
name
fred      1.0   1.0  0.0      1.0  1.0       0.0  1.0      1.0   0.0    1.0
james     0.0   0.0  0.0      0.0  1.0       1.0  1.0      1.0   0.0    1.0
jeff      0.0   0.0  1.0      0.0  0.0       0.0  0.0      0.0   0.0    0.0
rick      1.0   0.0  1.0      1.0  0.0       0.0  1.0      0.0   1.0    0.0

value  greenlantern  joker  lamborghini  marmet  superman
name
fred            0.0    0.0          1.0     0.0       1.0
james           1.0    0.0          0.0     0.0       0.0
jeff            0.0    0.0          0.0     1.0       1.0
rick            0.0    1.0          0.0     0.0       0.0
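One caveat with pivot: it raises a ValueError when the same (name, value) pair occurs more than once, so it only works while every count is 0 or 1. A minimal sketch of a counting variant follows, using pd.crosstab on the melted frame. The extra row and the names test2, melted and counts are hypothetical, added here only to create a repeated pair.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Hypothetical extra row so that ('fred', 'cat') appears twice.
extra = pd.DataFrame({'name': ['fred'], 'actfig': ['joker'],
                      'car': ['bmw'], 'pet': ['cat']})
test2 = pd.concat([test, extra], ignore_index=True)

# Long format: one row per (name, source column, value).
melted = test2.melt('name')

# crosstab counts occurrences, so repeated pairs become counts above 1.
counts = pd.crosstab(melted['name'], melted['value'])
print(counts)
```

The result has integer counts, so no fillna step is needed.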
Or with get_dummies:
pd.get_dummies(test.set_index('name')).groupby(level=0).sum()
Out[248]:
       actfig_batman  actfig_flash  actfig_greenlantern  actfig_joker  \
name
fred               1             1                    0             0
james              0             1                    1             0
jeff               0             0                    0             0
rick               1             0                    0             1

       actfig_superman  car_bmw  car_bugatti  car_corvette  car_ferrari  \
name
fred                 1        0            1             0            1
james                0        0            0             1            1
jeff                 1        1            0             0            0
rick                 0        1            1             0            0

       car_lamborghini  pet_bird  pet_cat  pet_dog  pet_fish  pet_marmet
name
fred                 1         1        1        1         0           0
james                0         0        1        1         0           0
jeff                 0         0        0        0         0           1
rick                 0         0        0        1         1           0
Edit: following piR's suggestion, to drop the column prefixes:
pd.get_dummies(test.set_index('name'), prefix_sep='|').groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])
Option 1
pd.get_dummies on each column separately, combined with dot
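# dtype=int keeps the dot products as integer counts rather than booleans
a = pd.get_dummies(test.actfig, dtype=int)
c = pd.get_dummies(test.car, dtype=int)
p = pd.get_dummies(test.pet, dtype=int)
n = pd.get_dummies(test.name, dtype=int).T
pd.concat([n.dot(d) for d in [a, c, p]], axis=1)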
       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
Option 2
stack + pd.crosstab
test.set_index('name').stack().pipe(
    lambda x: pd.crosstab(x.index.get_level_values(0), x.values))
col_0  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  greenlantern  joker  lamborghini  marmet  superman
row_0
fred        1     1    0        1    1         0    1        1     0      1             0      0            1       0         1
james       0     0    0        0    1         1    1        1     0      1             1      0            0       0         0
jeff        0     0    1        0    0         0    0        0     0      0             0      0            0       1         1
rick        1     0    1        1    0         0    1        0     1      0             0      1            0       0         0
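The crosstab and melt results come back with columns sorted alphabetically, whereas the desired output in the question groups them by source column. A minimal sketch of restoring that order with reindex is below. It is shown on a melt + crosstab result, but the same reindex should apply to any of the sorted outputs. The names cols, order and grouped are illustrative, and the sketch assumes no value appears in more than one of the three columns.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Alphabetically sorted count table.
melted = test.melt('name')
counts = pd.crosstab(melted['name'], melted['value'])

# Values of each source column, sorted within the column, in column order.
cols = ['actfig', 'car', 'pet']
order = [v for col in cols for v in sorted(test[col].unique())]

# Reorder the columns to actfig values, then car values, then pet values.
grouped = counts.reindex(columns=order)
print(grouped)
```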