Pandas - Pivot Multiple Categorical Columns
I have a dataframe like this:
import pandas as pd

name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']
test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})
         actfig          car   name     pet
0      superman  lamborghini   fred     cat
1        batman      ferrari   fred     dog
2         flash      bugatti   fred    bird
3  greenlantern      ferrari  james     cat
4         flash     corvette  james     dog
5        batman      bugatti   rick     dog
6         joker          bmw   rick    fish
7      superman          bmw   jeff  marmet
Forgive me if my terminology is incorrect, but I want to pivot the data so that I get counts for each value in the ['actfig','car','pet'] columns for each name.
       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
name
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
I would have thought that test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') would do it, but it gives me some weird multi-level columns.
I was thinking I could concat get_dummies for each column, then groupby name and sum, but I feel like pandas probably has a better way.
How would this be done?
melt and pivot
# pivot takes keyword-only arguments in pandas 2.0 and later
test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]:
value batman bird bmw bugatti cat corvette dog ferrari fish flash \
name
fred 1.0 1.0 0.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0
james 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0
jeff 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
rick 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0
value greenlantern joker lamborghini marmet superman
name
fred 0.0 0.0 1.0 0.0 1.0
james 1.0 0.0 0.0 0.0 0.0
jeff 0.0 0.0 0.0 1.0 1.0
rick 0.0 1.0 0.0 0.0 0.0
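Note that pivot raises a ValueError if any (name, value) pair occurs more than once, and the NaN fill leaves the counts as floats. Below is a minimal sketch of a variant that tolerates repeated pairs and keeps integer counts, assuming the same test frame as in the question. The names long_form and counts are only illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Long format: one row per (name, original column, value).
long_form = test.melt(id_vars='name')

# Count each (name, value) pair, then spread the values into columns.
# Pairs that never occur are filled with 0, so the result stays integer.
counts = long_form.groupby(['name', 'value']).size().unstack(fill_value=0)
print(counts)
```

Because the counting is done by groupby rather than by assigning a constant 1, a name that owns the same item twice would show 2 instead of raising an error.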
Or get_dummies
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
pd.get_dummies(test.set_index('name')).groupby(level=0).sum()
Out[248]:
actfig_batman actfig_flash actfig_greenlantern actfig_joker \
name
fred 1 1 0 0
james 0 1 1 0
jeff 0 0 0 0
rick 1 0 0 1
actfig_superman car_bmw car_bugatti car_corvette car_ferrari \
name
fred 1 0 1 0 1
james 0 0 0 1 1
jeff 1 1 0 0 0
rick 0 1 1 0 0
car_lamborghini pet_bird pet_cat pet_dog pet_fish pet_marmet
name
fred 1 1 1 1 0 0
james 0 0 1 1 0 0
jeff 0 0 0 0 0 1
rick 0 0 0 1 1 0
Edit: As per PiR
pd.get_dummies(test.set_index('name'), prefix_sep='|').groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])
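Another way to drop the column-name prefix, sketched here under the assumption that no value is shared between the three columns (otherwise the result would have duplicate column labels), is to pass an empty prefix and prefix_sep to get_dummies. The example uses groupby(level=0).sum() so that it also runs on pandas 2.0 and later, and passes dtype=int so the dummies are integers regardless of pandas version. The names dummies and counts are illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Empty prefix and separator: each dummy column is named after the value alone.
dummies = pd.get_dummies(test.set_index('name'), prefix='', prefix_sep='', dtype=int)

# Add up the dummy rows that share the same name.
counts = dummies.groupby(level=0).sum()
print(counts)
```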
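Option 1
pd.get_dummies by parts
# dtype=int keeps the dummies as 0/1 integers so the dot product yields counts
# (get_dummies defaults to bool in pandas 2.0 and later)
a = pd.get_dummies(test.actfig, dtype=int)
c = pd.get_dummies(test.car, dtype=int)
p = pd.get_dummies(test.pet, dtype=int)
n = pd.get_dummies(test.name, dtype=int).T
pd.concat([n.dot(d) for d in [a, c, p]], axis=1)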
       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
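Why the dot product works: n has one row per name and one column per original row, while each dummy frame has one row per original row, so n.dot(d) adds up the dummy rows that belong to each name. The sketch below, assuming the same test frame, checks this against pd.crosstab for the actfig column only. The names via_dot and via_crosstab are illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# names x rows indicator matrix (integer, so the product gives counts)
n = pd.get_dummies(test['name'], dtype=int).T
# rows x action-figure indicator matrix
a = pd.get_dummies(test['actfig'], dtype=int)

# (names x rows) . (rows x figures) -> names x figures counts
via_dot = n.dot(a)
via_crosstab = pd.crosstab(test['name'], test['actfig'])

print((via_dot.values == via_crosstab.values).all())
```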
Option 2
stack + pd.crosstab
test.set_index('name').stack().pipe(
    lambda x: pd.crosstab(x.index.get_level_values(0), x.values))
col_0 batman bird bmw bugatti cat corvette dog ferrari fish flash greenlantern joker lamborghini marmet superman
row_0
fred 1 1 0 1 1 0 1 1 0 1 0 0 1 0 1
james 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0
jeff 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1
rick 1 0 1 1 0 0 1 0 1 0 0 1 0 0 0
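The row_0 and col_0 labels in this output appear to be crosstab's default axis names for inputs that are not named Series. A sketch, assuming the same test frame, that supplies the names through crosstab's rownames and colnames parameters (stacked and counts are illustrative names):

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash',
               'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette',
            'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# One entry per (name, original column), holding the cell value.
stacked = test.set_index('name').stack()

# Cross-tabulate names against values and label both axes explicitly.
counts = pd.crosstab(
    stacked.index.get_level_values(0),
    stacked.values,
    rownames=['name'],
    colnames=['value'],
)
print(counts)
```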