pandas: row values to column headers

Question

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],'id2':[1,1,1,1,2,2,2],'value':['a','b','c','d','a','b','c']})

   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     c
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

I need to transform it into this form:

   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

Each id can have any number of levels in the value variable, from 1 to 10. If a level is not present for an id, its column should be 0, otherwise 1.
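The rule above can be spelled out literally: collect the set of levels seen for each (id1, id2) pair, then test every level against that set. This is a minimal sketch for illustration only, assuming pandas is installed; the names `groups`, `levels` and `out` are hypothetical and not from the question.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'c', 'd', 'a', 'b', 'c']})

# Set of levels present for each (id1, id2) pair; groupby sorts the keys by default
groups = {key: set(vals) for key, vals in df.groupby(['id1', 'id2'])['value']}

# Every level that occurs anywhere becomes a column, in sorted order
levels = sorted(df['value'].unique())

# 1 if the level occurs for that id pair, else 0
rows = [(*key, *(int(lvl in vals) for lvl in levels)) for key, vals in groups.items()]
out = pd.DataFrame(rows, columns=['id1', 'id2', *levels])
print(out)
```

This loops in Python over the groups, so on large frames it is likely slower than the vectorised approaches in the answer.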

I am using Anaconda Python 3.5 on Windows 10.

Answer 1

If you only need 1 and 0 to flag the presence of each value:

You can use get_dummies on the Series created by set_index. Because that still leaves one row per original row, it has to be followed by groupby + GroupBy.max:

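# dtype=int gives 1/0 instead of True/False (the default dtype is bool since pandas 2.0)
out = (pd.get_dummies(df.set_index(['id1','id2'])['value'], dtype=int)
         .groupby(level=[0,1])
         .max()
         .reset_index())
print (out)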
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
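To see why the groupby + max step is needed: get_dummies returns one indicator row per original row, with the (id1, id2) index repeated, so the rows of each id pair must be collapsed. A small sketch, assuming pandas 0.23+ for the `dtype` argument; it uses a frame where 'a' occurs twice for id 1 to show that `max` yields presence while `sum` yields counts. The names `dummies`, `presence` and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# One indicator row per original row; the (id1, id2) index still repeats
dummies = pd.get_dummies(df.set_index(['id1', 'id2'])['value'], dtype=int)
print(dummies)

# max over each id pair -> 1 if the level appears at least once
presence = dummies.groupby(level=[0, 1]).max().reset_index()

# sum over each id pair -> how many times the level appears
counts = dummies.groupby(level=[0, 1]).sum().reset_index()
print(presence)
print(counts)
```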

Another solution uses groupby, size and unstack; the resulting counts then have to be compared with gt and converted to int with astype. Finally, reset_index and rename_axis clean up the index and the column axis name:

out = (df.groupby(['id1','id2', 'value'])
         .size()
         .unstack(fill_value=0)
         .gt(0)
         .astype(int)
         .reset_index()
         .rename_axis(None, axis=1))
print (out)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
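The intermediate result of this second solution is a count table, and `gt(0)` is what turns it into presence flags. A sketch on a frame where 'a' occurs twice for id 1, so the count is 2 before the comparison; `clip(upper=1)` is shown only as an equivalent alternative (not from the answer) that caps the counts at 1 without going through booleans.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# Rows per (id1, id2, value), spread into one column per value; missing -> 0
counts = df.groupby(['id1', 'id2', 'value']).size().unstack(fill_value=0)
print(counts)  # 'a' is 2 for the id pair (1, 1)

# Counts -> booleans -> 0/1, as in the answer
flags = counts.gt(0).astype(int)

# Alternative: cap the counts at 1 directly
capped = counts.clip(upper=1)
print(flags)
print(capped)
```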

If you need to count the values:

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],
                   'id2':[1,1,1,1,2,2,2],
                   'value':['a','b','a','d','a','b','c']})

print (df)
   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     a
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

out = (df.groupby(['id1','id2', 'value'])
         .size()
         .unstack(fill_value=0)
         .reset_index()
         .rename_axis(None, axis=1))
print (out)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0
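`pd.crosstab` is another way to build the same count table; this is a swapped-in alternative, not part of the answer. A minimal sketch on the frame where 'a' occurs twice for id 1:

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# Rows: (id1, id2) pairs; columns: values; cells: number of occurrences (0 if absent)
out = (pd.crosstab([df['id1'], df['id2']], df['value'])
         .reset_index()
         .rename_axis(None, axis=1))
print(out)
```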

Or:

out = (df.pivot_table(index=['id1','id2'], columns='value', aggfunc='size', fill_value=0)
         .reset_index()
         .rename_axis(None, axis=1))
print (out)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0
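If only presence is needed, pivot_table can also produce the 0/1 table directly by pivoting a constant column and taking `max`. A sketch, assuming a helper column named `flag` that is hypothetical and added only for this purpose:

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# 'flag' is a hypothetical helper column holding 1 on every row
out = (df.assign(flag=1)
         .pivot_table(index=['id1', 'id2'], columns='value',
                      values='flag', aggfunc='max', fill_value=0)
         .astype(int)  # filling missing cells may leave float columns; force 0/1 ints
         .reset_index()
         .rename_axis(None, axis=1))
print(out)
```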

Answer 1: accepted, score 5, 2017-06-24 05:04:46