
pandas row values to column headers

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],'id2':[1,1,1,1,2,2,2],'value':['a','b','c','d','a','b','c']})

   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     c
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

I need to transform it into this form:

   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

Each id can have any number of levels in the value column, ranging from 1 to 10. If a level is not present for that id, it should be 0, otherwise 1.

I am using Anaconda Python 3.5 on Windows 10.

If you need only 1 and 0 in the output, indicating the presence of a value:

You can use get_dummies with a Series created by set_index, but then groupby + GroupBy.max is necessary:

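# Parentheses are required so the chained calls can span several lines.
# dtype=int keeps the 0/1 output on newer pandas versions, where the default is boolean.
df = (pd.get_dummies(df.set_index(['id1','id2'])['value'], dtype=int)
        .groupby(level=[0,1])
        .max()
        .reset_index())
print(df)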
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
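
To see why the groupby + max step is needed, it can help to look at the intermediate result. The following is only a minimal sketch using the sample data from the question; the names `dummies` and `result` are illustrative, and `dtype=int` is an assumption added so that recent pandas versions return 0/1 rather than booleans.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'c', 'd', 'a', 'b', 'c']})

# One indicator row per original row, so each row contains a single 1
dummies = pd.get_dummies(df.set_index(['id1', 'id2'])['value'], dtype=int)
print(dummies.shape)  # (7, 4)

# Collapse rows that share the same (id1, id2) pair;
# max keeps a 1 if any row in the group had that value
result = dummies.groupby(level=['id1', 'id2']).max().reset_index()
print(result)
```

Without the aggregation, the frame would still have one row per original row instead of one row per id pair.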

Another solution uses groupby, size and unstack, but then it is necessary to compare with gt and convert to int with astype. Finally, apply reset_index and rename_axis:

# Parentheses are required so the chained calls can span several lines
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .gt(0)
        .astype(int)
        .reset_index()
        .rename_axis(None, axis=1))
print(df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
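
The chain above does several things at once, so it may be easier to follow when split into steps. This is a sketch under the same sample data as the question; the intermediate names `counts`, `wide` and `flags` are hypothetical and only serve to label each stage.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'c', 'd', 'a', 'b', 'c']})

# Count rows per (id1, id2, value) combination: a Series with a 3-level index
counts = df.groupby(['id1', 'id2', 'value']).size()

# Move the 'value' level into the columns; missing combinations are filled with 0
wide = counts.unstack(fill_value=0)

# A count above 0 means the value is present; cast True/False to 1/0
flags = wide.gt(0).astype(int)

# Turn id1/id2 back into ordinary columns and drop the 'value' columns name
result = flags.reset_index().rename_axis(None, axis=1)
print(result)
```

The `gt(0)` comparison matters only when a value can repeat within an id pair; with the sample data every count is already 0 or 1.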

If you need to count the values:

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],
                   'id2':[1,1,1,1,2,2,2],
                   'value':['a','b','a','d','a','b','c']})

print(df)
   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     a
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

# Parentheses are required so the chained calls can span several lines
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print(df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0

Or:

# Parentheses are required so the chained calls can span several lines
df = (df.pivot_table(index=['id1','id2'], columns='value', aggfunc='size', fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print(df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0
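
As a further option that is not part of the answer above, `pd.crosstab` should produce the same count table, since it tabulates frequencies by default. A minimal sketch, assuming the second sample frame with the repeated 'a' value; `result` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# Rows are keyed by the (id1, id2) pair, columns by the distinct values;
# combinations that never occur are counted as 0
result = (pd.crosstab([df['id1'], df['id2']], df['value'])
            .reset_index()
            .rename_axis(None, axis=1))
print(result)
```

Which of the three forms to use is largely a matter of taste; they differ in readability more than in the result.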
