
Pivot data frame in pandas?

people1 trait1 YES
people1 trait2 YES
people1 trait3 NO
people1 trait4 RED
people2 trait1 NO
people2 trait2 YES
people2 trait4 BLACK

etc.

Is it possible to create something like this from that table?

        trait1, trait2, trait3, trait4 ...
people1  YES     YES     NO      RED
people2  NO      YES     -       BLACK
people3  -        -      YES     BLUE

The file is too big to do this in Excel. I tried pandas, but I can't find help for this case. I found the pd.pivot_table function, but I can't build working code; I tried and got various errors (99% my fault).

Can someone explain how to use it in my case? Or is there a better option than pandas.pivot?
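For the original three-column layout, a minimal sketch follows. It assumes the file is whitespace-separated with no header row, so the column names person, trait and value are hypothetical labels chosen here, and io.StringIO stands in for the real file. Because each person/trait pair appears at most once in the sample, DataFrame.pivot is enough and no aggregation is needed; pairs that never occur become NaN, which fillna('-') replaces.

```python
import io

import pandas as pd

# Sample rows from the question; in practice pass the file path instead of StringIO.
raw = io.StringIO("""\
people1 trait1 YES
people1 trait2 YES
people1 trait3 NO
people1 trait4 RED
people2 trait1 NO
people2 trait2 YES
people2 trait4 BLACK
""")

# Hypothetical column names: the file has no header row.
long_df = pd.read_csv(raw, sep=r"\s+", header=None,
                      names=["person", "trait", "value"])

# One row per person, one column per trait; absent pairs become NaN, shown as '-'.
wide = long_df.pivot(index="person", columns="trait", values="value").fillna("-")
print(wide)
```

If a person/trait pair can repeat in the real file, pivot raises a ValueError about duplicate entries; that is the case where pivot_table with an aggfunc such as ','.join is needed.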

EDIT

I rebuilt my frame:
1      'interpretation'     'trait'
p1           YES               t1
p1           BLACK             t2
p1           NO                t3
p2           NO                t1
p2           RED               t2
p2           NO                t3

And I used the suggestion:

data1.pivot_table(index=1, columns="name", values='trait', aggfunc=','.join, fill_value='-')

And I got:

TypeError: sequence item 0: expected str instance, float found

If I change it to

data1.pivot_table(index=1, columns="trait", values='value', aggfunc=','.join, fill_value='-')

I get a table in the wrong order, but without an error:

     p1      p2    p3    p4
YES  trait1  t1
YES  t1      t2 etc.
NO
RED
No
...

So I think the first option is correct, but I can't fix that error. When I check df.dtypes, it returns O (object) for all columns.
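Object (O) dtype does not guarantee that every cell is a string: a missing cell in an object column is still the float NaN, and str.join rejects it. A small sketch with made-up values reproduces the TypeError shown above:

```python
import numpy as np
import pandas as pd

# Made-up trait column with one missing entry; dtype=object mirrors the 'O' dtype.
s = pd.Series(["t2", np.nan, "t3"], dtype=object)

print(s.dtype)       # object
print(type(s[1]))    # <class 'float'>: the missing cell is NaN, a float

try:
    ",".join(s)
except TypeError as exc:
    print(exc)       # sequence item 1: expected str instance, float found

# Filling missing values first gives join only strings to work with.
print(",".join(s.fillna("")))  # t2,,t3
```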

I think the problem is missing values in column trait, so the join function fails. A possible solution is to replace missing values with empty strings:

print (data1)
    1   name trait
0  p1    YES   NaN <- missing value
1  p1  BLACK    t2
2  p1     NO    t3
3  p2     NO    t1
4  p2    RED    t2
5  p2     NO    t3

data1['trait'] = data1['trait'].fillna('')
df = data1.pivot_table(index="name", 
                       columns=1, 
                       values='trait', 
                       aggfunc=','.join, 
                       fill_value='-')
print (df)
1      p1     p2
name            
BLACK  t2      -
NO     t3  t1,t3
RED     -     t2
YES            -
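Instead of filling the column before pivoting, missing values can also be skipped inside the aggregation function. This is a swapped-in alternative, not the approach used above: the frame is rebuilt by hand from the printed data1, join_non_missing is a hypothetical helper name, and the layout matches the printed result (one row per name value, one column per person).

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the printed data1 (the first trait is missing).
data1 = pd.DataFrame({
    1: ["p1", "p1", "p1", "p2", "p2", "p2"],
    "name": ["YES", "BLACK", "NO", "NO", "RED", "NO"],
    "trait": [np.nan, "t2", "t3", "t1", "t2", "t3"],
})


def join_non_missing(values):
    """Hypothetical helper: comma-join a group's values, ignoring NaN."""
    return ",".join(values.dropna())


# The group containing only NaN becomes '', pairs that never occur become '-'.
df = data1.pivot_table(index="name", columns=1, values="trait",
                       aggfunc=join_non_missing, fill_value="-")
print(df)
```

The trade-off is that the trait column itself keeps its NaN, which may matter if the frame is reused elsewhere.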

Also, if you want to convert the index to a column:

data1['trait'] = data1['trait'].fillna('')
df = (data1.pivot_table(index="name", 
                       columns=1, 
                       values='trait', 
                       aggfunc=','.join, 
                       fill_value='-')
           .reset_index()
           .rename_axis(None, axis=1))
print (df)
    name  p1     p2
0  BLACK  t2      -
1     NO  t3  t1,t3
2    RED   -     t2
3    YES          -
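If the goal is the layout from the original question (one row per person, one column per trait), the roles of the two text columns swap: trait becomes the columns and the name answers become the cell values. A sketch, assuming the rebuilt frame from the edit with no missing trait (a row whose trait is NaN would be dropped, since NaN group keys are excluded by default):

```python
import pandas as pd

# Rebuilt frame from the question's edit, with every trait present.
data1 = pd.DataFrame({
    1: ["p1", "p1", "p1", "p2", "p2", "p2"],
    "name": ["YES", "BLACK", "NO", "NO", "RED", "NO"],
    "trait": ["t1", "t2", "t3", "t1", "t2", "t3"],
})

# People down the side, traits across the top, answers in the cells.
# ','.join only matters if a person/trait pair occurs more than once;
# fill_value marks person/trait pairs that never occur.
wide = data1.pivot_table(index=1, columns="trait", values="name",
                         aggfunc=",".join, fill_value="-")
print(wide)
```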
