Transforming Pandas DataFrame

Is there a pandas function to transform this data so that the columns are a, b, c, d, e (or whatever values appear in the data fields) and each row counts how many of each letter it contains?

import pandas as pd

trans = pd.read_table('output.txt', header=None, index_col=0)

print(trans)

        1  2    3    4
0                     
11      a  b    c  NaN
666     a  d    e  NaN
10101   b  c    d  NaN
1010    a  b    c    d
414147  b  c  NaN  NaN
10101   a  b    d  NaN
1242    d  e  NaN  NaN
101     a  b    c    d
411     c  d    e  NaN
444     a  b    c  NaN

Instead, I want the output to look like this:

        a  b    c     d   e
0                     
11      1  1    1   NaN  NaN
666     1  NaN  NaN   1    1

The function .stack() almost gets it done, but in the wrong format.
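The question does not show the .stack() attempt, so the following is only a sketch of one way to finish that approach (an assumption, not the asker's code). After stacking, each non-empty cell becomes one entry keyed by (row label, column); counting the letters per row label with groupby(level=0).value_counts() and then unstack()-ing the letters into columns gives the requested layout, with NaN where a letter is absent. The sample rows are rebuilt inline instead of being read from output.txt.

```python
import numpy as np
import pandas as pd

# The question's sample rows, rebuilt inline (NaN marks an empty field).
index = [11, 666, 10101, 1010, 414147, 10101, 1242, 101, 411, 444]
rows = [
    ["a", "b", "c", np.nan],
    ["a", "d", "e", np.nan],
    ["b", "c", "d", np.nan],
    ["a", "b", "c", "d"],
    ["b", "c", np.nan, np.nan],
    ["a", "b", "d", np.nan],
    ["d", "e", np.nan, np.nan],
    ["a", "b", "c", "d"],
    ["c", "d", "e", np.nan],
    ["a", "b", "c", np.nan],
]
trans = pd.DataFrame(rows, index=index, columns=[1, 2, 3, 4])

# One entry per cell, indexed by (row label, column); dropna() removes the
# empty cells explicitly instead of relying on stack()'s default behaviour.
letters = trans.stack().dropna()

# Count each letter within each row label, then move the letters to columns.
# Letters missing from a row come out as NaN.
counts = letters.groupby(level=0).value_counts().unstack()
print(counts)
```

Because groupby works on labels, the two rows labelled 10101 are merged into one row (b and d each counted twice), and the result is sorted by label.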

You could also use pandas get_dummies():

pd.get_dummies(df.unstack().dropna()).groupby(level=1).sum()

This results in:

        a  b  c  d  e
0                    
11      1  1  1  0  0
666     1  0  0  1  1
10101   0  1  1  1  0
1010    1  1  1  1  0
414147  0  1  1  0  0
10101   1  1  0  1  0
1242    0  0  0  1  1
101     1  1  1  1  0
411     0  0  1  1  1
444     1  1  1  0  0

You could replace the zeros with NaNs if you want to.

It's a bit obscure as a one-liner (df here is the question's trans). df.unstack().dropna() flattens the DataFrame into a Series indexed by (column, row label) and drops all NaNs. get_dummies then turns that Series into an indicator table with one column per letter, but still with one row per cell of the original DataFrame. Grouping on level=1 (the original row label) and summing collapses it back to one row per label. Note that groupby sorts the labels and merges rows that share a label, so the two 10101 rows are combined into a single row.
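The same three steps can be split into named variables so each intermediate result can be inspected. This is a sketch on the question's sample rows with a few deviations, all assumptions of this sketch rather than the answer's code: it flattens with stack() instead of unstack(), which puts the row label in index level 0, so it groups on level=0 (the counts are the same); it passes dtype=int to get_dummies so the indicators are integers rather than booleans; and the last step applies the zero-to-NaN replacement mentioned above using DataFrame.where.

```python
import numpy as np
import pandas as pd

# The question's sample rows, rebuilt inline (NaN marks an empty field).
index = [11, 666, 10101, 1010, 414147, 10101, 1242, 101, 411, 444]
rows = [
    ["a", "b", "c", np.nan],
    ["a", "d", "e", np.nan],
    ["b", "c", "d", np.nan],
    ["a", "b", "c", "d"],
    ["b", "c", np.nan, np.nan],
    ["a", "b", "d", np.nan],
    ["d", "e", np.nan, np.nan],
    ["a", "b", "c", "d"],
    ["c", "d", "e", np.nan],
    ["a", "b", "c", np.nan],
]
df = pd.DataFrame(rows, index=index, columns=[1, 2, 3, 4])

# 1. Flatten to a Series indexed by (row label, column) and drop empty cells.
flat = df.stack().dropna()

# 2. One indicator column per distinct letter, still one row per cell.
dummies = pd.get_dummies(flat, dtype=int)

# 3. Group on the row label (index level 0 here) and add up the indicators.
counts = dummies.groupby(level=0).sum()

# Optional: keep only positive counts, so absent letters become NaN.
counts_nan = counts.where(counts > 0)
print(counts_nan)
```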

Something like this, perhaps:

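>>> # name the stacked values 'c' so reset_index does not clash with the index name 0
>>> st = trans.stack().dropna().rename('c').reset_index(level=0)
>>> st.columns = ['i', 'c']
>>> # pivot_table(rows=..., cols=...) is gone in current pandas; count group sizes instead
>>> st.groupby(['i', 'c']).size().unstack()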
c        a   b   c   d   e
i                         
11       1   1   1 NaN NaN
101      1   1   1   1 NaN
411    NaN NaN   1   1   1
444      1   1   1 NaN NaN
666      1 NaN NaN   1   1
1010     1   1   1   1 NaN
1242   NaN NaN NaN   1   1
10101    1   2   1   2 NaN
414147 NaN   1   1 NaN NaN
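A swapped-in alternative, not part of either original answer: pd.crosstab counts how often each pair of values occurs across two arrays, so the row labels and letters of the stacked Series can be passed in directly without building the intermediate two-column frame. Unlike the table above, absent letters come out as 0 rather than NaN. The names i and c are borrowed from the answer; the sample rows are rebuilt inline.

```python
import numpy as np
import pandas as pd

# The question's sample rows, rebuilt inline (NaN marks an empty field).
index = [11, 666, 10101, 1010, 414147, 10101, 1242, 101, 411, 444]
rows = [
    ["a", "b", "c", np.nan],
    ["a", "d", "e", np.nan],
    ["b", "c", "d", np.nan],
    ["a", "b", "c", "d"],
    ["b", "c", np.nan, np.nan],
    ["a", "b", "d", np.nan],
    ["d", "e", np.nan, np.nan],
    ["a", "b", "c", "d"],
    ["c", "d", "e", np.nan],
    ["a", "b", "c", np.nan],
]
trans = pd.DataFrame(rows, index=index, columns=[1, 2, 3, 4])

# One entry per non-empty cell, indexed by (row label, column).
letters = trans.stack().dropna()

# Cross-tabulate row labels against letters; each cell holds a count.
table = pd.crosstab(
    index=letters.index.get_level_values(0),
    columns=letters.values,
    rownames=["i"],
    colnames=["c"],
)
print(table)
```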

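Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or link to the original post. For any questions, contact yoyou2525@163.com.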

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM