
Reshape DataFrame categorical values to rows

I'm having a hard time reorganizing this DataFrame. I think I'm supposed to use pd.pivot_table or pd.crosstab, but I'm not sure how to get the job done.

Here is my DataFrame:

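import pandas as pd
vicro = pd.read_csv(vicroURL)  # vicroURL: location of the source CSV, defined elsewhere
# .ix was removed in pandas 1.0; .loc selects the same columns by label
vicro_subset = vicro.loc[:, ['P1', 'P10', 'P30', 'P71', 'P82', 'P90']]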


In [6]: vicro_subset.head()
Out[6]: 
  P1 P10 P30 P71 P82 P90
0  -   I   -   -   -   M
1  -   I   -   V   T   M
2  -   I   -   V   A   M
3  -   I   -   T   -   M
4  -   -   -   -   A   -

What I want to do is take every value that appears in this DataFrame and make each one a row, with the number of times it occurs in each column as the new cell values. Something that would look like:

Out[6]: 
  P1 P10 P30 P71 P82 P90
I  0   4   0   0   0   0
V  0   0   0   2   0   0
A  0   0   0   0   2   0
M  0   0   0   0   0   4
T  0   0   0   1   1   0

Any help would be greatly appreciated! Thank you.
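Since the question mentions pd.crosstab, here is a minimal sketch of that route. It assumes the five rows printed above stand in for the full file (the real data comes from vicroURL, which isn't given). Melting first turns the six columns into (variable, value) pairs, which crosstab can then count directly; missing combinations come out as 0.

```python
import pandas as pd

# Hypothetical stand-in for vicro_subset, rebuilt from the five rows printed above
df = pd.DataFrame({
    'P1':  ['-', '-', '-', '-', '-'],
    'P10': ['I', 'I', 'I', 'I', '-'],
    'P30': ['-', '-', '-', '-', '-'],
    'P71': ['-', 'V', 'V', 'T', '-'],
    'P82': ['-', 'T', 'A', '-', 'A'],
    'P90': ['M', 'M', 'M', 'M', '-'],
})

# Long format: one row per (column name, cell value) pair
long_df = pd.melt(df)

# Count each value per original column: values become rows, P1..P90 stay columns
counts = pd.crosstab(long_df['value'], long_df['variable'])

# Drop the '-' placeholder and order the rows like the desired output
counts = counts.drop('-').reindex(['I', 'V', 'A', 'M', 'T'])
print(counts)
```

The reindex line only reproduces the row order of the desired output for this sample; on the full file it would discard other values such as '.' or 'F', so it is best left out there.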

Edit: Elaborating on the answer that uses melt. Both answers helped me understand pandas a bit more, but the melt answer had more unknowns for me, so here it is step by step:

In [8]: melted_df = pd.melt(vicro_subset)

In [9]: melted_df.head()
Out[9]: 
  variable value
0       P1     -
1       P1     -
2       P1     -
3       P1     -
4       P1     -
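For readers new to melt, a minimal sketch on a hypothetical two-column frame shows what the call above does. With no id_vars, every column is unpivoted, so each cell becomes one row holding its column name (variable) and its content (value), column by column.

```python
import pandas as pd

# Hypothetical 2x2 frame, only to show the shape change
small = pd.DataFrame({'P1': ['-', '.'], 'P10': ['I', '-']})

# Every cell becomes one row: column name -> 'variable', cell content -> 'value'
long_small = pd.melt(small)
print(long_small)
```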


In [13]: grouped_melt = melted_df.groupby(['variable','value'])['value'].count()
In [14]: grouped_melt.head()
Out[14]: 
variable  value
P1        -        797
          .        269
P10       -        339
          .          1
          F        132
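The groupby step counts how often each (variable, value) pair occurs. A minimal sketch, using a small hypothetical frame with the same two columns melt produces: .size() counts rows per group, and it matches ['value'].count() here because melt produced no NaN cells (count skips NaN, size does not).

```python
import pandas as pd

# Hypothetical melted frame with the same 'variable'/'value' columns melt produces
melted_df = pd.DataFrame({
    'variable': ['P1', 'P1', 'P1', 'P10', 'P10', 'P10'],
    'value':    ['-',  '-',  '.',  'I',   'I',   '-'],
})

# Number of non-NaN 'value' entries per (variable, value) group
by_count = melted_df.groupby(['variable', 'value'])['value'].count()

# Number of rows per group, NaN or not
by_size = melted_df.groupby(['variable', 'value']).size()
print(by_size)
```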


In [15]: unstacked_group = grouped_melt.unstack()

In [16]: unstacked_group.head()
Out[16]: 
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, P1 to P82
Data columns:
-       5  non-null values
.       2  non-null values
A       1  non-null values
AITV    1  non-null values
AT      2  non-null values

In [17]: transpose_unstack = unstacked_group.T

In [18]: transpose_unstack.head()
Out[18]: 
variable   P1  P10   P30  P71  P82  P90
value                                  
-         797  339  1005  452  604  634
.         269    1   NaN  NaN  NaN  NaN
A         NaN  NaN   NaN  NaN  282  NaN
AITV      NaN  NaN   NaN  NaN    1  NaN
AT        NaN  NaN   NaN    1    2  NaN
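The NaN entries mark (variable, value) pairs that never occur; unstack leaves them missing, which also turns the counts into floats. A minimal sketch, assuming a hypothetical series built from the first grouped_melt rows shown above: passing fill_value=0 to unstack writes zeros instead, so the counts stay integers and a later fillna(0) is not needed.

```python
import pandas as pd

# Hypothetical counts indexed by (variable, value), shaped like grouped_melt above
index = pd.MultiIndex.from_tuples(
    [('P1', '-'), ('P1', '.'), ('P10', '-'), ('P10', '.'), ('P10', 'F')],
    names=['variable', 'value'],
)
grouped = pd.Series([797, 269, 339, 1, 132], index=index)

# Values become columns; pairs that never occur get 0 instead of NaN
wide = grouped.unstack(fill_value=0)

# Transpose so the values are rows and P1, P10, ... are columns
table = wide.T
print(table)
```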

Alternatively, something like this should work (count each column with value_counts, then put the results side by side):

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.random.randint(0,5,12).reshape(3,4), 
                                            columns=list('abcd'))

In [4]: print(df)
   a  b  c  d
0  2  2  3  1
1  0  1  0  2
2  1  3  0  4

In [5]: new = pd.concat([df[col].value_counts() for col in df.columns], axis=1)

In [6]: new.columns = df.columns

In [7]: print(new)
    a   b   c   d
0   1 NaN   2 NaN
1   1   1 NaN   1
2   1   1 NaN   1
3 NaN   1   1 NaN
4 NaN NaN NaN   1
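A more compact variant of the same per-column idea (an alternative, not part of the original answer) passes pd.Series.value_counts to DataFrame.apply, which aligns the per-column counts on the union of their indexes. This sketch reuses the three rows printed above instead of random numbers and fills the gaps with zeros.

```python
import pandas as pd

# The 3x4 frame printed above, fixed instead of random so the result is reproducible
df = pd.DataFrame([[2, 2, 3, 1],
                   [0, 1, 0, 2],
                   [1, 3, 0, 4]], columns=list('abcd'))

# value_counts per column, aligned on all values seen; absent values become 0
new = df.apply(pd.Series.value_counts).fillna(0).astype(int).sort_index()
print(new)
```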

I figured the key is to use melt, and a bit of acrobatics afterwards. So here's your DataFrame:

In [21]: df
Out[21]:
  P1 P10 P30 P71 P82 P90
0  -   I   -   -   -   M
1  -   I   -   V   T   M
2  -   I   -   V   A   M
3  -   I   -   T   -   M
4  -   -   -   -   A   -

Now do the following (you might want to step through it in IPython to see the intermediate results):

In [22]: pd.melt(df).groupby(['variable', 'value'])['value'].count().unstack().T.fillna(0)
Out[22]:
variable  P1  P10  P30  P71  P82  P90
value
-          5    1    5    2    2    1
A          0    0    0    0    2    0
I          0    4    0    0    0    0
M          0    0    0    0    0    4
T          0    0    0    1    1    0
V          0    0    0    2    0    0

If you save the result in df2, you can then remove the '-' row:

In [25]: df2.drop('-')
Out[25]:
variable  P1  P10  P30  P71  P82  P90
value
A          0    0    0    0    2    0
I          0    4    0    0    0    0
M          0    0    0    0    0    4
T          0    0    0    1    1    0
V          0    0    0    2    0    0
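Instead of dropping the '-' row at the end, the placeholder can also be filtered out before counting. A minimal sketch on the same five rows (a hypothetical stand-in for the real file): columns made up entirely of '-' (P1 and P30 in this sample) then have no groups left and vanish from the count, so a reindex restores them with zeros.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame (the five rows shown above)
df = pd.DataFrame({
    'P1':  ['-', '-', '-', '-', '-'],
    'P10': ['I', 'I', 'I', 'I', '-'],
    'P30': ['-', '-', '-', '-', '-'],
    'P71': ['-', 'V', 'V', 'T', '-'],
    'P82': ['-', 'T', 'A', '-', 'A'],
    'P90': ['M', 'M', 'M', 'M', '-'],
})

long_df = pd.melt(df)
# Keep only real values; '-' marks an empty cell
long_df = long_df[long_df['value'] != '-']

df2 = (long_df.groupby(['variable', 'value']).size()
       .unstack(fill_value=0)
       .T
       # all-'-' columns have no groups left, so add them back as zeros
       .reindex(columns=df.columns, fill_value=0))
print(df2)
```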
