Splitting a column in a Python Pandas DataFrame

Question

How can I split a column in a pandas DataFrame by the variable names in another column? I have the DataFrame below:

    ID  FEATURE PARAM   VALUE
0   A101    U1  ITEM1   10
1   A101    U1  ITEM2   11
2   A101    U2  ITEM1   12
3   A101    U2  ITEM2   13
4   A102    U1  ITEM1   14
5   A102    U1  ITEM2   15
6   A102    U2  ITEM1   16
7   A102    U2  ITEM2   17

I want to split it as follows:

    ID  FEATURE ITEM1   ITEM2
0   A101    U1  10  11
1   A101    U2  12  13
2   A102    U1  14  15
3   A102    U2  16  17

I tried one of the responses, and it works well, but only partially:

Select_Data.groupby('PARAM')['VALUE'].apply(list).apply(pd.Series).T

PARAM   ITEM1   ITEM2
0   10  11
1   12  13
2   14  15
3   16  17
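To see why the ID and FEATURE columns disappear, here is a minimal sketch that rebuilds the question's sample data and runs the same expression on it. The names `df` and `wide` are hypothetical (the question calls its frame `Select_Data`), and the construction of `df` is an assumption that mirrors the table above.

```python
import pandas as pd

# Sample data from the question (constructed here for illustration).
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

# The attempted approach: group only by PARAM, collect each group's VALUEs
# into a list, expand the lists into columns, then transpose.
wide = df.groupby("PARAM")["VALUE"].apply(list).apply(pd.Series).T

# PARAM was the only grouping key, so ID and FEATURE are dropped; the row
# index is just the position of each value inside its PARAM group.
print(wide)
print(list(wide.columns))
```

Because only PARAM is a grouping key, every other column is discarded, and the rows line up only as long as the input rows are in a consistent order.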

But I lost my ID and FEATURE columns, and I want to keep them in the table. I would greatly appreciate any suggestions.

Answer 1

Using groupby, you can do:

df.groupby('c1')['c2'].apply(list).apply(pd.Series).T

c1  A  B  C
0   1  2  3
1   4  5  6
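This answer's snippet uses a frame with columns `c1` and `c2` that it never defines. A minimal runnable sketch, assuming a hypothetical input inferred from the printed output (the values are a reconstruction, not given in the answer):

```python
import pandas as pd

# Hypothetical input inferred from the output above: c1 holds the labels
# that become column names, c2 holds the values.
df = pd.DataFrame({
    "c1": ["A", "B", "C", "A", "B", "C"],
    "c2": [1, 2, 3, 4, 5, 6],
})

# One list of c2 values per c1 label, expanded into columns and transposed
# so that each label becomes a column.
wide = df.groupby("c1")["c2"].apply(list).apply(pd.Series).T
print(wide)
```

Each output row pairs the n-th value of every group, so the approach assumes all groups have the same length and order; a shorter group is padded with NaN.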

Answer 2

You can also use pivot_table with ID and FEATURE as the index and then reset the index, i.e.:

ndf = pd.pivot_table(df, columns='PARAM', values='VALUE', index=['ID', 'FEATURE']).reset_index()

In case you want to aggregate duplicate values, you can pass aggfunc explicitly, e.g. the mean (which is also pivot_table's default):

ndf = pd.pivot_table(df, columns='PARAM', values='VALUE', index=['ID', 'FEATURE'], aggfunc='mean').reset_index()

Output:

PARAM    ID FEATURE  ITEM1  ITEM2
0      A101      U1     10     11
1      A101      U2     12     13
2      A102      U1     14     15
3      A102      U2     16     17
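A self-contained sketch of this approach on the question's data. The construction of `df` is an assumption that mirrors the question's table, and the optional `rename_axis` call (borrowed from the next answer) only drops the leftover `PARAM` label on the column axis.

```python
import pandas as pd

# Sample data from the question (constructed here for illustration).
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

# ID and FEATURE form the row index, each PARAM value becomes a column,
# and VALUE fills the cells (aggregated with the mean by default).
ndf = pd.pivot_table(df, index=["ID", "FEATURE"], columns="PARAM",
                     values="VALUE").reset_index()

# Optional: remove the "PARAM" name from the column axis.
ndf = ndf.rename_axis(None, axis=1)
print(ndf)
```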

Answer 3

You can use set_index and unstack:

df = (df.set_index(['ID','FEATURE','PARAM'])['VALUE']
        .unstack()
        .reset_index()
        .rename_axis(None, axis=1))
print(df)
     ID FEATURE  ITEM1  ITEM2
0  A101      U1     10     11
1  A101      U2     12     13
2  A102      U1     14     15
3  A102      U2     16     17
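The chain above can be read step by step. A minimal sketch on the question's sample data; the intermediate names `s` and `wide` are hypothetical and the construction of `df` is an assumption mirroring the question's table.

```python
import pandas as pd

# Sample data from the question (constructed here for illustration).
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

# Step 1: move the three key columns into a MultiIndex and keep only VALUE.
s = df.set_index(["ID", "FEATURE", "PARAM"])["VALUE"]

# Step 2: unstack() pivots the innermost index level (PARAM) into columns.
wide = s.unstack()

# Step 3: turn ID and FEATURE back into columns and drop the "PARAM" axis name.
wide = wide.reset_index().rename_axis(None, axis=1)
print(wide)
```

Unlike pivot_table, unstack does no aggregation, which is why it fails when a key triple repeats.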

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

then use Bharath shetty's solution, or groupby and aggregate with mean, because there are duplicates in the (ID, FEATURE, PARAM) triples:

print (df)
     ID FEATURE  PARAM  VALUE
0  A101      U2  ITEM1     50<-same A101,U2,ITEM1
1  A101      U1  ITEM2     11
2  A101      U2  ITEM1     12<-same A101,U2,ITEM1
3  A101      U2  ITEM2     13
4  A102      U1  ITEM1     14
5  A102      U1  ITEM2     15
6  A102      U2  ITEM1     16
7  A102      U2  ITEM2     17
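Before choosing a fix, it can help to locate the offending rows. A minimal sketch, assuming the duplicated data printed above (the names `keys`, `dups` and `error` are hypothetical): `DataFrame.duplicated` with `keep=False` flags every row of a repeated triple, and the `unstack` call reproduces the error.

```python
import pandas as pd

# The duplicated data shown above: (A101, U2, ITEM1) appears twice.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U2", "U1", "U2", "U2", "U1", "U1", "U2", "U2"],
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [50, 11, 12, 13, 14, 15, 16, 17],
})
keys = ["ID", "FEATURE", "PARAM"]

# keep=False marks all occurrences of a repeated key triple, not just the
# later ones, so both conflicting rows are shown.
dups = df[df.duplicated(keys, keep=False)]
print(dups)

# unstack() cannot place two values in one cell, so it raises ValueError.
error = None
try:
    df.set_index(keys)["VALUE"].unstack()
except ValueError as exc:
    error = str(exc)
print(error)
```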


df = (df.groupby(['ID','FEATURE','PARAM'])['VALUE'].mean()
        .unstack().reset_index().rename_axis(None, axis=1))
print(df)
     ID FEATURE  ITEM1  ITEM2
0  A101      U1    NaN   11.0
1  A101      U2   31.0   13.0<-(50+12)/2=31
2  A102      U1   14.0   15.0
3  A102      U2   16.0   17.0
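Both suggested fixes collapse the duplicate triple with the mean. A sketch comparing them on the same duplicated data (the names `by_groupby` and `by_pivot` are hypothetical):

```python
import pandas as pd

# The duplicated data from this answer: (A101, U2, ITEM1) appears twice.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U2", "U1", "U2", "U2", "U1", "U1", "U2", "U2"],
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [50, 11, 12, 13, 14, 15, 16, 17],
})
keys = ["ID", "FEATURE", "PARAM"]

# Option 1: average duplicate triples with groupby, then widen with unstack.
by_groupby = (df.groupby(keys)["VALUE"].mean()
                .unstack()
                .reset_index()
                .rename_axis(None, axis=1))

# Option 2: pivot_table does the same aggregation in one call.
by_pivot = (pd.pivot_table(df, index=["ID", "FEATURE"], columns="PARAM",
                           values="VALUE", aggfunc="mean")
              .reset_index()
              .rename_axis(None, axis=1))

print(by_groupby)
print(by_pivot)
```

In both results, (A101, U1) has no ITEM1 row, so that cell is NaN, and (A101, U2, ITEM1) becomes (50 + 12) / 2 = 31.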

3 answers

Answer 1
1 2017-08-06 07:38:42

Answer 2
1 accepted 2017-08-06 08:53:06

Answer 3
1 2017-08-07 10:19:57
