
Splitting a column in a Python pandas DataFrame

How can I split a column in a pandas DataFrame by the variable names held in another column? I have the DataFrame below:

     ID FEATURE  PARAM  VALUE
0  A101      U1  ITEM1     10
1  A101      U1  ITEM2     11
2  A101      U2  ITEM1     12
3  A101      U2  ITEM2     13
4  A102      U1  ITEM1     14
5  A102      U1  ITEM2     15
6  A102      U2  ITEM1     16
7  A102      U2  ITEM2     17

I want to split it as shown below.

     ID FEATURE  ITEM1  ITEM2
0  A101      U1     10     11
1  A101      U2     12     13
2  A102      U1     14     15
3  A102      U2     16     17

I tried one of the responses, and it works, but only partially.

Select_Data.groupby('PARAM')['VALUE'].apply(list).apply(pd.Series).T

PARAM  ITEM1  ITEM2
0         10     11
1         12     13
2         14     15
3         16     17

But I lost my ID and FEATURE columns, and I want to keep them in the table. I would greatly appreciate any suggestions.

Using groupby, you can do:

In [566]: df.groupby('c1')['c2'].apply(list).apply(pd.Series).T
Out[566]:
c1  A  B  C
0   1  2  3
1   4  5  6

You can also use pivot_table with ID and FEATURE as the index, and then reset the index, i.e.

ndf =  pd.pivot_table(df,columns='PARAM', values='VALUE',index=['ID','FEATURE']).reset_index()

In case you want to aggregate duplicate values, you can use the mean (mean is already the default aggfunc of pivot_table, so this only makes it explicit):

ndf =  pd.pivot_table(df,columns='PARAM', values='VALUE',index=['ID','FEATURE'],aggfunc='mean').reset_index()

Output:

PARAM    ID FEATURE  ITEM1  ITEM2
0      A101      U1     10     11
1      A101      U2     12     13
2      A102      U1     14     15
3      A102      U2     16     17
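
For anyone who wants to run the pivot_table approach end to end, here is a minimal sketch. The DataFrame is rebuilt by hand from the table in the question; the final rename_axis call is an optional extra, not part of the answer above, that drops the leftover "PARAM" label from the columns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

# One row per (ID, FEATURE); each distinct PARAM value becomes a column.
ndf = (
    pd.pivot_table(df, index=["ID", "FEATURE"], columns="PARAM", values="VALUE")
      .reset_index()
      .rename_axis(None, axis=1)  # optional: remove the "PARAM" columns label
)
print(ndf)
```

Whether the values print as integers or floats may depend on the pandas version, since pivot_table computes a mean even when each group has a single row.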

You can use set_index and unstack:

# the parentheses let the method chain span several lines
df = (df.set_index(['ID','FEATURE','PARAM'])['VALUE']
        .unstack()
        .reset_index()
        .rename_axis(None, axis=1))
print(df)
     ID FEATURE  ITEM1  ITEM2
0  A101      U1     10     11
1  A101      U2     12     13
2  A102      U1     14     15
3  A102      U2     16     17
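
As a self-contained sketch of the same set_index and unstack chain, the snippet below rebuilds the question's data by hand (the variable name wide is only illustrative) and annotates what each step contributes.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U1", "U1", "U2", "U2"] * 2,
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [10, 11, 12, 13, 14, 15, 16, 17],
})

wide = (
    df.set_index(["ID", "FEATURE", "PARAM"])["VALUE"]  # Series with a 3-level index
      .unstack()                   # innermost level (PARAM) moves to the columns
      .reset_index()               # ID and FEATURE become ordinary columns again
      .rename_axis(None, axis=1)   # drop the leftover "PARAM" columns label
)
print(wide)
```

No aggregation happens here, so this only works when every (ID, FEATURE, PARAM) combination occurs at most once.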

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

then use Bharath shetty's solution, or groupby and aggregate with mean, because there are duplicates in the triples ID, FEATURE, PARAM:

print (df)
     ID FEATURE  PARAM  VALUE
0  A101      U2  ITEM1     50<-same A101,U2,ITEM1
1  A101      U1  ITEM2     11
2  A101      U2  ITEM1     12<-same A101,U2,ITEM1
3  A101      U2  ITEM2     13
4  A102      U1  ITEM1     14
5  A102      U1  ITEM2     15
6  A102      U2  ITEM1     16
7  A102      U2  ITEM2     17


# the parentheses let the method chain span several lines
df = (df.groupby(['ID','FEATURE','PARAM'])['VALUE'].mean()
        .unstack().reset_index().rename_axis(None, axis=1))
print(df)
     ID FEATURE  ITEM1  ITEM2
0  A101      U1    NaN   11.0
1  A101      U2   31.0   13.0<-(50+12)/2=31
2  A102      U1   14.0   15.0
3  A102      U2   16.0   17.0
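
The duplicate case can be reproduced with the sketch below. The data is retyped from the example above; the names keys, dupes, reshape_failed and wide are illustrative only. It first lists the rows that share a key, then shows that a plain unstack fails, and finally applies the groupby and mean aggregation.

```python
import pandas as pd

# Data from the example above: rows 0 and 2 share the key (A101, U2, ITEM1).
df = pd.DataFrame({
    "ID": ["A101"] * 4 + ["A102"] * 4,
    "FEATURE": ["U2", "U1", "U2", "U2", "U1", "U1", "U2", "U2"],
    "PARAM": ["ITEM1", "ITEM2"] * 4,
    "VALUE": [50, 11, 12, 13, 14, 15, 16, 17],
})
keys = ["ID", "FEATURE", "PARAM"]

# keep=False flags every row of a repeated key, not only the later ones.
dupes = df[df.duplicated(keys, keep=False)]
print(dupes)

# Without aggregation, the reshape cannot decide which value to keep.
try:
    df.set_index(keys)["VALUE"].unstack()
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print(exc)

# Averaging per key first makes the index unique, so unstack succeeds.
wide = (
    df.groupby(keys)["VALUE"].mean()
      .unstack()
      .reset_index()
      .rename_axis(None, axis=1)
)
print(wide)
```

Mean is only one choice; if the duplicates should be summed or the first value kept, a different aggregation such as sum() or first() could replace mean() in the same chain.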
