How can I split a column in pandas DataFrame by variable names in a column? I have a DataFrame below:
ID FEATURE PARAM VALUE
0 A101 U1 ITEM1 10
1 A101 U1 ITEM2 11
2 A101 U2 ITEM1 12
3 A101 U2 ITEM2 13
4 A102 U1 ITEM1 14
5 A102 U1 ITEM2 15
6 A102 U2 ITEM1 16
7 A102 U2 ITEM2 17
I want to split it as below.
ID FEATURE ITEM1 ITEM2
0 A101 U1 10 11
1 A101 U2 12 13
2 A102 U1 14 15
3 A102 U2 16 17
I tried to use one of the responses and it works great but partially.
Select_Data.groupby('PARAM')['VALUE'].apply(list).apply(pd.Series).T
PARAM ITEM1 ITEM2
0 10 11
1 12 13
2 14 15
3 16 17
But I lost my ID & FEATURE columns and I want to keep them in the table. I will greatly appreciate any suggestions.
Using groupby
you can
In [566]: df.groupby('c1')['c2'].apply(list).apply(pd.Series).T
Out[566]:
c1 A B C
0 1 2 3
1 4 5 6
You can also use pivot_table
with index ID,FEATURE
and then reset index ie
ndf = pd.pivot_table(df,columns='PARAM', values='VALUE',index=['ID','FEATURE']).reset_index()
Incase you want to aggregate duplicate values then you can use mean value
ndf = pd.pivot_table(df,columns='PARAM', values='VALUE',index=['ID','FEATURE'],aggfunc='mean').reset_index()
Output:
PARAM ID FEATURE ITEM1 ITEM2 0 A101 U1 10 11 1 A101 U2 12 13 2 A102 U1 14 15 3 A102 U2 16 17 In [528]:
You can use set_index
and unstack
:
df = df.set_index(['ID','FEATURE','PARAM'])['VALUE']
.unstack()
.reset_index()
.rename_axis(None, axis=1)
print (df)
ID FEATURE ITEM1 ITEM2
0 A101 U1 10 11
1 A101 U2 12 13
2 A102 U1 14 15
3 A102 U2 16 17
but if get:
ValueError: Index contains duplicate entries, cannot reshape
then use Bharath shetty's solution or groupby
and aggregate mean
, because duplicates in triples ID,FEATURE,PARAM
:
print (df)
ID FEATURE PARAM VALUE
0 A101 U2 ITEM1 50<-same A101,U2,ITEM1
1 A101 U1 ITEM2 11
2 A101 U2 ITEM1 12<-same A101,U2,ITEM1
3 A101 U2 ITEM2 13
4 A102 U1 ITEM1 14
5 A102 U1 ITEM2 15
6 A102 U2 ITEM1 16
7 A102 U2 ITEM2 17
df = df.groupby(['ID','FEATURE','PARAM'])['VALUE'].mean()
.unstack().reset_index().rename_axis(None, axis=1)
print (df)
ID FEATURE ITEM1 ITEM2
0 A101 U1 NaN 11.0
1 A101 U2 31.0 13.0<-(50+12)/2=31
2 A102 U1 14.0 15.0
3 A102 U2 16.0 17.0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.