
Reduce number of columns in a pandas DataFrame

I'm trying to create a violin plot in seaborn. The input is a pandas DataFrame, and it looks like, in order to separate the data along the x axis, I need to differentiate on a single column. I currently have a DataFrame that holds floating-point values for several sensors:

>>>df.columns
Index(['SensorA', 'SensorB', 'SensorC', 'SensorD', 'group_id'], dtype='object')

That is, each Sensor[A-Z] column contains a series of floating-point readings:

>>>df['SensorA'].head()
0    0.072706
1    0.072698
2    0.072701
3    0.072303
4    0.071951
Name: SensorA, dtype: float64

And for this problem, I'm only interested in 2 groups:

>>>df['group_id'].unique()
array(['1', '2'], dtype=object)

I want each Sensor to be a separate violin along the x axis.

I think this means I need to convert this into something of the form:

>>>df.columns
Index(['Value', 'Sensor', 'group_id'], dtype='object')

where the Sensor column in the new DataFrame contains the text "SensorA", "SensorB", etc., the Value column contains the values that were originally in each Sensor[A-Z] column, and the group information is preserved.

I could then create a violinplot using the following command:

ax = sns.violinplot(x="Sensor", y="Value", hue="group_id", data=df)

I think I need some kind of reverse pivot. Is there an easy way to do this?
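A "reverse pivot" is what pandas calls unpivoting, and `DataFrame.melt` undoes `DataFrame.pivot` as long as each original row keeps some identifier. The sketch below shows the round trip on made-up readings; the numbers and the hypothetical `row` column (copied from the original index so the wide shape can be rebuilt) are illustrative assumptions, not data from the question.

```python
import pandas as pd

# Made-up readings in the question's wide layout (illustrative values only)
wide = pd.DataFrame({
    'SensorA': [0.0727, 0.0726, 0.0723],
    'SensorB': [0.0651, 0.0649, 0.0660],
    'group_id': ['1', '2', '1'],
})

# Unpivot: keep the original row number (hypothetical 'row' column) and group_id,
# and turn every sensor column into (Sensor, Value) pairs
long = wide.reset_index().rename(columns={'index': 'row'}).melt(
    id_vars=['row', 'group_id'], var_name='Sensor', value_name='Value')

# Pivot back: one column per sensor, indexed by the original row and group
back = long.pivot(index=['row', 'group_id'], columns='Sensor', values='Value')
back = back.reset_index(level='group_id')
back.columns.name = None  # drop the 'Sensor' label that pivot put on the columns
back.index.name = None    # drop the 'row' label so the index matches the original
print(back)
```

The long frame has one row per (original row, sensor) pair, which is exactly the shape the `violinplot` call above expects.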

Use the pandas melt function:

import pandas as pd

df = pd.DataFrame({'SensorA':[1,3,4,5,6], 'SensorB':[5,2,3,6,7], 'SensorC':[7,4,8,1,10], 'group_id':[1,2,1,1,2]})
# Keep group_id as an identifier; every other column is unpivoted into (Sensor, value) pairs
df = pd.melt(df, id_vars='group_id', var_name='Sensor')
print(df)

gives

    group_id   Sensor  value
0          1  SensorA      1
1          2  SensorA      3
2          1  SensorA      4
3          1  SensorA      5
4          2  SensorA      6
5          1  SensorB      5
6          2  SensorB      2
7          1  SensorB      3
8          1  SensorB      6
9          2  SensorB      7
10         1  SensorC      7
11         2  SensorC      4
12         1  SensorC      8
13         1  SensorC      1
14         2  SensorC     10
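The melt output above names the value column `value`. To get exactly the `Sensor`/`Value`/`group_id` layout asked for in the question, melt's `value_name` parameter renames it, and `value_vars` limits the unpivot to the sensor columns. A sketch, assuming every sensor column has "Sensor" in its name (so `df.filter(like='Sensor')` picks them out):

```python
import pandas as pd

df = pd.DataFrame({'SensorA': [1, 3, 4, 5, 6], 'SensorB': [5, 2, 3, 6, 7],
                   'SensorC': [7, 4, 8, 1, 10], 'group_id': [1, 2, 1, 1, 2]})

# Assumption: every sensor column name contains 'Sensor'
sensor_cols = list(df.filter(like='Sensor').columns)

# Unpivot only the sensor columns and name the output columns as in the question
long_df = pd.melt(df, id_vars='group_id', value_vars=sensor_cols,
                  var_name='Sensor', value_name='Value')
print(long_df.head())
```

`long_df` can then be passed as `data=` to the `sns.violinplot(x="Sensor", y="Value", hue="group_id", ...)` call from the question.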

It may not be the best way, but it works (as far as I understand):

import pandas as pd

df = pd.DataFrame({'SensorA':[1,3,4,5,6], 'SensorB':[5,2,3,6,7], 'SensorC':[7,4,8,1,10], 'group_id':[1,2,1,1,2]})
groupedID = df.groupby('group_id')
frames = []  # one long-format DataFrame per group, concatenated once at the end
for groupNum in groupedID.groups.keys():
    # stack() turns this group's sensor columns into a Series indexed by (row, sensor name)
    dfSensors = groupedID.get_group(groupNum).filter(regex='Sen').stack()
    _, sensorNames = zip(*dfSensors.index)
    frames.append(pd.DataFrame({'Sensor': sensorNames, 'Value': dfSensors.values, 'group_id': groupNum}))
df1 = pd.concat(frames)
print(df1)

Output:

    Sensor  Value  group_id
0  SensorA      1         1
1  SensorB      5         1
2  SensorC      7         1
3  SensorA      4         1
4  SensorB      3         1
5  SensorC      8         1
6  SensorA      5         1
7  SensorB      6         1
8  SensorC      1         1
0  SensorA      3         2
1  SensorB      2         2
2  SensorC      4         2
3  SensorA      6         2
4  SensorB      7         2
5  SensorC     10         2
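Both answers produce the same rows, only in a different order and with a different index. A sketch that checks this on the example frame; the loop here is a condensed variant of the second answer (it iterates over `df.groupby` directly instead of calling `get_group`), not the answerer's exact code, and `canonical` is a hypothetical helper that sorts rows so order and index do not matter:

```python
import pandas as pd

df = pd.DataFrame({'SensorA': [1, 3, 4, 5, 6], 'SensorB': [5, 2, 3, 6, 7],
                   'SensorC': [7, 4, 8, 1, 10], 'group_id': [1, 2, 1, 1, 2]})

# First answer: melt, with the value column renamed to match the second answer
melted = pd.melt(df, id_vars='group_id', var_name='Sensor', value_name='Value')

# Second answer, condensed: stack each group's sensor columns
frames = []
for group_num, group in df.groupby('group_id'):
    stacked = group.filter(regex='Sen').stack()
    frames.append(pd.DataFrame({'Sensor': stacked.index.get_level_values(1),
                                'Value': stacked.values,
                                'group_id': group_num}))
looped = pd.concat(frames)

cols = ['group_id', 'Sensor', 'Value']

def canonical(frame):
    # Same column order, rows sorted, fresh 0..n-1 index
    return frame[cols].sort_values(cols).reset_index(drop=True)

print(canonical(melted).equals(canonical(looped)))
```

For this layout, melt is the shorter option and avoids building the result group by group.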
