
Reducing rows in a column for a pandas DataFrame for plotting

I have a CSV table of data which I have read into a pandas DataFrame. One of the columns has the same string in multiple rows, which is correct because it is classification data. However, when I plot this column against another column of values, each cell is treated as a separate x value rather than rows with the same classification being combined.

Classification        Value
MIR-weak:             0.0896571179
MIR-weak:             0.1990277968
MIR-bright:           0.2850534357
MIR-bright:           0.0807078051
FIR-dark/MIR-bright:  1.7610864745
MIR-weak:             0.0826692503
MIR-weak:             0.349403222
MIR-weak:             0.7326764485
MIR-weak:             0.0179843643
MIR-weak:             0.0761941975
MIR-bright:           0.4298597194
MIR-weak:             0.4143098599
MIR-weak:             0.1439220025
MIR-weak:             0.0810787048
MIR-bright:           0.6369812293
MIR-weak:             0.0973845298
MIR-weak:             0.1871236732
MIR-weak:             1.5795256821
MIR-weak:             0.9072559132
MIR-weak:             0.6218977498
FIR-dark/MIR-bright:  0.6920326523
MIR-weak:             0.2580561867
MIR-bright:           0.055071288
MIR-weak:             1.0512992066

When I plot these columns against each other using DataFrame.plot(), the x-axis has every cell in the first column as an x value, rather than just four x values, one for each classification.

Is there any way to sort this out, either with .plot() or by doing something with the data?

I presume you want a stacked bar plot, so starting with your dataframe looking like this

         Classification     Value
0              MIR-weak  0.089657
1              MIR-weak  0.199028
2            MIR-bright  0.285053
3            MIR-bright  0.080708
4   FIR-dark/MIR-bright  1.761086
5              MIR-weak  0.082669
6              MIR-weak  0.349403
7              MIR-weak  0.732676
8              MIR-weak  0.017984
9              MIR-weak  0.076194
10           MIR-bright  0.429860
11             MIR-weak  0.414310
12             MIR-weak  0.143922
13             MIR-weak  0.081079
14           MIR-bright  0.636981
15             MIR-weak  0.097385
16             MIR-weak  0.187124
17             MIR-weak  1.579526
18             MIR-weak  0.907256
19             MIR-weak  0.621898
20  FIR-dark/MIR-bright  0.692033
21             MIR-weak  0.258056
22           MIR-bright  0.055071
23             MIR-weak  1.051299

you can do these steps:

  • Sort by Classification.

  • Pivot around Classification.

  • Change columns to get rid of the multi-index.

  • Do a stacked bar plot of the transposed dataframe.


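# Sort by class so rows of the same class are contiguous, then renumber the index.
D = D.sort_values("Classification").reset_index(drop=True)
# One column per class; each row holds a single non-NaN value.
D = D.pivot(columns='Classification')
# Replace the ('Value', class) MultiIndex columns with plain class names.
D.columns = ["FIR-dark/MIR-bright", "MIR-bright", "MIR-weak"]
# After transposing, each class is one bar and each value is one stacked segment.
D.T.plot.bar(stacked=True, legend=False)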

The result looks pretty ugly, though, so you will need to tweak the appearance.

I'm not sure this is what you want, since it has only three categories rather than the four you mention, but your original data also has only three.
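As a rough sketch of what the pivot steps above produce, the snippet below uses only the first five rows of the question's data (with the colons already stripped) and stops before plotting. The names `wide` and `totals` are illustrative and not from the answer. Passing `values="Value"` to pivot is a small variation on the answer's code: it gives plain column labels, so the renaming step is not needed.

```python
import pandas as pd

# First five rows of the question's data, colons stripped (a subset for illustration).
D = pd.DataFrame({
    "Classification": ["MIR-weak", "MIR-weak", "MIR-bright",
                       "MIR-bright", "FIR-dark/MIR-bright"],
    "Value": [0.0896571179, 0.1990277968, 0.2850534357,
              0.0807078051, 1.7610864745],
})

# Sort by class and renumber the rows so each row keeps a unique index.
D = D.sort_values("Classification").reset_index(drop=True)

# One column per class; every row has exactly one non-NaN entry.
wide = D.pivot(columns="Classification", values="Value")

# Column sums (NaN is skipped) give the total height of each stacked bar.
totals = wide.sum()

print(wide)
print(totals)
```

If matplotlib is installed, `wide.T.plot.bar(stacked=True, legend=False)` should then draw one stacked bar per classification.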

You need to tell pandas that the 'Classification' column contains categorical data. To do so, use astype.

I use read_clipboard to read the data from the question:

import pandas as pd

df = pd.read_clipboard()

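# Strip the trailing ':' and convert to an ordered categorical.
# Newer pandas versions no longer accept categories=/ordered= in astype; pass a CategoricalDtype instead.
df['Classification'] = df['Classification'].str.strip(':').astype(
    pd.CategoricalDtype(categories=['MIR-weak',
                                    'MIR-bright',
                                    'FIR-dark/MIR-bright'], ordered=True))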

df.plot(x='Classification',y='Value')

The graph will look like this: [image]
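To see what the ordered categorical conversion changes, here is a minimal sketch on a handful of labels in the question's format. The variable names (`labels`, `order`, `cats`) are made up for the example. It only shows that sorting follows the declared category order rather than alphabetical order.

```python
import pandas as pd

# A few labels in the same format as the question's column (trailing colon included).
labels = pd.Series(["MIR-weak:", "MIR-bright:", "FIR-dark/MIR-bright:", "MIR-weak:"])

# Declare the categories and their order explicitly.
order = pd.CategoricalDtype(
    categories=["MIR-weak", "MIR-bright", "FIR-dark/MIR-bright"],
    ordered=True,
)

# Strip the colon, then cast to the ordered categorical dtype.
cats = labels.str.strip(":").astype(order)

# Sorting uses the category order declared above, not string comparison.
print(cats.sort_values().tolist())
print(cats.cat.categories.tolist())
```

A label that is not listed in `categories` would become NaN after the cast, so it is worth checking the labels are spelled exactly as in the data.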

You can also use groupby with mean, sum, size, or any other aggregate to summarise the data. Here is an example that groups the data by the Classification column, calculates the mean of each group, and plots the result:

df.groupby('Classification').mean().plot(kind='bar')

The result will look like this: [image]
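The other aggregates mentioned above can be computed in one call. This is a sketch on the first five rows of the question's data only (colons stripped), and `summary` is an illustrative name. It stops at the aggregated table, which has one row per classification.

```python
import pandas as pd

# First five rows of the question's data, colons stripped (a subset for illustration).
df = pd.DataFrame({
    "Classification": ["MIR-weak", "MIR-weak", "MIR-bright",
                       "MIR-bright", "FIR-dark/MIR-bright"],
    "Value": [0.0896571179, 0.1990277968, 0.2850534357,
              0.0807078051, 1.7610864745],
})

# One row per classification, with the mean, sum and row count of Value.
summary = df.groupby("Classification")["Value"].agg(["mean", "sum", "size"])

print(summary)
```

Any single column of `summary` can then be plotted, for example `summary["mean"].plot(kind="bar")`, which gives one bar per classification.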

