I have a CSV table that I have read into a pandas DataFrame. One of the columns has the same string in multiple rows, which is correct because it holds classification data. But when I plot this column against a column of values, pandas treats each cell in the classification column as a separate entry rather than combining identical ones.
Classification Value
MIR-weak: 0.0896571179
MIR-weak: 0.1990277968
MIR-bright: 0.2850534357
MIR-bright: 0.0807078051
FIR-dark/MIR-bright: 1.7610864745
MIR-weak: 0.0826692503
MIR-weak: 0.349403222
MIR-weak: 0.7326764485
MIR-weak: 0.0179843643
MIR-weak: 0.0761941975
MIR-bright: 0.4298597194
MIR-weak: 0.4143098599
MIR-weak: 0.1439220025
MIR-weak: 0.0810787048
MIR-bright: 0.6369812293
MIR-weak: 0.0973845298
MIR-weak: 0.1871236732
MIR-weak: 1.5795256821
MIR-weak: 0.9072559132
MIR-weak: 0.6218977498
FIR-dark/MIR-bright: 0.6920326523
MIR-weak: 0.2580561867
MIR-bright: 0.055071288
MIR-weak: 1.0512992066
So when I plot these columns against each other using DataFrame.plot(), the x-axis has every cell in the first column as an x value rather than just four x values, one for each classification.
Is there any way to sort this out, either with .plot() or by doing something with the data?
I presume you want a stacked bar plot, so starting with your dataframe looking like this
Classification Value
0 MIR-weak 0.089657
1 MIR-weak 0.199028
2 MIR-bright 0.285053
3 MIR-bright 0.080708
4 FIR-dark/MIR-bright 1.761086
5 MIR-weak 0.082669
6 MIR-weak 0.349403
7 MIR-weak 0.732676
8 MIR-weak 0.017984
9 MIR-weak 0.076194
10 MIR-bright 0.429860
11 MIR-weak 0.414310
12 MIR-weak 0.143922
13 MIR-weak 0.081079
14 MIR-bright 0.636981
15 MIR-weak 0.097385
16 MIR-weak 0.187124
17 MIR-weak 1.579526
18 MIR-weak 0.907256
19 MIR-weak 0.621898
20 FIR-dark/MIR-bright 0.692033
21 MIR-weak 0.258056
22 MIR-bright 0.055071
23 MIR-weak 1.051299
you can do these steps:
Sort by Classification.
Pivot around Classification.
Change columns to get rid of the multi-index.
Do a stacked bar plot of the transposed dataframe.
D = D.sort_values("Classification").reset_index(drop=True)
D = D.pivot(columns='Classification')
D.columns = ["FIR-dark/MIR-bright", "MIR-bright", "MIR-weak"]
D.T.plot.bar(stacked=True,legend=False)
The result looks pretty ugly though, so you need to tweak the appearance.
Not sure if that's the correct thing since it only has three categories, but your original also has only three.
You need to tell pandas that the 'Classification' column contains categorical data. To do so, use astype. I use read_clipboard to read the data in the OP.
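import pandas as pd
df = pd.read_clipboard()
# Build an ordered categorical dtype; newer pandas versions no longer
# accept categories= and ordered= directly in astype('category', ...)
cat_type = pd.CategoricalDtype(categories=['MIR-weak',
                                           'MIR-bright',
                                           'FIR-dark/MIR-bright'], ordered=True)
# Drop the trailing colon from each label, then convert the column
df['Classification'] = df['Classification'].str.strip(':').astype(cat_type)
df.plot(x='Classification', y='Value')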
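To see what the categorical conversion buys you, here is a minimal sketch, assuming matplotlib is installed. It retypes six rows from the question by hand (so it does not need the clipboard) and uses the integer category codes as x positions, which gives one x position per classification. The scatter plot and the names `order`, `cat_type` and `codes` are illustrative choices, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Six rows from the question, retyped by hand for illustration.
df = pd.DataFrame({
    "Classification": ["MIR-weak", "MIR-weak", "MIR-bright",
                       "MIR-bright", "FIR-dark/MIR-bright", "MIR-weak"],
    "Value": [0.0896571179, 0.1990277968, 0.2850534357,
              0.0807078051, 1.7610864745, 0.0826692503],
})

# Ordered categorical dtype: the list order fixes the order on the x-axis.
order = ["MIR-weak", "MIR-bright", "FIR-dark/MIR-bright"]
cat_type = pd.CategoricalDtype(categories=order, ordered=True)
df["Classification"] = df["Classification"].astype(cat_type)

# Every row maps to the integer position of its category in `order`.
codes = df["Classification"].cat.codes

# Plot the values at their category position and label the ticks by name.
fig, ax = plt.subplots()
ax.scatter(codes, df["Value"])
ax.set_xticks(range(len(order)))
ax.set_xticklabels(order)
ax.set_ylabel("Value")
```

Call `fig.savefig(...)` or switch to an interactive backend and use `plt.show()` to look at the result.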
You can also use groupby with mean/sum/size or any other measure to view the data. Here is an example that groups the data by the Classification column, calculates the mean for each group, and plots the result:
df.groupby('Classification').mean().plot(kind='bar')
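A self-contained sketch of the groupby route, again assuming matplotlib is installed and using six hand-typed rows from the question rather than the full table. It computes the mean, sum and size per classification in one call and draws the means as a bar chart; the name `summary` is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Six rows from the question, retyped by hand for illustration.
df = pd.DataFrame({
    "Classification": ["MIR-weak", "MIR-weak", "MIR-bright",
                       "MIR-bright", "FIR-dark/MIR-bright", "MIR-weak"],
    "Value": [0.0896571179, 0.1990277968, 0.2850534357,
              0.0807078051, 1.7610864745, 0.0826692503],
})

# One row per classification, one column per aggregate.
summary = df.groupby("Classification")["Value"].agg(["mean", "sum", "size"])
print(summary)

# One bar per classification, showing the mean value of that group.
fig, ax = plt.subplots()
summary["mean"].plot(kind="bar", ax=ax)
ax.set_ylabel("Mean Value")
```

Swapping `"mean"` for `"sum"` or `"size"` in the last plotting call shows the other aggregates the same way.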