Reducing rows in a column of a pandas DataFrame for plotting
I have a CSV table of data that I have read into a pandas DataFrame. One of the columns contains the same string in multiple rows, which is correct because it is classification data. However, when I plot this column against a column of values, each cell in the column is treated as a separate entry rather than rows with the same classification being combined.
Classification Value
MIR-weak: 0.0896571179
MIR-weak: 0.1990277968
MIR-bright: 0.2850534357
MIR-bright: 0.0807078051
FIR-dark/MIR-bright: 1.7610864745
MIR-weak: 0.0826692503
MIR-weak: 0.349403222
MIR-weak: 0.7326764485
MIR-weak: 0.0179843643
MIR-weak: 0.0761941975
MIR-bright: 0.4298597194
MIR-weak: 0.4143098599
MIR-weak: 0.1439220025
MIR-weak: 0.0810787048
MIR-bright: 0.6369812293
MIR-weak: 0.0973845298
MIR-weak: 0.1871236732
MIR-weak: 1.5795256821
MIR-weak: 0.9072559132
MIR-weak: 0.6218977498
FIR-dark/MIR-bright: 0.6920326523
MIR-weak: 0.2580561867
MIR-bright: 0.055071288
MIR-weak: 1.0512992066
So when I plot these columns against each other using DataFrame.plot(), the x-axis has every cell in the first column as an x value rather than just four x values, one for each classification.
Is there any way to sort this out, either with .plot() or by doing something with the data?
I presume you want a stacked bar plot, so starting with your dataframe looking like this:
Classification Value
0 MIR-weak 0.089657
1 MIR-weak 0.199028
2 MIR-bright 0.285053
3 MIR-bright 0.080708
4 FIR-dark/MIR-bright 1.761086
5 MIR-weak 0.082669
6 MIR-weak 0.349403
7 MIR-weak 0.732676
8 MIR-weak 0.017984
9 MIR-weak 0.076194
10 MIR-bright 0.429860
11 MIR-weak 0.414310
12 MIR-weak 0.143922
13 MIR-weak 0.081079
14 MIR-bright 0.636981
15 MIR-weak 0.097385
16 MIR-weak 0.187124
17 MIR-weak 1.579526
18 MIR-weak 0.907256
19 MIR-weak 0.621898
20 FIR-dark/MIR-bright 0.692033
21 MIR-weak 0.258056
22 MIR-bright 0.055071
23 MIR-weak 1.051299
you can follow these steps:
Sort by Classification.
Pivot around Classification.
Rename the columns to get rid of the multi-index.
Make a stacked bar plot of the transposed dataframe.
D = D.sort_values("Classification").reset_index(drop=True)
D = D.pivot(columns='Classification')
D.columns = ["FIR-dark/MIR-bright", "MIR-bright", "MIR-weak"]
D.T.plot.bar(stacked=True,legend=False)
The result looks pretty ugly though, so you will need to tweak the appearance.
I'm not sure if that's correct, since the plot only has three categories, but your original data also has only three.
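For anyone who wants to try the pivot steps from this answer without the original CSV, here is a minimal self-contained sketch. It uses only the first few rows of the table, and it passes values="Value" to pivot so the columns come out as a plain index and the manual rename is not needed; that shortcut and the helper name plot_stacked are my own assumptions, not part of the answer's code.

```python
import pandas as pd

# A few rows taken from the table in the question (rounded as in the answer).
D = pd.DataFrame({
    "Classification": ["MIR-weak", "MIR-weak", "MIR-bright", "MIR-bright",
                       "FIR-dark/MIR-bright", "MIR-weak"],
    "Value": [0.089657, 0.199028, 0.285053, 0.080708, 1.761086, 0.082669],
})

# Sort by class, then pivot so that each class becomes its own column.
# Passing values="Value" keeps the columns a plain (non-multi) index.
D = D.sort_values("Classification").reset_index(drop=True)
wide = D.pivot(columns="Classification", values="Value")

# After transposing, each class is one row (one bar) and every original
# row contributes one stacked segment; cells from other classes are NaN.
stacked = wide.T
print(stacked.sum(axis=1))


def plot_stacked(table):
    """Hypothetical helper: draw the stacked bars (requires matplotlib)."""
    return table.plot.bar(stacked=True, legend=False)
```

Calling plot_stacked(stacked) returns the matplotlib Axes, which is a convenient place to start tweaking the appearance (labels, rotation, colours).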
You need to tell pandas that the 'Classification' column contains categorical data. To do so, use astype.
I use read_clipboard to read the data in the OP:
import pandas as pd
df = pd.read_clipboard()
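# Newer pandas versions no longer accept categories=/ordered= in astype;
# pass a CategoricalDtype instead.
df['Classification'] = df['Classification'].str.strip(':').astype(
    pd.CategoricalDtype(categories=['MIR-weak',
                                    'MIR-bright',
                                    'FIR-dark/MIR-bright'], ordered=True))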
df.plot(x='Classification',y='Value')
The graph will look like this: [image not preserved]
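If the clipboard is not convenient, the same categorical conversion can be tried on an inline string. This is only a sketch under a few assumptions: the sample text holds just four rows from the question, the data is whitespace-separated, and the variable names raw and order are made up for illustration.

```python
import io

import pandas as pd

# Four rows copied from the question, including the trailing colons.
raw = """Classification Value
MIR-weak: 0.0896571179
MIR-bright: 0.2850534357
FIR-dark/MIR-bright: 1.7610864745
MIR-weak: 0.0826692503
"""

# Parse the whitespace-separated text instead of reading the clipboard.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# An ordered categorical dtype fixes both the set of classes and their order.
order = pd.CategoricalDtype(
    categories=["MIR-weak", "MIR-bright", "FIR-dark/MIR-bright"],
    ordered=True,
)
df["Classification"] = df["Classification"].str.rstrip(":").astype(order)

# Sorting now follows the category order rather than alphabetical order.
print(df.sort_values("Classification"))
```

Any label that is not listed in the categories would become NaN after the conversion, so it is worth checking df["Classification"].isna().sum() on real data.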
You can also use groupby with mean/sum/size or any other measure to view the data. Here is an example that groups the data by the classification column, calculates the mean for each group, and then plots the result:
df.groupby('Classification').mean().plot(kind='bar')
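To compare several of those measures side by side, agg can compute them in one pass. The sketch below uses small made-up values rather than the data from the question, so the numbers are illustrative only.

```python
import pandas as pd

# Illustrative values only (not the data from the question).
df = pd.DataFrame({
    "Classification": ["MIR-weak", "MIR-weak", "MIR-bright", "MIR-bright",
                       "FIR-dark/MIR-bright", "MIR-weak"],
    "Value": [0.1, 0.3, 0.2, 0.4, 1.5, 0.2],
})

# One row per class, one column per measure.
summary = df.groupby("Classification")["Value"].agg(["mean", "sum", "size"])
print(summary)
```

Any single column of the summary can then be plotted as one bar per class, for example summary["mean"].plot(kind="bar").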