Reducing rows in a column of a pandas DataFrame for plotting

I have a CSV table that I have read into a pandas DataFrame. One of the columns contains the same string in multiple rows, which is correct because it is classification data, but when I plot this column against a column of values, each cell is treated as a separate entry rather than being combined by class.

Classification        Value
MIR-weak:             0.0896571179
MIR-weak:             0.1990277968
MIR-bright:           0.2850534357
MIR-bright:           0.0807078051
FIR-dark/MIR-bright:  1.7610864745
MIR-weak:             0.0826692503
MIR-weak:             0.349403222
MIR-weak:             0.7326764485
MIR-weak:             0.0179843643
MIR-weak:             0.0761941975
MIR-bright:           0.4298597194
MIR-weak:             0.4143098599
MIR-weak:             0.1439220025
MIR-weak:             0.0810787048
MIR-bright:           0.6369812293
MIR-weak:             0.0973845298
MIR-weak:             0.1871236732
MIR-weak:             1.5795256821
MIR-weak:             0.9072559132
MIR-weak:             0.6218977498
FIR-dark/MIR-bright:  0.6920326523
MIR-weak:             0.2580561867
MIR-bright:           0.055071288
MIR-weak:             1.0512992066

When I plot these columns against each other using DataFrame.plot(), the x-axis has every cell in the first column as an x value rather than just four x values, one for each classification.

Is there any way to sort this out, either with .plot() or by doing something with the data?

I presume you want a stacked bar plot, so starting with your dataframe looking like this:

Classification     Value
0              MIR-weak  0.089657
1              MIR-weak  0.199028
2            MIR-bright  0.285053
3            MIR-bright  0.080708
4   FIR-dark/MIR-bright  1.761086
5              MIR-weak  0.082669
6              MIR-weak  0.349403
7              MIR-weak  0.732676
8              MIR-weak  0.017984
9              MIR-weak  0.076194
10           MIR-bright  0.429860
11             MIR-weak  0.414310
12             MIR-weak  0.143922
13             MIR-weak  0.081079
14           MIR-bright  0.636981
15             MIR-weak  0.097385
16             MIR-weak  0.187124
17             MIR-weak  1.579526
18             MIR-weak  0.907256
19             MIR-weak  0.621898
20  FIR-dark/MIR-bright  0.692033
21             MIR-weak  0.258056
22           MIR-bright  0.055071
23             MIR-weak  1.051299

you can do these steps:

  • Sort by Classification.

  • Pivot around Classification.

  • Change columns to get rid of the multi-index.

  • Do a stacked bar plot of the transposed dataframe.

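# Sort so the rows of each class are contiguous, then renumber the index
D = D.sort_values("Classification").reset_index(drop=True)
# One column per class; each row keeps its value in its own class column
D = D.pivot(columns='Classification')
# Flatten the ('Value', class) multi-index to plain class names
D.columns = ["FIR-dark/MIR-bright", "MIR-bright", "MIR-weak"]
# One stacked bar per class
D.T.plot.bar(stacked=True, legend=False)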
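
To see what the pivot step produces, here is a minimal sketch on a five-row subset of the data above. The name `wide` and the choice to pass `values="Value"` are assumptions made for illustration; passing `values` gives flat column labels taken from the data, so nothing has to be renamed by hand.

```python
import pandas as pd

# Five rows taken from the question's table, for illustration only.
D = pd.DataFrame({
    "Classification": ["MIR-weak", "MIR-bright", "FIR-dark/MIR-bright",
                       "MIR-weak", "MIR-bright"],
    "Value": [0.089657, 0.285053, 1.761086, 0.199028, 0.080708],
})

D = D.sort_values("Classification").reset_index(drop=True)

# Assumption: passing values= yields flat column labels (no multi-index).
wide = D.pivot(columns="Classification", values="Value")

# Each row holds one value in its own class column and NaN elsewhere.
print(wide)

# The column labels are the distinct classes, in sorted order.
print(list(wide.columns))
```

Calling `wide.T.plot.bar(stacked=True, legend=False)` on this frame should then draw one stacked bar per class, as in the steps above.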

The result looks pretty ugly though, so you need to tweak the appearance.

Not sure if that's the correct thing, since it only has three categories, but your original data also has only three.

You need to tell pandas that the 'Classification' column contains categorical data. To do so, use astype.

I use read_clipboard to read the data from the question.

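import pandas as pd

df = pd.read_clipboard()

# Newer pandas no longer accepts astype('category', categories=...);
# build a CategoricalDtype and pass it to astype instead.
cat_type = pd.CategoricalDtype(categories=['MIR-weak',
                                           'MIR-bright',
                                           'FIR-dark/MIR-bright'], ordered=True)
# Drop the trailing ':' from each label before converting
df['Classification'] = df['Classification'].str.strip(':').astype(cat_type)

df.plot(x='Classification', y='Value')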

The graph will look like this: [image]
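
As a rough sketch of what the categorical conversion changes, the snippet below builds a four-row subset by hand instead of reading the clipboard. The names `order` and `cat_type` are illustrative assumptions. Once the column is an ordered categorical, sorting follows the declared category order rather than alphabetical order.

```python
import pandas as pd

# Four rows from the question's table, with the trailing colons kept.
df = pd.DataFrame({
    "Classification": ["MIR-weak:", "FIR-dark/MIR-bright:",
                       "MIR-bright:", "MIR-weak:"],
    "Value": [0.089657, 1.761086, 0.285053, 0.199028],
})

# Assumed display order of the classes.
order = ["MIR-weak", "MIR-bright", "FIR-dark/MIR-bright"]
cat_type = pd.CategoricalDtype(categories=order, ordered=True)

# Strip the colon, then convert the labels to the ordered categorical type.
df["Classification"] = df["Classification"].str.strip(":").astype(cat_type)

# Sorting now follows the declared category order.
sorted_labels = df.sort_values("Classification")["Classification"].tolist()
print(sorted_labels)

# Each label is stored as an integer code indexing into `order`.
codes = df["Classification"].cat.codes.tolist()
print(codes)
```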

You can also use groupby with mean, sum, size, or any other measure to view the data. Here is an example that groups the data by the classification column, calculates the mean for each group, and plots the result:

df.groupby('Classification').mean().plot(kind='bar')

The result will look like this: [image]
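
To compare several measures at once, one possible sketch is to pass a list of aggregation names to `agg`. The six-row frame and the name `summary` are assumptions for illustration, with values taken from the first rows of the question's table.

```python
import pandas as pd

# First six rows of the question's table, for illustration only.
df = pd.DataFrame({
    "Classification": ["MIR-weak", "MIR-weak", "MIR-bright",
                       "MIR-bright", "FIR-dark/MIR-bright", "MIR-weak"],
    "Value": [0.089657, 0.199028, 0.285053, 0.080708, 1.761086, 0.082669],
})

# One row per classification, one column per summary statistic.
summary = df.groupby("Classification")["Value"].agg(["mean", "sum", "size"])
print(summary)
```

Any single column of `summary` can then be plotted the same way, for example `summary["mean"].plot(kind="bar")`.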
