
Create Python pandas histograms for a specific row range as well as iterating through columns?

I have a set of data that I transpose so that the data groups are in rows and the placeholder values (in my case, a series of mass values) are in columns. My next goal is to draw histograms for the rows that share the same label, as shown below:

mz     902.4  909.4   915.3
n       0.6    0.3     1.4
n.1     0.4    0.3     1.3
n.2     0.3    0.2     1.1
n.3     0.2    0.2     1.3
n.4     0.4    0.3     1.4
DCIS           0.3     1.6
DCIS.1  0.3    1.2
DCIS.2         1.1
DCIS.3  0.2    1.2
DCIS.4  0.2    1.3
DCIS.5  0.2    0.1     1.5
br_1    0.5    0.4     1.4
br_1.1         0.2     1.3
br_1.2  0.5    0.2     1.4
br_1.3  0.5    0.2     1.6
br_1.4         1.4

My goal is to draw histograms, starting with column 902.4, that treat the rows labelled n as group 1, DCIS as group 2, and so forth, with all of these groups in the same histogram plot. I then plan to repeat the same process for each column, so the code should produce one histogram plot per column.

Below is my code so far (the input file is an Excel .xlsx file, before transposing):

nh = pd.ExcelFile(nheight)
df = pd.read_excel(nh, index=False)

dfn = df.filter(like='n', axis=0)
dfbr1234 = df.filter(like='br', axis=0)

plt.figure()
plt.hist([dfn, dfbr1234], bins=50)
plt.show()

For testing, I tried grouping just the rows labelled 'br' together, but the code raises the error "zero-size array to reduction operation minimum which has no identity".

Edit: The dataframe is the table above.

[Screenshot: histogram order]

What I want to do is draw a single plot that contains 3 separate histograms, designated by the black, red, and orange boxes in the above screenshot. The goal is to compare the different boxes within a single plot, and I want to iterate so that I can do the same for the other two columns (columns 2 and 3 in the picture). I tried using df.filter with like='n' and so forth, but I am not sure how to combine the different filtered data while also iterating through the columns. The code above doesn't have the iteration yet, but I was thinking about using iloc[:, variable] for it.

Here's one basic approach:

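import pandas as pd

df = pd.read_clipboard()  # the table from the question, copied to the clipboard
df = df.fillna(0)         # missing cells become 0
print(df)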

        mz  902.4  909.4  915.3
0        n    0.6    0.3    1.4
1      n.1    0.4    0.3    1.3
2      n.2    0.3    0.2    1.1
3      n.3    0.2    0.2    1.3
4      n.4    0.4    0.3    1.4
5     DCIS    0.3    1.6    0.0
6   DCIS.1    0.3    1.2    0.0
7   DCIS.2    1.1    0.0    0.0
8   DCIS.3    0.2    1.2    0.0
9   DCIS.4    0.2    1.3    0.0
10  DCIS.5    0.2    0.1    1.5
11    br_1    0.5    0.4    1.4
12  br_1.1    0.2    1.3    0.0
13  br_1.2    0.5    0.2    1.4
14  br_1.3    0.5    0.2    1.6
15  br_1.4    1.4    0.0    0.0
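One thing to watch in the output above: because the pasted table is split on whitespace, rows with a blank cell appear to be shifted left (compare the DCIS row with the question's table), and fillna(0) then turns the trailing gaps into zeros. The following is a minimal sketch of the effect, not part of the original answer. It assumes the table is available as fixed-width text, uses three rows copied from the question, and the names `text`, `shifted` and `aligned` are illustrative.

```python
import io

import pandas as pd

# Header plus two rows from the question; the DCIS row has a blank 902.4 cell.
text = """mz     902.4  909.4   915.3
n       0.6    0.3     1.4
DCIS           0.3     1.6
"""

# Splitting on whitespace cannot see the blank cell, so 0.3 and 1.6 slide left
# and the missing value ends up in the last column instead.
shifted = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Fixed-width parsing uses character positions, so the blank cell stays NaN
# under 902.4 and the other values stay in their own columns.
aligned = pd.read_fwf(io.StringIO(text))

print(shifted)
print(aligned)
```

If the data comes from the Excel file itself, reading it directly with pd.read_excel should avoid this problem, since empty cells are read as NaN in place.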

Make the subsets (this step can be moved into the iteration below if the grouping logic can be well defined):

df_n = df.loc[df['mz'].str.startswith('n')]
df_D = df.loc[df['mz'].str.startswith('D')]
df_b = df.loc[df['mz'].str.startswith('b')]
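If the grouping logic needs to be more general than one hard-coded prefix per group, one possible sketch is to derive a group name from each label and let groupby build the subsets. This assumes every replicate label is the group name followed by an optional ".<number>" suffix, as in the table above; the `group` column and the `subsets` dictionary are illustrative names, not part of the original answer.

```python
import pandas as pd

# A few labels and values taken from the printed frame above.
df = pd.DataFrame(
    {
        "mz": ["n", "n.1", "DCIS", "DCIS.5", "br_1", "br_1.4"],
        "902.4": [0.6, 0.4, 0.3, 0.2, 0.5, 1.4],
    }
)

# Assumption: stripping a trailing ".<digits>" leaves the group name,
# e.g. "DCIS.5" -> "DCIS" and "br_1.4" -> "br_1".
df["group"] = df["mz"].str.replace(r"\.\d+$", "", regex=True)

# One sub-frame per group, kept in order of first appearance.
subsets = {name: sub for name, sub in df.groupby("group", sort=False)}

print(list(subsets))
```

New groups are then picked up without adding another startswith line, at the cost of depending on the label naming convention.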

Using matplotlib's subplots():

import matplotlib.pyplot as plt

# One subplot per value column (every column except 'mz')
fig, ax = plt.subplots(nrows=df.shape[1]-1, ncols=1)

for i in range(1, df.shape[1]):
    # Reduced alpha because several histograms are drawn on top of each other
    df_n.iloc[:, i].hist(ax=ax[i-1], color='k', alpha=0.4)
    df_D.iloc[:, i].hist(ax=ax[i-1], color='r', alpha=0.4)
    df_b.iloc[:, i].hist(ax=ax[i-1], color='orange', alpha=0.4)
    ax[i-1].set_title("Histograms for " + str(df.columns[i]))

plt.tight_layout()  # called after the titles exist so they do not overlap
plt.show()
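For reference, here is a self-contained variant of the same loop that can be run without the clipboard. It is only a sketch under a few assumptions: the frame is rebuilt by hand from some rows of the question's table, blank cells are kept as NaN and dropped rather than filled with 0, and the groups are passed to a single hist call so their bars sit side by side instead of overlapping. The `prefixes` mapping and the Agg backend are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# A few rows from the question's table; the blank DCIS cell is kept as NaN.
df = pd.DataFrame(
    {
        "mz": ["n", "n.1", "n.2", "DCIS", "DCIS.5", "br_1", "br_1.2"],
        "902.4": [0.6, 0.4, 0.3, np.nan, 0.2, 0.5, 0.5],
        "909.4": [0.3, 0.3, 0.2, 0.3, 0.1, 0.4, 0.2],
        "915.3": [1.4, 1.3, 1.1, 1.6, 1.5, 1.4, 1.4],
    }
)

prefixes = {"n": "k", "DCIS": "r", "br": "orange"}  # label prefix -> colour
value_cols = df.columns[1:]                          # every column except 'mz'

fig, axes = plt.subplots(nrows=len(value_cols), ncols=1, squeeze=False)
for ax, col in zip(axes[:, 0], value_cols):
    # One array per group; NaN cells are dropped instead of being counted as 0.
    data = [
        df.loc[df["mz"].str.startswith(p), col].dropna().to_numpy()
        for p in prefixes
    ]
    ax.hist(data, bins=10, color=list(prefixes.values()), label=list(prefixes))
    ax.set_title(f"Histograms for {col}")
    ax.legend()

fig.tight_layout()
```

Calling fig.savefig with a file name, or switching to an interactive backend and calling plt.show(), displays the result.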


