
Create python pandas histograms for specific row range as well as iterating through columns?

I have a set of data that I transpose so that the data groups are in rows and the placeholder values (in my case, a series of mass values) are in columns. My next goal is to draw histograms for each group of rows that share the same label prefix, as shown below:

mz     902.4  909.4   915.3
n       0.6    0.3     1.4
n.1     0.4    0.3     1.3
n.2     0.3    0.2     1.1
n.3     0.2    0.2     1.3
n.4     0.4    0.3     1.4
DCIS           0.3     1.6
DCIS.1  0.3    1.2
DCIS.2         1.1
DCIS.3  0.2    1.2
DCIS.4  0.2    1.3
DCIS.5  0.2    0.1     1.5
br_1    0.5    0.4     1.4
br_1.1         0.2     1.3
br_1.2  0.5    0.2     1.4
br_1.3  0.5    0.2     1.6
br_1.4         1.4

My goal is to draw histograms starting from 902.4 for those with letter n as group 1, DCIS as group 2, and so forth, and these groups are to be in the same histogram plot. Then I plan to iterate the same process through columns, so the code should produce the same number of columns of histograms.

Below is my code so far (input file is an excel xlsx file before transposing):

nh = pd.ExcelFile(nheight)
df = pd.read_excel(nh, index=False)

dfn = df.filter(like='n', axis=0)
dfbr1234 = df.filter(like='br', axis=0)

plt.figure()
plt.hist([dfn, dfbr1234], bins=50)
plt.show()

For testing, I tried grouping just the rows whose label contains 'br', but the code raises the error "zero-size array to reduction operation minimum which has no identity".

Edit: So the dataframe is the table above.

[Image: histogram order]

What I want is a single plot containing 3 separate histograms, one for each of the black, red, and orange boxes in the above screenshot. The goal is to compare the different boxes within a single plot, and I want to iterate so that I can do the same for the other two columns (columns 2 and 3 in the picture). I tried using df.filter with like='n' and so forth, but I am not sure how to combine the different filtered data while also iterating through the columns. The code above doesn't have the iteration yet, but I was thinking about using iloc[:,variable] for it.

Here's one basic approach:

import pandas as pd

df = pd.read_clipboard()
df = df.fillna(0)
print(df)

        mz  902.4  909.4  915.3
0        n    0.6    0.3    1.4
1      n.1    0.4    0.3    1.3
2      n.2    0.3    0.2    1.1
3      n.3    0.2    0.2    1.3
4      n.4    0.4    0.3    1.4
5     DCIS    0.3    1.6    0.0
6   DCIS.1    0.3    1.2    0.0
7   DCIS.2    1.1    0.0    0.0
8   DCIS.3    0.2    1.2    0.0
9   DCIS.4    0.2    1.3    0.0
10  DCIS.5    0.2    0.1    1.5
11    br_1    0.5    0.4    1.4
12  br_1.1    0.2    1.3    0.0
13  br_1.2    0.5    0.2    1.4
14  br_1.3    0.5    0.2    1.6
15  br_1.4    1.4    0.0    0.0
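One thing to keep in mind with fillna(0): every missing reading becomes a real 0.0 data point and will be counted in the histograms. If that is not wanted, dropping the missing values per column is an alternative. A minimal sketch, using a made-up four-value column (the `col` series below is hypothetical, not taken from the data above):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one column, with one missing reading.
col = pd.Series([0.3, np.nan, 0.2, 0.2], name="902.4")

filled = col.fillna(0)   # the missing reading becomes a real 0.0 data point
dropped = col.dropna()   # the missing reading is left out entirely

print(len(filled), filled.min())    # 4 0.0
print(len(dropped), dropped.min())  # 3 0.2
```

Which option is appropriate depends on whether a blank cell means "zero intensity" or "not measured".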

Making the subsets (this step can be moved into the iteration below if the grouping logic can be well defined):

df_n = df.loc[df['mz'].str.startswith('n')]
df_D = df.loc[df['mz'].str.startswith('D')]
df_b = df.loc[df['mz'].str.startswith('b')]
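If the group names should not be hard-coded, one possible way to define the logic is to derive the group from each label. The sketch below assumes the ".1", ".2", ... endings are de-duplication suffixes (they look like the ones pandas appends to repeated column names) and strips them with a regular expression; the `labels` series is just a small sample of the labels above.

```python
import pandas as pd

# A few row labels from the table; the data values are omitted here.
labels = pd.Series(["n", "n.1", "DCIS", "DCIS.5", "br_1", "br_1.4"], name="mz")

# Strip a trailing ".<digits>" suffix to recover the group name.
groups = labels.str.replace(r"\.\d+$", "", regex=True)

print(groups.unique().tolist())  # ['n', 'DCIS', 'br_1']
```

A series like `groups` can then be passed to `df.groupby(...)` to loop over the subsets instead of creating df_n, df_D and df_b by hand.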

Using matplotlib's subplots():

import matplotlib.pyplot as plt

# One subplot per data column (every column except 'mz')
fig, ax = plt.subplots(nrows=df.shape[1]-1, ncols=1)

for i in range(1, df.shape[1]):
    # reduced alpha because several histograms are drawn on top of each other
    df_n.iloc[:,i].hist(ax=ax[i-1], color='k', alpha=0.4)
    df_D.iloc[:,i].hist(ax=ax[i-1], color='r', alpha=0.4)
    df_b.iloc[:,i].hist(ax=ax[i-1], color='orange', alpha=0.4)
    # str() so the title also works when the column labels are numbers
    ax[i-1].set_title("Histograms for " + str(df.columns[i]))

plt.tight_layout()  # call after plotting so the titles are taken into account
plt.show()
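As for the "zero-size array" error in the question, one possible cause (an assumption, since the exact layout of the Excel file is not shown) is that the labels sit in a regular column while the index is the default 0, 1, 2, ... In that case df.filter(..., axis=0) matches against the integer index labels and returns an empty frame, which plt.hist cannot bin. A small sketch with three rows from the table:

```python
import pandas as pd

# Three rows from the table, with a default integer index.
df = pd.DataFrame({"mz": ["n", "n.1", "br_1"], "902.4": [0.6, 0.4, 0.5]})

# filter(axis=0) looks at the index labels (0, 1, 2 here), so nothing matches.
print(df.filter(like="br", axis=0).empty)  # True

# With 'mz' as the index, the same kind of filter finds the rows.
by_name = df.set_index("mz")
print(by_name.filter(regex=r"^br", axis=0).index.tolist())  # ['br_1']
```

The anchored pattern `^br` is used here instead of `like=` so that only labels starting with the prefix are selected.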

