Create python pandas histograms for specific row range as well as iterating through columns?

Question

I have a data set that I transpose so that the data groups are in rows and the placeholder values (in my case, a series of mass values) are in columns. My next goal is to draw histograms for the rows whose labels share the same prefix, as shown below:

mz     902.4  909.4   915.3
n       0.6    0.3     1.4
n.1     0.4    0.3     1.3
n.2     0.3    0.2     1.1
n.3     0.2    0.2     1.3
n.4     0.4    0.3     1.4
DCIS           0.3     1.6
DCIS.1  0.3    1.2
DCIS.2         1.1
DCIS.3  0.2    1.2
DCIS.4  0.2    1.3
DCIS.5  0.2    0.1     1.5
br_1    0.5    0.4     1.4
br_1.1         0.2     1.3
br_1.2  0.5    0.2     1.4
br_1.3  0.5    0.2     1.6
br_1.4         1.4

My goal is to draw histograms, starting with column 902.4, that treat the rows labelled n as group 1, DCIS as group 2, and so forth, with all groups drawn in the same histogram plot. I then plan to repeat the same process for each column, so the code should produce one histogram plot per column.

Below is my code so far (the input is an Excel .xlsx file, before transposing):

nh = pd.ExcelFile(nheight)
df = pd.read_excel(nh, index=False)

dfn = df.filter(like='n', axis=0)
dfbr1234 = df.filter(like='br', axis=0)

plt.figure()
plt.hist([dfn, dfbr1234], bins=50)
plt.show()

As a test, I tried grouping only the rows whose labels contain 'br', but the code raises the error "zero-size array to reduction operation minimum which has no identity".

Edit: So the dataframe is the table above.

What I want is a single histogram plot that contains 3 separate histograms, one for each of the groups marked by the black, red, and orange boxes in the above screenshot. The goal is to compare the different boxes within a single plot, and I want to iterate so that I can do the same for the other two columns (columns 2 and 3 in the picture). I tried using df.filter with like='n' and so forth, but I am not sure how to combine the filtered data while also iterating through the columns. The code above doesn't include the iteration yet, but I was thinking of using iloc[:, variable] for it.

Answer 1

Here's one basic approach:

import pandas as pd

df = pd.read_clipboard()  # table copied from the question
df = df.fillna(0)         # missing values become 0
print(df)

        mz  902.4  909.4  915.3
0        n    0.6    0.3    1.4
1      n.1    0.4    0.3    1.3
2      n.2    0.3    0.2    1.1
3      n.3    0.2    0.2    1.3
4      n.4    0.4    0.3    1.4
5     DCIS    0.3    1.6    0.0
6   DCIS.1    0.3    1.2    0.0
7   DCIS.2    1.1    0.0    0.0
8   DCIS.3    0.2    1.2    0.0
9   DCIS.4    0.2    1.3    0.0
10  DCIS.5    0.2    0.1    1.5
11    br_1    0.5    0.4    1.4
12  br_1.1    0.2    1.3    0.0
13  br_1.2    0.5    0.2    1.4
14  br_1.3    0.5    0.2    1.6
15  br_1.4    1.4    0.0    0.0

Make the subsets (this step can be moved into the loop below if the grouping logic can be well defined):

df_n = df.loc[df['mz'].str.startswith('n')]
df_D = df.loc[df['mz'].str.startswith('D')]
df_b = df.loc[df['mz'].str.startswith('b')]
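If the grouping rule can be stated precisely, the three hand-written subsets above could be derived in one step. The following is only a sketch under an assumption that is not in the original answer: that the group name is everything before the first '.' in the mz label. The small DataFrame and the names group and subsets are illustrative stand-ins, not the real data.

```python
import pandas as pd

# Small stand-in for the table in the question (values abbreviated).
df = pd.DataFrame({
    "mz": ["n", "n.1", "DCIS", "DCIS.1", "br_1", "br_1.1"],
    "902.4": [0.6, 0.4, 0.3, 0.3, 0.5, 0.2],
})

# Assumption: the group name is the part of the label before the first '.',
# so 'n.1' -> 'n', 'DCIS.1' -> 'DCIS', 'br_1.1' -> 'br_1'.
group = df["mz"].str.split(".").str[0]

# One sub-DataFrame per group, kept in order of first appearance.
subsets = {name: sub for name, sub in df.groupby(group, sort=False)}
print(list(subsets))  # ['n', 'DCIS', 'br_1']
```

With this layout, subsets['n'] plays the role of df_n, and new groups are picked up without adding another startswith line.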

Using matplotlib's subplots():

import matplotlib.pyplot as plt

# one subplot per data column (every column except 'mz')
fig, ax = plt.subplots(nrows=df.shape[1]-1, ncols=1)

for i in range(1, df.shape[1]):
    # reduced alpha because the histograms are drawn on top of each other
    df_n.iloc[:, i].hist(ax=ax[i-1], color='k', alpha=0.4)
    df_D.iloc[:, i].hist(ax=ax[i-1], color='r', alpha=0.4)
    df_b.iloc[:, i].hist(ax=ax[i-1], color='orange', alpha=0.4)
    ax[i-1].set_title("Histograms for " + str(df.columns[i]))

plt.tight_layout()  # called after plotting so the titles are accounted for
plt.show()
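As a variation closer to the attempt in the question, a list of arrays can be passed to a single hist call per column, which draws the groups as side-by-side bars instead of overlapping ones. This is a sketch, not the method of the answer above: it assumes the mz labels are moved into the index so that filter(..., axis=0) has row labels to match, and the prefixes mapping and the sample values are illustrative. With a default integer index, a label filter would match nothing, and an empty selection may be what triggers the zero-size array error.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data; None marks a missing measurement.
df = pd.DataFrame(
    {
        "mz": ["n", "n.1", "n.2", "DCIS", "DCIS.1", "DCIS.2", "br_1", "br_1.1", "br_1.2"],
        "902.4": [0.6, 0.4, 0.3, None, 0.3, 0.2, 0.5, 0.5, 0.5],
        "909.4": [0.3, 0.3, 0.2, 0.3, 0.2, 0.1, 0.4, 0.2, 0.2],
    }
).set_index("mz")  # row labels must be the index for filter(axis=0)

# Hypothetical mapping of label prefix -> bar color.
prefixes = {"n": "k", "DCIS": "r", "br": "orange"}

fig, axes = plt.subplots(nrows=df.shape[1], ncols=1, squeeze=False)
for ax, col in zip(axes[:, 0], df.columns):
    # One array per group; missing values are dropped rather than filled with 0.
    data = [
        df.filter(regex="^" + p, axis=0)[col].dropna().to_numpy()
        for p in prefixes
    ]
    ax.hist(data, bins=10, color=list(prefixes.values()), label=list(prefixes))
    ax.set_title("Histograms for " + str(col))
    ax.legend()
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Dropping missing values here is a design choice that differs from the fillna(0) step above, which adds the blanks to the zero bin.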
