Calculating means for multiple columns, in different rows in pandas

Question

I have a csv file like this:

 Species     Strain            A         B         C         D
 Species1    Strain1.1         0.2       0.1       0.1       0.4
 Species1    Strain1.1         0.2       0.7       0.2       0.2
 Species1    Strain1.2         0.1       0.6       0.1       0.3
 Species1    Strain1.1         0.2       0.6       0.2       0.6
 Species2    Strain2.1         0.3       0.3       0.3       0.1
 Species2    Strain2.2         0.6       0.2       0.6       0.2
 Species2    Strain2.2         0.2       0.1       0.4       0.2
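A minimal sketch of loading this sample into pandas, assuming the real header row is plain "Species Strain A B C D" (the dashes around the names look like formatting residue) and that the columns are separated by whitespace, as they appear here. The file name data.csv in the comment is hypothetical.

```python
import io

import pandas as pd

# Sample copied from the question; the plain header names are an assumption.
raw = """Species Strain A B C D
Species1 Strain1.1 0.2 0.1 0.1 0.4
Species1 Strain1.1 0.2 0.7 0.2 0.2
Species1 Strain1.2 0.1 0.6 0.1 0.3
Species1 Strain1.1 0.2 0.6 0.2 0.6
Species2 Strain2.1 0.3 0.3 0.3 0.1
Species2 Strain2.2 0.6 0.2 0.6 0.2
Species2 Strain2.2 0.2 0.1 0.4 0.2
"""

# sep=r"\s+" splits on any run of whitespace. For a genuinely
# comma-separated file, pd.read_csv("data.csv") with the default sep="," is enough.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.dtypes)  # A-D should be parsed as floats, Species/Strain as text
```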

And I would like to calculate a mean (average) for each unique strain for each of the columns (A to D). How would I go about doing it?

I tried df.groupby(['Strain','Species']).mean().mean(1), but that still seems to give me multiple versions of strains in the resulting dataframe, rather than the means of each column for each unique strain.

Essentially I would like a mean result for A,B,C & D per strain.
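A minimal sketch of "one mean per column per strain", assuming the column names Species, Strain and A to D. It groups on Strain alone, which gives the same four rows as grouping on both keys only because, in this sample, each strain belongs to a single species. Selecting the numeric columns first avoids asking mean() to average the text column Species, which may raise an error depending on the pandas version.

```python
import pandas as pd

# The question's sample, typed in directly.
df = pd.DataFrame({
    "Species": ["Species1"] * 4 + ["Species2"] * 3,
    "Strain": ["Strain1.1", "Strain1.1", "Strain1.2", "Strain1.1",
               "Strain2.1", "Strain2.2", "Strain2.2"],
    "A": [0.2, 0.2, 0.1, 0.2, 0.3, 0.6, 0.2],
    "B": [0.1, 0.7, 0.6, 0.6, 0.3, 0.2, 0.1],
    "C": [0.1, 0.2, 0.1, 0.2, 0.3, 0.6, 0.4],
    "D": [0.4, 0.2, 0.3, 0.6, 0.1, 0.2, 0.2],
})

# One row per unique strain, one mean per column A-D.
per_strain = df.groupby("Strain")[["A", "B", "C", "D"]].mean()
print(per_strain)
```

If the same strain name can occur under different species in the real data, grouping on both Species and Strain keeps them apart.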

Apologies for being unclear, I'm struggling to get my head around this, and I'm very new to programming!

Answer 1

IIUC, you simply need to call

df.groupby(['Species', 'Strain']).mean()

                      A         B         C    D 
Species   Strain                               
Species1  Strain1.1  0.2  0.466667  0.166667  0.4
          Strain1.2  0.1  0.600000  0.100000  0.3
Species2  Strain2.1  0.3  0.300000  0.300000  0.1
          Strain2.2  0.4  0.150000  0.500000  0.2
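If a flat table is more convenient (for example, to write back out to a CSV), one option is as_index=False, which keeps Species and Strain as ordinary columns instead of a MultiIndex; .mean().reset_index() gives the same layout. A minimal sketch using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Species": ["Species1"] * 4 + ["Species2"] * 3,
    "Strain": ["Strain1.1", "Strain1.1", "Strain1.2", "Strain1.1",
               "Strain2.1", "Strain2.2", "Strain2.2"],
    "A": [0.2, 0.2, 0.1, 0.2, 0.3, 0.6, 0.2],
    "B": [0.1, 0.7, 0.6, 0.6, 0.3, 0.2, 0.1],
    "C": [0.1, 0.2, 0.1, 0.2, 0.3, 0.6, 0.4],
    "D": [0.4, 0.2, 0.3, 0.6, 0.1, 0.2, 0.2],
})

# The grouping keys stay as regular columns; A-D are averaged per group.
result = df.groupby(["Species", "Strain"], as_index=False).mean()
print(result)
```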

What you were doing when you called df.groupby(['Strain','Species']).mean().mean(1) was taking the mean of the 4 means in A, B, C and D. mean(1), i.e. mean(axis=1), takes the mean along axis 1, that is, across the columns, so each (Strain, Species) group is collapsed into a single number.
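The difference between the two axes can be seen directly on the grouped result. A minimal sketch with the question's sample data: axis=1 averages across A to D (one number per group, which is what the original attempt produced), while the default axis=0 averages down the rows (one number per column).

```python
import pandas as pd

df = pd.DataFrame({
    "Species": ["Species1"] * 4 + ["Species2"] * 3,
    "Strain": ["Strain1.1", "Strain1.1", "Strain1.2", "Strain1.1",
               "Strain2.1", "Strain2.2", "Strain2.2"],
    "A": [0.2, 0.2, 0.1, 0.2, 0.3, 0.6, 0.2],
    "B": [0.1, 0.7, 0.6, 0.6, 0.3, 0.2, 0.1],
    "C": [0.1, 0.2, 0.1, 0.2, 0.3, 0.6, 0.4],
    "D": [0.4, 0.2, 0.3, 0.6, 0.1, 0.2, 0.2],
})

# One row per (Strain, Species) group, columns A-D: this is what was wanted.
group_means = df.groupby(["Strain", "Species"]).mean()

# axis=1: average across A, B, C, D -> a single number per group.
row_means = group_means.mean(axis=1)

# axis=0 (the default): average down the rows -> a single number per column.
col_means = group_means.mean(axis=0)

print(row_means)
print(col_means)
```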

Accepted answer (2018-04-10 15:59:33)

Calculating means for multiple columns, in different rows in pandas

Question

1 answers

solution1 1 ACCPTED 2018-04-10 15:59:33

solution1
1 ACCPTED 2018-04-10 15:59:33