
How to group replicates as columns in a pandas DataFrame

I have a dataset with replicates, and I need to go from something like this:

        S1   S1   S2   S2
        S1.1 S1.2 S2.1 S2.2
  Ion1  10   8    14   1
  Ion2  0    6    2    3

The first two rows are multilevel headers. S1 and S2 are the samples, and S1.1 etc. are the file names of the replicate measurements of each sample. The real names won't be as simple as the ones shown here.

I need to get to something like this:

        Rep1 Rep2
Ion1 S1 10   8
Ion1 S2 14   1
Ion2 S1 0    6
Ion2 S2 2    3

Here Rep1 and Rep2 denote the first and second replicate measurements in general, and the sample level is stacked into the rows.

Eventually I want to calculate the mean of the replicates. Right now I do this operation on a NumPy matrix: I insert a row with the replicate numbers into the input matrix and import that into pandas. That is inelegant, and I would rather do it in the data frame.

EDIT: I think I was a bit confusing. When I say the names won't be simple, I mean they won't look like S1.1: S1.1 might be XF20114, S1.2 might be XF19372 CF, and S1 might be called 'florida', so the last number of the name can't be relied upon. Right now I just step through the sample row in the NumPy matrix and put an increasing number in a new row if the sample name is the same as the one before it. If the sample name changes, I reset the number to 1. This makes the table in the example look like:

        S1   S1   S2   S2
        S1.1 S1.2 S2.1 S2.2
        1    2    1    2
  Ion1  10   8    14   1
  Ion2  0    6    2    3

Can I group all values that have the same sample name for an ion regardless of what the replicate name is?
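
The manual numbering described in the edit (count up while the sample name repeats, restart at 1 when it changes) can probably be done inside pandas itself. What follows is only a minimal sketch under some assumptions: the columns are a two-level MultiIndex with the sample on the first level, and the level names `sample`, `file` and `replicate`, as well as the file names, are made up for illustration. It numbers the replicates by their position within each sample, so nothing is parsed out of the file names.

```python
import pandas as pd

# Hypothetical data: level 0 = sample name, level 1 = arbitrary replicate file name.
columns = pd.MultiIndex.from_tuples(
    [("florida", "XF20114"), ("florida", "XF19372 CF"),
     ("texas", "AB00017"), ("texas", "QZ9")],
    names=["sample", "file"],
)
df = pd.DataFrame([[10, 8, 14, 1], [0, 6, 2, 3]],
                  index=["Ion1", "Ion2"], columns=columns)

# Count the columns within each sample (0, 1, 0, 1, ...) in their original order.
position = df.columns.to_frame(index=False).groupby("sample").cumcount()
rep_labels = ("Rep" + (position + 1).astype(str)).to_numpy()

# Replace the file-name level with the generated replicate labels.
renamed = df.copy()
renamed.columns = pd.MultiIndex.from_arrays(
    [df.columns.get_level_values("sample"), rep_labels],
    names=["sample", "replicate"],
)

# Move the sample level into the row index; Rep1, Rep2, ... stay as columns.
tidy = renamed.stack(level="sample")
print(tidy)
```

Note that `cumcount` counts per sample name across all columns, so unlike the "same as the one before it" rule it also works when a sample's replicates are not adjacent. If samples have different numbers of replicates, the missing cells come out as NaN.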

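# Stack the first column level (the sample) into the row index.
# The replicate names stay as columns; cells belonging to the other sample become NaN.
df_s = df.stack(level=0)
# Group the columns by the text after the 3-character prefix ("S1." / "S2.") and sum.
# Each group has one non-NaN value per row, so the sum just collapses the NaNs.
# Note: this relies on the replicate names ending in the replicate number,
# and groupby(..., axis=1) is deprecated in recent pandas releases.
df_s.groupby(df_s.columns.str[3:], axis=1).sum()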

            1    2
0                 
Ion1 S1  10.0  8.0
     S2  14.0  1.0
Ion2 S1   0.0  6.0
     S2   2.0  3.0

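This is an alternative:

# Keep only the text after the last '.' in each replicate name ('S1.1' -> '1').
# Like the approach above, this assumes the names end in the replicate number.
df.columns = pd.MultiIndex.from_tuples((first, last.split('.')[-1]) for first, last in df.columns)

# Stack the sample level into the rows and label the remaining columns Rep1, Rep2, ...
df.stack(0).add_prefix('Rep')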

           Rep1 Rep2
Ion1    S1  10  8
        S2  14  1
Ion2    S1  0   6
        S2  2   3
