Pandas dataframe conditional mean based on column names

Question

It's easiest to explain with a sample of the dataframe:

    TimeStamp   382.098     382.461     383.185     383.548
    10:28:00    0.012448    0.012362    0.0124485   0.012362
    10:30:00    0.0124135   0.0123965   0.0124135   0.012431
    10:32:00    0.0551035   0.0551725   0.055931    0.0563105
    10:34:00    0.055586    0.0557245   0.056655    0.0569485
    10:36:00    0.055586    0.055776    0.0568105   0.057362
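To reproduce the examples, here is a minimal sketch that rebuilds the sample. It assumes the numeric headers are strings, as they would be after reading a CSV, and that TimeStamp is an ordinary column:

```python
import pandas as pd

# Rebuild the sample from the question.
# Assumption: numeric headers are strings and TimeStamp is a regular column.
df = pd.DataFrame(
    {
        "TimeStamp": ["10:28:00", "10:30:00", "10:32:00", "10:34:00", "10:36:00"],
        "382.098": [0.012448, 0.0124135, 0.0551035, 0.055586, 0.055586],
        "382.461": [0.012362, 0.0123965, 0.0551725, 0.0557245, 0.055776],
        "383.185": [0.0124485, 0.0124135, 0.055931, 0.056655, 0.0568105],
        "383.548": [0.012362, 0.012431, 0.0563105, 0.0569485, 0.057362],
    }
)
print(df)
```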

I want my output to be:

    TimeStamp   382         383
    10:28:00    0.012405    0.01240525
    10:30:00    0.012405    0.01242225
    10:32:00    0.055138    0.05612075
    10:34:00    0.05565525  0.05680175
    10:36:00    0.055681    0.05708625

In other words, I want to group the columns whose names have the same integer part (382.098 and 382.461 both belong to 382, and 383.185 and 383.548 both belong to 383). For each timestamp, the output column should hold the mean of the values in that group.

My idea was to use df.round to round the column headers to the nearest whole number, then use .mean() to average the columns that end up with the same header. However, I get an error when I apply round to the dataframe's column Index.
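The headers have to be converted to numbers before they can be rounded. The following minimal sketch assumes the headers are strings, as in the sample, and compares nearest-integer rounding with truncation of the fractional part:

```python
import numpy as np
import pandas as pd

# Assumption: the header labels are strings, as in the sample above.
columns = pd.Index(["382.098", "382.461", "383.185", "383.548"])

# Convert the labels to floats first, because string labels cannot be rounded.
labels = columns.astype(float).to_numpy()

rounded = np.round(labels).astype(int).tolist()    # nearest whole number
truncated = np.floor(labels).astype(int).tolist()  # drop the fractional part

print(rounded)    # [382, 382, 383, 384]
print(truncated)  # [382, 382, 383, 383]
```

Only truncation gives the two groups (382 and 383) shown in the desired output. Rounding sends 383.548 to 384.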

EDIT: based on the answers, I used

df.rename(columns=dict(zip(df.columns[0:], df.columns[0:]\
          .values.astype(float).round().astype(str))),inplace=True)
df = df.groupby(df.columns[0:], axis=1).mean()

This scrambles both the column names and the values instead of giving me the mean for each group of column names, and I have no idea why!
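The edit has one visible problem: .round() maps 383.548 to 384, which splits the columns into three groups (382, 383, 384) instead of two. In addition, astype(str) turns the labels into strings such as '382.0'. The sketch below keeps the same rename-then-group idea but truncates instead of rounding. It assumes TimeStamp is already the index and rebuilds only the first two sample rows. Because groupby(..., axis=1) is deprecated as of pandas 2.1, it groups the transposed frame instead:

```python
import numpy as np
import pandas as pd

# Assumption: TimeStamp is the index and the remaining headers are strings.
df = pd.DataFrame(
    {
        "382.098": [0.012448, 0.0124135],
        "382.461": [0.012362, 0.0123965],
        "383.185": [0.0124485, 0.0124135],
        "383.548": [0.012362, 0.012431],
    },
    index=pd.Index(["10:28:00", "10:30:00"], name="TimeStamp"),
)

# Truncate (floor) instead of rounding, and keep integer keys instead of strings.
keys = np.floor(df.columns.astype(float).to_numpy()).astype(int)

# Transpose so the columns become rows, group and average them, then transpose back.
result = df.T.groupby(keys).mean().T
print(result)
```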

Answer 1

Use groupby along axis 1 with a lambda that keeps the part of each column name before the decimal point:

df.set_index('TimeStamp', inplace=True)
df.groupby(by=lambda x: int(x.split('.')[0]), axis=1).mean()

                382       383
TimeStamp
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086
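groupby(..., axis=1) is deprecated as of pandas 2.1. The following sketch does the same lambda-based grouping without it by transposing. It assumes string headers as in the question's sample and rebuilds only two rows for brevity:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "TimeStamp": ["10:28:00", "10:30:00"],
        "382.098": [0.012448, 0.0124135],
        "382.461": [0.012362, 0.0123965],
        "383.185": [0.0124485, 0.0124135],
        "383.548": [0.012362, 0.012431],
    }
).set_index("TimeStamp")

# After transposing, the old column labels form the row index. The lambda receives
# each label string and keeps only the integer part before the decimal point.
result = df.T.groupby(lambda x: int(x.split(".")[0])).mean().T
print(result)
```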

Answer 2

Rename columns with type conversion, move TimeStamp to index, and then use groupby to get column means:

df.rename(columns=lambda x: int(float(x)) if x!="TimeStamp" else x, inplace=True)
df.set_index("TimeStamp", inplace=True)

df
                382       382       383       383
TimeStamp                                        
10:28:00   0.012448  0.012362  0.012448  0.012362
10:30:00   0.012414  0.012396  0.012414  0.012431
10:32:00   0.055103  0.055172  0.055931  0.056310
10:34:00   0.055586  0.055725  0.056655  0.056948
10:36:00   0.055586  0.055776  0.056810  0.057362


df.groupby(df.columns, axis=1).mean()

                382       383
TimeStamp                    
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086

Answer 3

Rename the columns with np.floor, then groupby:

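import numpy as np

# Floor each numeric header (skipping TimeStamp) and use it as the new column name
df.rename(columns=dict(zip(df.columns[1:], np.floor(df.columns[1:].values.astype(float)).astype(str))), inplace=True)
# Columns with the same floored name are averaged together
df.set_index('TimeStamp').groupby(level=0, axis=1).mean().reset_index()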
Out[171]: 
  TimeStamp     382.0     383.0
0  10:28:00  0.012405  0.012405
1  10:30:00  0.012405  0.012422
2  10:32:00  0.055138  0.056121
3  10:34:00  0.055655  0.056802
4  10:36:00  0.055681  0.057086

Answer 4

Another method uses pd.to_numeric. It is a slight variant of @coldspeed's answer:

df = df.set_index('TimeStamp')

df.groupby(pd.to_numeric(df.columns).astype(int), axis=1).mean()

                382       383
TimeStamp                    
10:28:00   0.012405  0.012405
10:30:00   0.012405  0.012422
10:32:00   0.055138  0.056121
10:34:00   0.055655  0.056802
10:36:00   0.055681  0.057086
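In this answer, astype(int) truncates toward zero, while np.floor (used in Answer 3) rounds down. For the positive headers in the question, the two give the same result. The sketch below adds one hypothetical negative label, purely for illustration, to show where they would differ. The values are converted to a NumPy array so that the cast follows NumPy's truncation rules:

```python
import numpy as np
import pandas as pd

# "-1.5" is a hypothetical label added only to show the difference; the
# question's headers are all positive.
labels = pd.Index(["382.098", "383.548", "-1.5"])
values = np.asarray(pd.to_numeric(labels), dtype=float)

truncated = values.astype(int).tolist()          # astype(int) truncates toward zero
floored = np.floor(values).astype(int).tolist()  # np.floor rounds down

print(truncated)  # [382, 383, -1]
print(floored)    # [382, 383, -2]
```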

Answer 5

Generalised solution

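import pandas as pd

df = pd.DataFrame({383.045: [1, 2], 383.96: [3, 4], 383.78: [5, 5], 343: [9, 11]})
# Truncate each column label to its integer part (383.96 -> 383)
df.columns = [int(i) for i in df.columns]
for i in set(df.columns):
    # Selecting a duplicated label returns a 2-D DataFrame rather than a Series
    if df[i].ndim == 2:
        mean = df[i].mean(axis=1)  # row-wise mean across the duplicated columns
        df = df.drop(columns=[i])  # drop every column carrying that label
        df[i] = mean               # re-add a single averaged column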
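For comparison, the following sketch averages the duplicated labels of the same example frame with a single groupby on the transposed frame instead of the loop. Note that groupby returns the columns sorted by label:

```python
import pandas as pd

# Same example frame and label truncation as the generalised solution above.
df = pd.DataFrame({383.045: [1, 2], 383.96: [3, 4], 383.78: [5, 5], 343: [9, 11]})
df.columns = [int(i) for i in df.columns]

# Transpose, average rows that share a label, then transpose back.
result = df.T.groupby(level=0).mean().T
print(result)
```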

Answer 6

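To round the column names to the nearest integer, you can group on a list comprehension that rounds each column name (except the first, TimeStamp) to the nearest whole number and converts it to an int. Note that rounding, unlike truncation, maps 383.548 to 384, so this produces three groups instead of the two in the desired output: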

>>> (df
     .set_index('TimeStamp')
     .groupby([int(round(col, 0)) for col in df.columns[1:].astype(float)], axis=1)
     .mean())
                382       383       384
TimeStamp                              
10:28:00   0.012405  0.012448  0.012362
10:30:00   0.012405  0.012414  0.012431
10:32:00   0.055138  0.055931  0.056310
10:34:00   0.055655  0.056655  0.056948
10:36:00   0.055681  0.056810  0.057362

6 answers

solution1
11 2017-10-15 21:52:52

solution2
6 2017-10-15 21:42:51

solution3
5 2017-10-15 21:44:40

solution4
3 2018-09-10 06:02:55

solution5
3 2018-09-11 14:27:35

solution6
2 2018-09-12 06:56:40
