Conditional mean of a dataframe based on datetime column names

Question

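I am new to Python. I am looking for a way to compute the mean of row values based on the column names (the column names are a series of dates running from January to December). I want to produce one mean for every 10 days over the course of a year. My dataframe has the following format (2000 rows):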

import pandas as pd
df= pd.DataFrame({'A':[81,80.09,83,85,88],
                  'B':[21.8,22.04,21.8,21.7,22.06],
                  '20210113':[0,0.05,0,0,0.433],
                  '20210122':[0,0.13,0,0,0.128],
                  '20210125':[0.056,0,0.043,0.062,0.16],
                  '20210213':[0.9,0.56,0.32,0.8,0],
                  '20210217':[0.7,0.99,0.008,0.23,0.56],
                  '20210219':[0.9,0.43,0.76,0.98,0.5]})

Expected output:

In [2]: df
Out[2]: 
   A        B     C (mean of 20210111..20210119)  D (mean of 20210120..20210129) ..
0  81       21.8
1  80.09    22.04
2  83       21.8
3  85       21.7           
4  88       22.06

Answer 1

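One approach is to isolate the date columns from the rest of the DataFrame, transpose them so that the normal grouping operations can be used, then transpose back and merge onto the unaffected part of the DataFrame.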

import pandas as pd

df = pd.DataFrame({'A': [81, 80.09, 83, 85, 88],
                   'B': [21.8, 22.04, 21.8, 21.7, 22.06],
                   '20210113A.2': [0, 0.05, 0, 0, 0.433],
                   '20210122B.1': [0, 0.13, 0, 0, 0.128],
                   '20210125C.3': [0.056, 0, 0.043, 0.062, 0.16],
                   '20210213': [0.9, 0.56, 0.32, 0.8, 0],
                   '20210217': [0.7, 0.99, 0.008, 0.23, 0.56],
                   '20210219': [0.9, 0.43, 0.76, 0.98, 0.5]})

# Unaffected Columns Go Here
keep_columns = ['A', 'B']

# Get All Affected Columns
new_df = df.loc[:, ~df.columns.isin(keep_columns)]

# Strip Extra Information From Column Names
new_df.columns = new_df.columns.map(lambda c: c[0:8])

# Transpose
new_df = new_df.T

# Convert index to DateTime for easy use
new_df.index = pd.to_datetime(new_df.index, format='%Y%m%d')

# Resample every 10 Days on new DT index (Drop any rows with no values)
new_df = new_df.resample("10D").mean().dropna(how='all')

# Transpose and Merge Back on DF
df = df[keep_columns].merge(new_df.T, left_index=True, right_index=True)

# For Display
print(df.to_string())

Output:

       A      B  2021-01-13 00:00:00  2021-01-23 00:00:00  2021-02-12 00:00:00
0  81.00  21.80               0.0000                0.056             0.833333
1  80.09  22.04               0.0900                0.000             0.660000
2  83.00  21.80               0.0000                0.043             0.362667
3  85.00  21.70               0.0000                0.062             0.670000
4  88.00  22.06               0.2805                0.160             0.353333
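
Note that `resample("10D")` starts its first bin at the earliest date present (2021-01-13 here), so the windows are 01-13..01-22, 01-23..02-01 and so on, not the fixed 01-11.. style windows sketched in the question. If fixed windows are wanted, one option is the `origin` argument of `resample` (available from pandas 1.1). The following is a minimal sketch, assuming the windows should be counted from 2021-01-01; that start date and the string column labels are assumptions, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [81, 80.09, 83, 85, 88],
                   'B': [21.8, 22.04, 21.8, 21.7, 22.06],
                   '20210113': [0, 0.05, 0, 0, 0.433],
                   '20210122': [0, 0.13, 0, 0, 0.128],
                   '20210125': [0.056, 0, 0.043, 0.062, 0.16],
                   '20210213': [0.9, 0.56, 0.32, 0.8, 0],
                   '20210217': [0.7, 0.99, 0.008, 0.23, 0.56],
                   '20210219': [0.9, 0.43, 0.76, 0.98, 0.5]})

keep_columns = ['A', 'B']

# Date columns only, transposed so that the dates form the index
dates = df.drop(columns=keep_columns).T
dates.index = pd.to_datetime(dates.index, format='%Y%m%d')

# Assumption: 10-day windows are counted from 2021-01-01
binned = (dates.resample('10D', origin=pd.Timestamp('2021-01-01'))
               .mean()
               .dropna(how='all'))

# Label each window by its first day, e.g. '20210111'
binned.index = binned.index.strftime('%Y%m%d')

# Transpose back and attach to the untouched columns
result = df[keep_columns].join(binned.T)
print(result.to_string())
```

With this anchoring, 20210113 falls in the window starting 20210111, while 20210122 and 20210125 share the window starting 20210121. Windows that contain no date columns are dropped by `dropna(how='all')`, as in the answer.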

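Step by step, the intermediate frames look like this. After selecting the date columns, stripping the column names, and transposing:

new_df = df.loc[:, ~df.columns.isin(keep_columns)]
new_df.columns = new_df.columns.map(lambda c: c[0:8])
new_df = new_df.T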

new_df

              0     1      2      3      4
20210113  0.000  0.05  0.000  0.000  0.433
20210122  0.000  0.13  0.000  0.000  0.128
20210125  0.056  0.00  0.043  0.062  0.160
20210213  0.900  0.56  0.320  0.800  0.000
20210217  0.700  0.99  0.008  0.230  0.560
20210219  0.900  0.43  0.760  0.980  0.500

new_df.index = pd.to_datetime(new_df.index, format='%Y%m%d')

new_df

                0     1      2      3      4
2021-01-13  0.000  0.05  0.000  0.000  0.433
2021-01-22  0.000  0.13  0.000  0.000  0.128
2021-01-25  0.056  0.00  0.043  0.062  0.160
2021-02-13  0.900  0.56  0.320  0.800  0.000
2021-02-17  0.700  0.99  0.008  0.230  0.560
2021-02-19  0.900  0.43  0.760  0.980  0.500

new_df = new_df.resample("10D").mean().dropna(how='all')

new_df

                   0     1         2      3         4
2021-01-13  0.000000  0.09  0.000000  0.000  0.280500
2021-01-23  0.056000  0.00  0.043000  0.062  0.160000
2021-02-12  0.833333  0.66  0.362667  0.670  0.353333

new_df.T

   2021-01-13  2021-01-23  2021-02-12
0      0.0000       0.056    0.833333
1      0.0900       0.000    0.660000
2      0.0000       0.043    0.362667
3      0.0000       0.062    0.670000
4      0.2805       0.160    0.353333
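
The steps walked through above can also be wrapped in a single function so they can be reused on the full 2000-row frame. This is only a sketch: the helper name `mean_by_window` and its parameters `freq` and `name_length` are hypothetical, and it assumes every column outside `keep_columns` starts with a YYYYMMDD date.

```python
import pandas as pd


def mean_by_window(df, keep_columns, freq='10D', name_length=8):
    """Average date-named columns over windows of `freq` (hypothetical helper)."""
    # Date columns only; copy so the original frame is not modified
    dates = df.loc[:, ~df.columns.isin(keep_columns)].copy()
    # Keep the leading YYYYMMDD part of each column name and parse it
    dates.columns = pd.to_datetime(dates.columns.str[:name_length],
                                   format='%Y%m%d')
    # Transpose, resample on the date index, drop empty windows, transpose back
    means = dates.T.resample(freq).mean().dropna(how='all').T
    # Merge the window means back onto the untouched columns
    return df[keep_columns].merge(means, left_index=True, right_index=True)


# Small usage example with suffixed column names, as in the answer
small = pd.DataFrame({'A': [81, 80.09],
                      '20210113A.2': [0, 0.05],
                      '20210122B.1': [0, 0.13],
                      '20210125C.3': [0.056, 0]})
out = mean_by_window(small, keep_columns=['A'])
print(out.to_string())
```

As in the answer, the windows here start at the earliest date column, so the resulting column labels are the timestamps of each window's first day.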

Solution 1
0 2021-04-28 19:44:15
