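Conditional mean of a dataframe based on datetime column names
I am new to Python. I am looking for a way to compute the mean of row values based on the column names (the column names are a series of dates running from January to December). I want to compute one mean for every 10 days over the course of a year. My dataframe has the following format (2000 rows):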
import pandas as pd
df= pd.DataFrame({'A':[81,80.09,83,85,88],
'B':[21.8,22.04,21.8,21.7,22.06],
'20210113':[0,0.05,0,0,0.433],
'20210122':[0,0.13,0,0,0.128],
'20210125':[0.056,0,0.043,0.062,0.16],
'20210213':[0.9,0.56,0.32,0.8,0],
'20210217':[0.7,0.99,0.008,0.23,0.56],
'20210219':[0.9,0.43,0.76,0.98,0.5]})
Expected output:
In [2]: df
Out[2]:
       A      B  C (mean of 20210111..20210119)  D (mean of 20210120..20210129) ...
0     81   21.8
1  80.09  22.04
2     83   21.8
3     85   21.7
4     88  22.06
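One approach is to isolate the date columns from the rest of the DataFrame, transpose them so that ordinary grouping operations can be applied, then transpose back and merge the result with the unaffected part of the DataFrame.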
import pandas as pd
df = pd.DataFrame({'A': [81, 80.09, 83, 85, 88],
'B': [21.8, 22.04, 21.8, 21.7, 22.06],
'20210113A.2': [0, 0.05, 0, 0, 0.433],
'20210122B.1': [0, 0.13, 0, 0, 0.128],
'20210125C.3': [0.056, 0, 0.043, 0.062, 0.16],
'20210213': [0.9, 0.56, 0.32, 0.8, 0],
'20210217': [0.7, 0.99, 0.008, 0.23, 0.56],
'20210219': [0.9, 0.43, 0.76, 0.98, 0.5]})
# Unaffected Columns Go Here
keep_columns = ['A', 'B']
# Get All Affected Columns
new_df = df.loc[:, ~df.columns.isin(keep_columns)]
# Strip Extra Information From Column Names
new_df.columns = new_df.columns.map(lambda c: c[0:8])
# Transpose
new_df = new_df.T
# Convert index to DateTime for easy use
new_df.index = pd.to_datetime(new_df.index, format='%Y%m%d')
# Resample every 10 Days on new DT index (Drop any rows with no values)
new_df = new_df.resample("10D").mean().dropna(how='all')
# Transpose and Merge Back on DF
df = df[keep_columns].merge(new_df.T, left_index=True, right_index=True)
# For Display
print(df.to_string())
Output:
       A      B  2021-01-13 00:00:00  2021-01-23 00:00:00  2021-02-12 00:00:00
0  81.00  21.80               0.0000                0.056             0.833333
1  80.09  22.04               0.0900                0.000             0.660000
2  83.00  21.80               0.0000                0.043             0.362667
3  85.00  21.70               0.0000                0.062             0.670000
4  88.00  22.06               0.2805                0.160             0.353333
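Note that the `10D` bins in this output start at 2021-01-13, the first date present in the data, rather than at a calendar boundary. If bins anchored to a fixed date are preferred, one option is the `origin` parameter of `resample` (available in pandas 1.1 and later). The sketch below assumes an origin of 2021-01-01, which is an illustrative choice and not something stated in the question; it also uses the plain date column names from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [81, 80.09, 83, 85, 88],
                   'B': [21.8, 22.04, 21.8, 21.7, 22.06],
                   '20210113': [0, 0.05, 0, 0, 0.433],
                   '20210122': [0, 0.13, 0, 0, 0.128],
                   '20210125': [0.056, 0, 0.043, 0.062, 0.16],
                   '20210213': [0.9, 0.56, 0.32, 0.8, 0],
                   '20210217': [0.7, 0.99, 0.008, 0.23, 0.56],
                   '20210219': [0.9, 0.43, 0.76, 0.98, 0.5]})

keep_columns = ['A', 'B']

# Date columns only, transposed so that the dates form the index
dates = df.drop(columns=keep_columns).T
dates.index = pd.to_datetime(dates.index, format='%Y%m%d')

# Assumption: bins should be aligned to 2021-01-01, not to the first observed date
aligned = (dates.resample('10D', origin=pd.Timestamp('2021-01-01'))
                .mean()
                .dropna(how='all'))

# Transpose back and attach to the untouched columns
result = df[keep_columns].join(aligned.T)
print(result.to_string())
```

With this origin the bin labels become 2021-01-11, 2021-01-21 and 2021-02-10, so 2021-01-22 and 2021-01-25 fall into the same bin instead of being split.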
new_df = df.loc[:, ~df.columns.isin(keep_columns)]
new_df.columns = new_df.columns.map(lambda c: c[0:8])
new_df = new_df.T
new_df
              0     1      2      3      4
20210113  0.000  0.05  0.000  0.000  0.433
20210122  0.000  0.13  0.000  0.000  0.128
20210125  0.056  0.00  0.043  0.062  0.160
20210213  0.900  0.56  0.320  0.800  0.000
20210217  0.700  0.99  0.008  0.230  0.560
20210219  0.900  0.43  0.760  0.980  0.500
new_df.index = pd.to_datetime(new_df.index, format='%Y%m%d')
new_df
                0     1      2      3      4
2021-01-13  0.000  0.05  0.000  0.000  0.433
2021-01-22  0.000  0.13  0.000  0.000  0.128
2021-01-25  0.056  0.00  0.043  0.062  0.160
2021-02-13  0.900  0.56  0.320  0.800  0.000
2021-02-17  0.700  0.99  0.008  0.230  0.560
2021-02-19  0.900  0.43  0.760  0.980  0.500
new_df = new_df.resample("10D").mean().dropna(how='all')
new_df
                   0     1         2      3         4
2021-01-13  0.000000  0.09  0.000000  0.000  0.280500
2021-01-23  0.056000  0.00  0.043000  0.062  0.160000
2021-02-12  0.833333  0.66  0.362667  0.670  0.353333
new_df.T
   2021-01-13  2021-01-23  2021-02-12
0      0.0000       0.056    0.833333
1      0.0900       0.000    0.660000
2      0.0000       0.043    0.362667
3      0.0000       0.062    0.670000
4      0.2805       0.160    0.353333
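If the goal is fixed windows within each calendar month (days 1 to 10, 11 to 20, and 21 to month end) rather than consecutive 10-day bins, the same transpose, group, transpose-back pattern can be sketched with `groupby` in place of `resample`. The window definition and the `YYYY-MM-Dn` labels below are assumptions made for illustration; the question does not specify them.

```python
import pandas as pd

df = pd.DataFrame({'A': [81, 80.09, 83, 85, 88],
                   'B': [21.8, 22.04, 21.8, 21.7, 22.06],
                   '20210113': [0, 0.05, 0, 0, 0.433],
                   '20210122': [0, 0.13, 0, 0, 0.128],
                   '20210125': [0.056, 0, 0.043, 0.062, 0.16],
                   '20210213': [0.9, 0.56, 0.32, 0.8, 0],
                   '20210217': [0.7, 0.99, 0.008, 0.23, 0.56],
                   '20210219': [0.9, 0.43, 0.76, 0.98, 0.5]})

keep_columns = ['A', 'B']

# Date columns only, transposed so that the dates form the index
dates = df.drop(columns=keep_columns).T
dates.index = pd.to_datetime(dates.index, format='%Y%m%d')

# Hypothetical labels: year-month plus a window number
# (days 1-10 -> D1, days 11-20 -> D2, day 21 to month end -> D3)
labels = pd.Index(
    [f"{ts:%Y-%m}-D{min((ts.day - 1) // 10, 2) + 1}" for ts in dates.index],
    name='window',
)

# One mean per window, then transpose back and attach to the untouched columns
means = dates.groupby(labels).mean()
result = df[keep_columns].join(means.T)
print(result.to_string())
```

Because only windows that contain at least one date column appear as groups, no `dropna` step is needed here.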