How do I average monthly data to get yearly values in Python?
I have a dataset that looks like this:
Date | Value
---|---
1871-01 | 4.5
1871-02 | 10.7
1871-03 | 8.9
1871-04 | 1.3
all the way to 2021-12.
How do I get the average value for each year in Python? For example, the 1871 average would be the average of all of the values from 1871-01 to 1871-12, and I would like it for all years from 1871 to 2021.
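Given that your data is in a pandas DataFrame called df:
>>> import pandas as pd
>>> df
      Date  Value
0  1871-01    4.5
1  1871-02   10.7
2  1871-03    8.9
3  1871-04    1.3
4  1872-02    1.5
5  1872-03   15.9
6  1872-04    7.3
>>> # Select the numeric column first: recent pandas versions raise when mean() meets the string 'Date' column.
>>> # freq='Y' groups by calendar year (pandas 2.2 and later prefer the alias 'YE').
>>> year_df = df.set_index(pd.to_datetime(df['Date']))[['Value']].groupby(pd.Grouper(freq='Y')).mean()
>>> year_df.index = year_df.index.year
>>> year_df
                  Value
Date
1871               6.35
1872  8.233333333333333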
It depends on the format the data is given to you in. Is it JSON? CSV? If you already know how to import and read the data with Python, you just need to collect the values that belong to each year and average them: (x1 + x2 + ... + xn) / n, where n is the number of values in that year.
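As a rough sketch of that idea in plain Python, assuming the data has already been read into a list of (date string, value) pairs in the question's YYYY-MM format. The `rows` list and the `yearly_averages` helper are made up for illustration; the numbers are the sample rows used elsewhere in this thread.

```python
from collections import defaultdict

# Hypothetical input: (date, value) pairs in "YYYY-MM" format.
rows = [
    ("1871-01", 4.5),
    ("1871-02", 10.7),
    ("1871-03", 8.9),
    ("1871-04", 1.3),
    ("1872-02", 1.5),
    ("1872-03", 15.9),
    ("1872-04", 7.3),
]


def yearly_averages(rows):
    """Return a dict mapping each year to the mean of its values."""
    by_year = defaultdict(list)
    for date, value in rows:
        year = int(date.split("-")[0])  # "1871-01" -> 1871
        by_year[year].append(value)
    # (x1 + x2 + ... + xn) / n for each year
    return {year: sum(vals) / len(vals) for year, vals in by_year.items()}


averages = yearly_averages(rows)
for year, avg in sorted(averages.items()):
    print(year, avg)
```

Because the values are grouped by the year parsed from each date, this also works when some months are missing, as in the 1872 rows above.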
Make a NumPy array with the values, reshape it to one row per year, and use np.mean.
Example with only 3 years' worth of "data":
import numpy as np
# 36 random monthly values standing in for 3 years of data
values = np.random.normal(0, 1, 36)
# One row per year (12 columns), then average across each row
yearly_avgs = np.mean(values.reshape((len(values) // 12, 12)), axis=1)
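Note that the reshape approach only gives correct yearly averages when the values are in chronological order, start in January, and every year has exactly 12 entries (which holds for a complete 1871-01 to 2021-12 series). A minimal sketch that makes this assumption explicit; the `yearly_means` helper and the `monthly` test data are hypothetical names used only for illustration:

```python
import numpy as np


def yearly_means(values, months_per_year=12):
    """Average consecutive blocks of 12 monthly values.

    Assumes the values are ordered, start in January, and have no gaps.
    """
    values = np.asarray(values, dtype=float)
    if values.size % months_per_year != 0:
        raise ValueError("expected a whole number of years of monthly data")
    # -1 lets NumPy infer the number of years (rows)
    return values.reshape(-1, months_per_year).mean(axis=1)


monthly = np.arange(24, dtype=float)  # hypothetical data: 0, 1, ..., 23
result = yearly_means(monthly)
print(result)
```

If months are missing, as in a series with only three values for some year, grouping by the year taken from the date is the safer choice.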
This gives you the average for each year, computed from the monthly values. With this method there is no need to set `date` as the index, and it returns a single-level DataFrame, as shown in the output.
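import pandas as pd
import numpy as np
# freq="MS" (month start) covers 1871-01 through 2021-12 inclusive: 1812 months
dates = pd.date_range("1871-01", "2021-12", freq="MS")
df = pd.DataFrame({"date": dates, "val": np.random.randint(10, 100, len(dates))})
# Mean for a single year; 57.666667 for the random data shown in the output
df.loc[df["date"].dt.year == 1871, "val"].mean()
# Mean for every year, with the year as an ordinary column
df.groupby(pd.PeriodIndex(df["date"], freq="Y"))["val"].mean().reset_index()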
The method above returns the same output even if the `date` column is of `str` data type.
The following returns the same output, provided the `date` column is of a datetime type.
df.groupby(df["date"].dt.year)["val"].mean().reset_index()
Output (`.head()`):
 | date | val
---|---|---
0 | 1871 | 57.666667
1 | 1872 | 58.916667
2 | 1873 | 52.416667
3 | 1874 | 41.666667
4 | 1875 | 57.583333
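To check the two groupings against each other on a small, deterministic example, here is a sketch that starts from YYYY-MM strings like those in the question. The sample values are the seven rows used earlier in this thread, and the variable names `by_period` and `by_year` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["1871-01", "1871-02", "1871-03", "1871-04",
             "1872-02", "1872-03", "1872-04"],
    "val": [4.5, 10.7, 8.9, 1.3, 1.5, 15.9, 7.3],
})

# Approach 1: PeriodIndex parses the "YYYY-MM" strings directly.
by_period = df.groupby(pd.PeriodIndex(df["date"], freq="Y"))["val"].mean()

# Approach 2: convert to datetime first, then group by calendar year.
dates = pd.to_datetime(df["date"], format="%Y-%m")
by_year = df.groupby(dates.dt.year)["val"].mean()

print(by_period.reset_index())
print(by_year.reset_index())
```

Both results hold the same yearly means; the first is indexed by yearly periods and the second by plain integer years.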