How could I get a daily average in Python?

I have a file that is formatted like this:

(Year - Month - Day - Data)

1980 - 1 - 1 - 1.2
1980 - 1 - 2 - 1.3
1980 - 1 - 3 - 1.4
1980 - 1 - 4 - 1.5
1980 - 1 - 5 - 1.6
1980 - 1 - 6 - 1.7
1980 - 1 - 7 - 1.8

It is in a NumPy array. The data covers about 24 years, so what I want to do is take the average for each day of the year and put it into a separate 1-D array of 366 averages (366 to allow for leap years), which I could then plot with matplotlib to see the trend over the years. Is there any way to use subsetting in a loop to accomplish this?

Using pandas is definitely the way to go. There are at least two ways to group by 'day of the year': by the numeric day of the year as a string (%j), or by the month-day string combination (%m%d). Note that the two are not equivalent: in a leap year, %j numbers every date after February 28 one higher than in other years, whereas %m%d always keys on the calendar date. Both are shown here:

import pandas as pd
import numpy as np

df = pd.DataFrame(index=pd.date_range('2000-01-01', '2010-12-31'))

df['vals'] = np.random.randint(1, 6, df.shape[0])

print(df.groupby(df.index.strftime("%j")).mean())
print(df.groupby(df.index.strftime("%m%d")).mean())
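If the data has to stay in the NumPy array described in the question, the same grouping can be done without pandas. The following is only a sketch under assumptions: the array is taken to have the four columns year, month, day, value, and the function name daily_means and the synthetic test data are made up for illustration. Each row is mapped to a slot from 0 to 365 using a leap-year calendar, so February 29 gets its own slot, and np.bincount accumulates sums and counts per slot.

```python
import numpy as np
from datetime import date, timedelta

def daily_means(arr):
    """Average the value column per calendar day.

    arr is assumed to be a 2-D array with columns year, month, day, value.
    Returns a 1-D array of 366 means (NaN where a day has no data).
    """
    month = arr[:, 1].astype(int)
    day = arr[:, 2].astype(int)
    values = arr[:, 3].astype(float)

    # Days before the start of each month in a leap-year calendar.
    days_in_month = np.array([31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
    offset = np.concatenate(([0], np.cumsum(days_in_month)[:-1]))

    # Slot 0 is Jan 1, slot 59 is Feb 29, slot 365 is Dec 31.
    slot = offset[month - 1] + day - 1

    sums = np.bincount(slot, weights=values, minlength=366)
    counts = np.bincount(slot, minlength=366)

    means = np.full(366, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

# Synthetic data for illustration only: the value on each day equals the year.
rows = []
d = date(1980, 1, 1)
while d <= date(1983, 12, 31):
    rows.append((d.year, d.month, d.day, float(d.year)))
    d += timedelta(days=1)
arr = np.array(rows)

means = daily_means(arr)
print(means.shape)          # (366,)
print(means[0], means[59])  # 1981.5 1980.0
```

The resulting array can be passed straight to matplotlib's plot function. February 29 is averaged over fewer years than the other days, so its value will usually be noisier.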

For anyone coming to this question hoping to find an alternative way of processing unusual input, here is some code.

In its essentials, the code reads the input file a line at a time, picks out the elements of dates and values, reassembles these into lines that pandas can readily parse and puts them into a StringIO object.

Pandas reads them from there, as if from a CSV file. I have cribbed the grouping code from PiRSquared.

import pandas as pd
import re
from io import StringIO

file_name = 'temp.txt'

for_pd = StringIO()
with open(file_name) as f:
    for line in f:
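        match = re.search(r'([0-9]{4}) - ([0-9]{1,2}) - ([0-9]{1,2}) - ([0-9.]+)', line)
        if match is None:
            continue  # skip blank or malformed lines
        pieces = match.groups()
        # zero-pad the date parts so pandas sees ISO dates such as 1980-01-01
        pieces = [int(p) for p in pieces[:3]] + [pieces[3]]
        print('%.4i-%.2i-%.2i,%s' % tuple(pieces), file=for_pd)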
for_pd.seek(0)

df = pd.read_csv(for_pd, header=None, names=['datetimes', 'values'], parse_dates=['datetimes'])

# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq=...) replaces it
print(df.set_index('datetimes').groupby(pd.Grouper(freq='D')).mean().dropna())
print(df.set_index('datetimes').groupby(pd.Grouper(freq='W')).mean().dropna())

This is the output.

            values
datetimes         
1980-01-01     1.2
1980-01-02     1.3
1980-01-03     1.4
1980-01-04     1.5
1980-01-05     1.6
1980-01-06     1.7
1980-01-07     1.8
            values
datetimes         
1980-01-06    1.45
1980-01-13    1.80
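
The daily and weekly groupings shown here average within the time series itself, whereas the question asks for one average per calendar day across all years. A possible sketch of that variant follows. It assumes the same ' - ' separated layout, lets pandas split the columns with a regular-expression separator instead of the line-by-line loop, and uses a short inline sample in which the 1981 rows are invented purely so that each day has two years to average.

```python
from io import StringIO

import pandas as pd

# Inline sample in the question's format; the 1981 rows are made up.
text = """1980 - 1 - 1 - 1.2
1980 - 1 - 2 - 1.3
1981 - 1 - 1 - 2.2
1981 - 1 - 2 - 2.3
"""

# A regex separator requires the python parsing engine.
df = pd.read_csv(StringIO(text), sep=r'\s+-\s+', engine='python',
                 header=None, names=['year', 'month', 'day', 'value'])

# pd.to_datetime assembles dates from columns named year, month and day.
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])

# Group on the month-day string so that every year falls into the same key.
by_day = df.groupby(df['date'].dt.strftime('%m-%d'))['value'].mean()
print(by_day)
```

With a file on disk, the StringIO object would be replaced by the file name. On the full 24 years of data, by_day would hold 366 entries, ordered from 01-01 to 12-31.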
