Daily data to monthly average in Python

I am really (really) new to Python and I am having trouble calculating the monthly average of a river flow rate from daily data. Data example:

01/01/1981/;42.989
02/01/1981/;32.093
03/01/1981/;25.334
04/01/1981/;25.334
05/01/1981/;25.767
06/01/1981/;28.868
07/01/1981/;40.925
08/01/1981/;29.777
...
04/02/1981/;27.969
05/02/1981/;27.969
06/02/1981/;29.777
07/02/1981/;30.696
...
29/12/2014/;91.843
30/12/2014/;83.645
31/12/2014/;77.336

I would like to calculate the monthly river flow rate using the daily data. I know that packages like numpy or pandas can do that too, but I need to do this without using them.

for row in arq:
    a = row.split(';')
    x = a[0]
    y = float(a[1])
    x = row.split("/" or "/;  ")
    day = int(x[0])
    month = int(x[1])
    year = int(x[2])
    nl.append(y)
    average = sum(nl)/ len(nl)
print(average)

So, if you can help I would be really thankful.

This solution uses a dictionary monthly_averages to keep track of the running average for each year/month combination. Each value is a tuple pair: the first element is the current average and the second is the number of observations so far (needed to update the average when more data arrives).

Note that the average of n observations is sum(observations) / n. Given a new data point, the new average is (sum(observations) + new data point) / (n + 1). This can be rewritten as sum(observations) / n * n / (n + 1) + new data point / (n + 1). Since sum(observations) / n is the prior average, the new average can be expressed as follows: new_avg = prior_avg * n / (n + 1) + new data point / (n + 1). Or, more simply: new_avg = (prior_avg * n + new data point) / (n + 1).
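The update rule can be checked numerically. Here is a minimal sketch using the first four flow values from the question; the helper name update_average is hypothetical and only serves the illustration:

```python
def update_average(prior_avg, n, new_value):
    """Return the mean of n + 1 observations, given the mean of the first n."""
    return (prior_avg * n + new_value) / (n + 1)

# First four daily values from the question's sample data.
flows = [42.989, 32.093, 25.334, 25.334]

# Start from the first observation, then fold in the rest one at a time.
avg, count = flows[0], 1
for value in flows[1:]:
    avg = update_average(avg, count, value)
    count += 1

# Direct computation over the whole list, for comparison.
batch_avg = sum(flows) / len(flows)

# The two results agree up to floating-point rounding.
print(abs(avg - batch_avg) < 1e-9)
```

The running form only needs the previous average and the count, so the individual daily values do not have to be kept in memory.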

I believe the rest of the code is self explanatory, but please let me know if you don't understand any of it and then I'll do my best to clarify.

monthly_averages = {}  # Maps (year, month) to (running average, number of observations).
for row in arq:
    date, daily_flow = row.split(';')  # Tuple unpacking.
    day, month, year = date[:-1].split('/')  # Drop the trailing '/'. The parts stay strings, which is fine.
    prior_data = monthly_averages.get((year, month))
    if prior_data:
        prior_avg, count = prior_data  # Tuple unpacking.
        new_avg = (prior_avg * count + float(daily_flow)) / (count + 1)
        monthly_averages[(year, month)] = (new_avg, count + 1)
    else:
        monthly_averages[(year, month)] = (float(daily_flow), 1)

Print the results in a sorted order:

for year_month in sorted(monthly_averages):
    print('{}-{}: {:.2f}'.format(*year_month, monthly_averages[year_month][0]))
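To try this without the original file, the same logic can be wrapped in a function and fed a few rows from the question. This is only a sketch: the function name monthly_running_averages and the list sample_rows are assumptions for illustration, and skipping blank lines is an assumption about the input rather than something the question states.

```python
def monthly_running_averages(rows):
    """Return {(year, month): (average, count)} for rows like 'DD/MM/YYYY/;value'."""
    averages = {}
    for row in rows:
        row = row.strip()
        if not row:
            continue  # Skip blank lines (an assumption about the input).
        date, value = row.split(';')
        day, month, year = date.rstrip('/').split('/')
        # A month not seen yet starts from an average of 0.0 over 0 observations.
        prior_avg, count = averages.get((year, month), (0.0, 0))
        new_avg = (prior_avg * count + float(value)) / (count + 1)
        averages[(year, month)] = (new_avg, count + 1)
    return averages

# A few rows copied from the question, with newlines as a file would supply them.
sample_rows = [
    '01/01/1981/;42.989\n',
    '02/01/1981/;32.093\n',
    '03/01/1981/;25.334\n',
    '04/02/1981/;27.969\n',
    '06/02/1981/;29.777\n',
]

result = monthly_running_averages(sample_rows)
for year, month in sorted(result):
    print('{}-{}: {:.2f}'.format(year, month, result[(year, month)][0]))
```

Sorting the (year, month) string pairs gives chronological order here because both parts are zero-padded to a fixed width in the data.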

I am assuming the variable arq has the required data as rows. Please check the following code:

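arq_dict = {}  # Maps 'MM/YYYY' to the list of daily values for that month.
for row in arq:
    rlst = row.split(';')
    date = rlst[0]
    val = float(rlst[1])
    month = date[3:10]  # 'DD/MM/YYYY/' -> 'MM/YYYY' (drops the day and the trailing '/').
    if month in arq_dict:
        arq_dict[month].append(val)
    else:
        arq_dict[month] = [val]

for k in arq_dict:
    print("%s;%.3f" % (k, sum(arq_dict[k]) / len(arq_dict[k])))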
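On Python 3.7 and later a dict iterates in insertion order, so chronologically ordered input already prints in order. If that cannot be relied on, note that plain string sorting of 'MM/YYYY' labels would group by month before year. A possible sketch of a sort key follows; the helper chronological_key and the sample dictionary are hypothetical, and the helper tolerates a trailing '/' on the label:

```python
def chronological_key(month_label):
    """Turn a 'MM/YYYY' label (optionally ending in '/') into (year, month) ints."""
    month, year = month_label.strip('/').split('/')
    return int(year), int(month)

# Hypothetical sample data, deliberately out of order.
arq_dict = {
    '02/1981': [27.969, 29.777],
    '12/2014': [91.843, 83.645, 77.336],
    '01/1981': [42.989, 32.093, 25.334],
}

# Sort the labels by year first, then by month.
ordered = sorted(arq_dict, key=chronological_key)
for label in ordered:
    values = arq_dict[label]
    print("%s;%.3f" % (label, sum(values) / len(values)))
```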
