
How to calculate average of a variable by hour in R

I'm having trouble when trying to calculate the average temperature by hour.

I have a data frame with date, time (hh:mm:ss AM/PM), and temperature columns. I need the mean temperature for each hour so that I can plot the daily variation in temperature.

I'm new to R, but I made an attempt with what I know: I converted the hours to numbers, extracted the first two characters, and then calculated the mean, but it didn't work very well. Moreover, I have so many files to analyze that something cleaner and more automated than the "solution" I found would be much better.

I believe there must be a better way to calculate hourly averages in R, so I've been looking for an answer in other posts here. Unfortunately, I couldn't find a clear answer about extracting statistics from time data.

My data looks like this

          date     hour temperature
1   28/12/2013 13:03:01      41.572
2   28/12/2013 13:08:01      46.059
3   28/12/2013 13:13:01       48.55
4   28/12/2013 13:18:01      49.546
5   28/12/2013 13:23:01      49.546
6   28/12/2013 13:28:01      49.546
7   28/12/2013 13:33:01      50.044
8   28/12/2013 13:38:01      50.542
9   28/12/2013 13:43:01      50.542
10  28/12/2013 13:48:01       51.04
11  28/12/2013 13:53:01      51.538
12  28/12/2013 13:58:01      51.538
13  28/12/2013 14:03:01      50.542
14  28/12/2013 14:08:01       51.04
15  28/12/2013 14:13:01       51.04
16  28/12/2013 14:18:01      52.534
17  28/12/2013 14:23:01      53.031
18  28/12/2013 14:28:01      53.031
19  28/12/2013 14:33:01      53.031
20  28/12/2013 14:38:01      51.538
21  28/12/2013 14:43:01      53.031
22  28/12/2013 14:48:01      53.529
etc. (24 hours of data)

I would like R to calculate the average per hour (ignoring differences in minutes and seconds, grouping by hour only).

Any suggestion? Thank you very much in advance!

Regards, Maria

Combine the date and hour columns into a POSIXct column and cut() by hourly breaks:

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
date hour temperature
28/12/2013 13:03:01  41.572
28/12/2013 13:08:01  46.059
28/12/2013 13:13:01  48.55
28/12/2013 13:18:01  49.546
28/12/2013 13:23:01  49.546
28/12/2013 13:28:01  49.546
28/12/2013 13:33:01  50.044
28/12/2013 13:38:01  50.542
28/12/2013 13:43:01  50.542
28/12/2013 13:48:01  51.04
28/12/2013 13:53:01  51.538
28/12/2013 13:58:01  51.538
28/12/2013 14:03:01  50.542
28/12/2013 14:08:01  51.04
28/12/2013 14:13:01  51.04
28/12/2013 14:18:01  52.534
28/12/2013 14:23:01  53.031
28/12/2013 14:28:01  53.031
28/12/2013 14:33:01  53.031
28/12/2013 14:38:01  51.538
28/12/2013 14:43:01  53.031
28/12/2013 14:48:01  53.529
28/12/2013 15:01:01  50.77")

df$datehour <- cut(as.POSIXct(paste(df$date, df$hour),
   format="%d/%m/%Y %H:%M:%S"), breaks="hour") 
head(df)
        date     hour temperature            datehour
1 28/12/2013 13:03:01      41.572 2013-12-28 13:00:00
2 28/12/2013 13:08:01      46.059 2013-12-28 13:00:00
3 28/12/2013 13:13:01      48.550 2013-12-28 13:00:00
4 28/12/2013 13:18:01      49.546 2013-12-28 13:00:00
5 28/12/2013 13:23:01      49.546 2013-12-28 13:00:00
6 28/12/2013 13:28:01      49.546 2013-12-28 13:00:00

Now aggregate by that hourly column:

means <- aggregate(temperature ~ datehour, df, mean)
head(means)
             datehour temperature
1 2013-12-28 13:00:00    49.17192
2 2013-12-28 14:00:00    52.23470
3 2013-12-28 15:00:00    50.77000

plot(as.POSIXct(means$datehour), means$temperature, type="l", las=1,
     main="Hourly Avg Temperatures", xlab="Hour", ylab="")
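The question mentions times in AM/PM form and a plot of daily variation, so here is a minimal base R sketch of that variant. It assumes 12-hour strings such as "01:03:01 PM" and averages by hour of day across all dates. The data frame df12 and its values are made up for illustration, and tz = "UTC" is an assumption. Note that %p depends on the locale, so "PM" may not parse in some non-English locales.

```r
# Hypothetical sample: 12-hour clock times spread over two days
df12 <- data.frame(
  date = c("28/12/2013", "28/12/2013", "29/12/2013", "29/12/2013"),
  hour = c("01:03:01 PM", "02:08:01 PM", "01:13:01 PM", "02:18:01 PM"),
  temperature = c(40, 50, 44, 54),
  stringsAsFactors = FALSE
)

# %I is the 12-hour clock hour and %p the AM/PM marker (locale dependent)
stamp <- as.POSIXct(paste(df12$date, df12$hour),
                    format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")

# Keep only the hour of day (0-23), so the same hour on different days is pooled
df12$hour_of_day <- as.integer(format(stamp, "%H"))

daily_cycle <- aggregate(temperature ~ hour_of_day, data = df12, FUN = mean)
print(daily_cycle)
```

With this sample, hour 13 averages 42 and hour 14 averages 52. Grouping by hour of day instead of by date and hour gives one value per hour for the whole record, which is what a daily-cycle plot needs.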

For time series data, though, I prefer the xts package:

require(xts)
df.xts <- xts(df$temperature, as.POSIXct(paste(df$date, df$hour),
      format="%d/%m/%Y %H:%M:%S"))
head(df.xts)
                      [,1]
2013-12-28 13:03:01 41.572
2013-12-28 13:08:01 46.059
2013-12-28 13:13:01 48.550
2013-12-28 13:18:01 49.546
2013-12-28 13:23:01 49.546
2013-12-28 13:28:01 49.546

means <- period.apply(df.xts, endpoints(df.xts, "hours"), mean)
head(means)
                        [,1]
2013-12-28 13:58:01 49.17192
2013-12-28 14:48:01 52.23470
2013-12-28 15:01:01 50.77000

Notice how the timestamps are the last entry of each hour. We can align the timestamps (down) to the beginning of the hour with this function:

align.time.down = function(x,n){ index(x) = index(x)-n; align.time(x,n) }
means.rounded <- align.time.down(means, 60*60)  
         # 2nd argument is the number of seconds to adjust/round to,
         # just like function align.time()

head(means.rounded)
                        [,1]
2013-12-28 13:00:00 49.17192
2013-12-28 14:00:00 52.23470
2013-12-28 15:00:00 50.77000
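For comparison, the same rounding down to the start of the hour can be sketched in base R with trunc(), without xts. This is an alternative to align.time.down, not a description of how it works internally. The stamps vector reuses the three end-of-hour timestamps shown above, and tz = "UTC" is an assumption.

```r
# The three end-of-hour timestamps from the period.apply() output
stamps <- as.POSIXct(c("2013-12-28 13:58:01",
                       "2013-12-28 14:48:01",
                       "2013-12-28 15:01:01"), tz = "UTC")

# trunc() drops minutes and seconds; it returns POSIXlt, so convert back
floored <- as.POSIXct(trunc(stamps, units = "hours"))
print(floored)
```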

plot(means.rounded, las=1, main="Hourly Avg Temperatures")
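Since the question mentions many files, the cut() and aggregate() steps can be wrapped in a function and applied to each file. The sketch below uses base R only. The helper name hourly_means, the file names, and the demo values are hypothetical, and it assumes every file has the same three whitespace-separated columns as the sample data.

```r
# Hypothetical helper: read one file and return its hourly means
hourly_means <- function(path) {
  d <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  stamp <- as.POSIXct(paste(d$date, d$hour),
                      format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  d$datehour <- cut(stamp, breaks = "hour")
  out <- aggregate(temperature ~ datehour, data = d, FUN = mean)
  out$file <- basename(path)  # remember which file each row came from
  out
}

# Demo only: write two small made-up files to a temporary directory
demo_dir <- file.path(tempdir(), "hourly_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("date hour temperature",
             "28/12/2013 13:03:01 40",
             "28/12/2013 13:58:01 42",
             "28/12/2013 14:03:01 50"),
           file.path(demo_dir, "logger_a.txt"))
writeLines(c("date hour temperature",
             "29/12/2013 09:10:00 20",
             "29/12/2013 09:40:00 30"),
           file.path(demo_dir, "logger_b.txt"))

# Apply the helper to every file and stack the results
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
all_means <- do.call(rbind, lapply(files, hourly_means))
print(all_means)
```

For real data, point list.files() at the directory that holds the logger files and adjust the pattern and the read.table() arguments to match their format.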


It is always easier if sample data and expected output are given in the question.

Solution with the data.table package:

require(data.table)
data <- fread('temp.csv',sep=',') #Assuming your data is in temp.csv
#if above step not executed, convert the data frame to data.table 
data <- data.table(data)
> str(data)
Classes ‘data.table’ and 'data.frame':  12 obs. of  3 variables:
$ date       : chr  "28/12/2013" "28/12/2013" "28/12/2013" "28/12/2013" ...
$ hour       : chr  "13:03:01" "13:08:01" "13:13:01" "13:18:01" ...
$ temperature: num  41.6 46.1 48.5 49.5 49.5 ...

> data
      date     hour    temperature      avg
1: 27/12/2013 13:00:00       42.99 35.78455
2: 27/12/2013 14:00:00       65.97 35.78455
3: 27/12/2013 15:00:00       63.57 35.78455 

  data[,list(avg=mean(temperature)),by=hour] # group by the full time string in the hour column
    hour   avg
1: 13:00:00 42.99
2: 14:00:00 65.97
3: 15:00:00 63.57
  data[,list(avg=mean(temperature)),by="date,hour"] # group by date, then by the hour column
        date     hour   avg
1: 27/12/2013 13:00:00 42.99
2: 27/12/2013 14:00:00 65.97
3: 27/12/2013 15:00:00 63.57

data[,list(avg=mean(temperature)),by=list(date,hour(as.POSIXct(hour, format = "%H:%M:%S")))] # group by date and hour of day, ignoring minutes and seconds
     date     hour    avg
1: 27/12/2013    1 29.530
2: 27/12/2013    4 65.970

Try this example:

library(foqat)
met2=avri(met[,c(1,2)], bkip="1 hour", mode="ncycle", value=24)

#plot it
geom_avri(
    met2,
    cave=2, csd=3,
    alpha=0.5, lcc="#0050b3", rff="#40a9ff", 
    xlab="Hour of day", ylab=bquote(Temp~" "~(degree*C))
)
