
R: Weekly average value from daily observations

I am struggling to find code that calculates the average temperature for each week for three cities, X, Y and Z, combined. I have daily data from 01.01.2017 to 31.12.2020, with two values per day for each city. My data looks like this:

ID City Date Value Week
1 X 01-01-2017 1.7 2016-52
1 X 01-01-2017 2.3 2016-52
2 Y 01-01-2017 3.9 2016-52
2 Y 01-01-2017 2.6 2016-52
3 Z 01-01-2017 0.9 2016-52
3 Z 01-01-2017 1.6 2016-52
1 X 02-01-2017 1.9 2017-01
1 X 02-01-2017 2.0 2017-01
2 Y 02-01-2017 4.9 2017-01
2 Y 02-01-2017 3.6 2017-01
3 Z 02-01-2017 1.9 2017-01
3 Z 02-01-2017 1.8 2017-01
.. .. .......... ..... .......
1 X 31-12-2020 0.7 2020-53
1 X 31-12-2020 0.3 2020-53
2 Y 31-12-2020 0.2 2020-53
2 Y 31-12-2020 1.1 2020-53
3 Z 31-12-2020 0.9 2020-53
3 Z 31-12-2020 0.4 2020-53

First, I need code that gives the daily average of the two values for each city X, Y and Z. For example, take the average of City X's two values for 01-01-2017, 1.7 and 2.3, and collect the results in a table for all cities X, Y and Z for each day in the period.

Then I need to summarize these new daily values (now only one observation per city per day) and calculate the average of all three cities combined on a weekly basis. They need to be weekly because they will later be merged with another dataset that consists of weekly observations.

I was thinking about calculating the weekly average temperature for all cities X, Y and Z with code that groups the entire dataset by the Week variable and summarizes it, but I am open to other suggestions. The most important thing is to get the weekly average temperature for the cities combined over the entire period.

It would be very helpful if someone could share their thoughts and/or suggest code to use for this purpose!

If this isn't what you want, let me know; it would also help to add an example of the expected output data frame to your question.

library(dplyr)

# Weekly average per city: all readings of a city within a week are pooled
yourdata %>%
  group_by(Week, City) %>%
  summarise(weekly_avg = mean(Value), .groups = "drop")
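The code above returns one weekly average per city. If what you need is a single weekly value for X, Y and Z combined, a minimal variant is to drop `City` from the grouping. This sketch assumes the column names from the question; the small `yourdata` frame below is built from a few of the question's rows for illustration only.

```r
library(dplyr)

# A few rows copied from the question's table, only to illustrate the shape
yourdata <- data.frame(
  City  = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y", "Y", "Z", "Z"),
  Date  = rep(c("01-01-2017", "02-01-2017"), each = 6),
  Value = c(1.7, 2.3, 3.9, 2.6, 0.9, 1.6, 1.9, 2.0, 4.9, 3.6, 1.9, 1.8),
  Week  = rep(c("2016-52", "2017-01"), each = 6)
)

# Group by week only, so every reading from every city in that week is pooled
weekly_combined <- yourdata %>%
  group_by(Week) %>%
  summarise(weekly_avg = mean(Value), .groups = "drop")

weekly_combined
```

Because this pools raw readings, it matches an average of the per-city daily means only when every city contributes the same number of readings to each week.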

The issue with averaging is what the average is based on. In your example data the number of records for X, Y and Z is the same, so if you want the mean over all cities and all dates you can simply take the mean of `Value`, which is fast and simple. However, if the cities do not have the same number of records, the cities with more records get more weight in this overall mean.
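To make the weighting issue concrete, here is a small hypothetical example (the numbers are made up, not taken from the question) in which city X has four readings while Y and Z have two each. The pooled mean gives X half of the total weight, while averaging per city first gives each city one third.

```r
# Hypothetical, unbalanced data: X has 4 readings, Y and Z have 2 each
unbalanced <- data.frame(
  City  = c("X", "X", "X", "X", "Y", "Y", "Z", "Z"),
  Value = c(10, 10, 10, 10, 2, 2, 0, 0)
)

# Pooled mean: every row counts equally, so X dominates
pooled <- mean(unbalanced$Value)

# Mean of per-city means: every city counts equally (1/3 each)
city_means <- tapply(unbalanced$Value, unbalanced$City, mean)
equal_weight <- mean(city_means)

pooled        # 5.5
equal_weight  # 4
```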

The code below is a step-by-step calculation that ensures each city contributes equally at the final step; with three cities, each city's weight is 1/3 of the overall mean.

library(dplyr)

data <- structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 
    3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), City = c("X", "X", "Y", "Y", 
      "Z", "Z", "X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y", "Y", "Z", 
      "Z"), Date = c("01-01-2017", "01-01-2017", "01-01-2017", "01-01-2017", 
        "01-01-2017", "01-01-2017", "02-01-2017", "02-01-2017", "02-01-2017", 
        "02-01-2017", "02-01-2017", "02-01-2017", "31-12-2020", "31-12-2020", 
        "31-12-2020", "31-12-2020", "31-12-2020", "31-12-2020"), Value = c(1.7, 
          2.3, 3.9, 2.6, 0.9, 1.6, 1.9, 2, 4.9, 3.6, 1.9, 1.8, 0.7, 0.3, 
          0.2, 1.1, 0.9, 0.4), Week = c("2016-52", "2016-52", "2016-52", 
            "2016-52", "2016-52", "2016-52", "2017-01", "2017-01", "2017-01", 
            "2017-01", "2017-01", "2017-01", "2020-53", "2020-53", "2020-53", 
            "2020-53", "2020-53", "2020-53")), row.names = c(NA, -18L),
    class = "data.frame")
# 1st - daily average for each city
daily_by_city <- data %>%
  group_by(Week, Date, City) %>%
  summarize(Value = mean(Value), .groups = "drop")
daily_by_city
#> # A tibble: 9 x 4
#>   Week    Date       City  Value
#>   <chr>   <chr>      <chr> <dbl>
#> 1 2016-52 01-01-2017 X      2   
#> 2 2016-52 01-01-2017 Y      3.25
#> 3 2016-52 01-01-2017 Z      1.25
#> 4 2017-01 02-01-2017 X      1.95
#> 5 2017-01 02-01-2017 Y      4.25
#> 6 2017-01 02-01-2017 Z      1.85
#> 7 2020-53 31-12-2020 X      0.5 
#> 8 2020-53 31-12-2020 Y      0.65
#> 9 2020-53 31-12-2020 Z      0.65
# 2nd - weekly average for each city using daily_by_city
weekly_average_by_city <- daily_by_city %>%
  group_by(Week, City) %>%
  summarize(Value = mean(Value), .groups = "drop")
weekly_average_by_city
#> # A tibble: 9 x 3
#>   Week    City  Value
#>   <chr>   <chr> <dbl>
#> 1 2016-52 X      2   
#> 2 2016-52 Y      3.25
#> 3 2016-52 Z      1.25
#> 4 2017-01 X      1.95
#> 5 2017-01 Y      4.25
#> 6 2017-01 Z      1.85
#> 7 2020-53 X      0.5 
#> 8 2020-53 Y      0.65
#> 9 2020-53 Z      0.65
# 3rd - overall average for each city using weekly_average_by_city
overall_average_by_city <- weekly_average_by_city %>%
  group_by(City) %>%
  summarize(Value = mean(Value), .groups = "drop")
overall_average_by_city
#> # A tibble: 3 x 2
#>   City  Value
#>   <chr> <dbl>
#> 1 X      1.48
#> 2 Y      2.72
#> 3 Z      1.25

# average over all cities, which equals `mean(data$Value)` for balanced data like this
mean(overall_average_by_city$Value)
#> [1] 1.816667
mean(data$Value)
#> [1] 1.816667

Created on 2021-04-21 by the reprex package (v2.0.0)
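The steps above end with one overall value per city. Since the question asks for the weekly average of all cities combined, a minimal sketch of that variant keeps `Week` in the last grouping: average each city per day, then each city per week, then the three cities within each week. It assumes the same `City`, `Date`, `Value` and `Week` columns as the question; the name `temps` is hypothetical and the sample rows are copied from the question's table.

```r
library(dplyr)

# Sample rows from the question: two readings per city per day
temps <- data.frame(
  City  = rep(c("X", "X", "Y", "Y", "Z", "Z"), times = 3),
  Date  = rep(c("01-01-2017", "02-01-2017", "31-12-2020"), each = 6),
  Value = c(1.7, 2.3, 3.9, 2.6, 0.9, 1.6,
            1.9, 2.0, 4.9, 3.6, 1.9, 1.8,
            0.7, 0.3, 0.2, 1.1, 0.9, 0.4),
  Week  = rep(c("2016-52", "2017-01", "2020-53"), each = 6)
)

weekly_all_cities <- temps %>%
  # 1. one value per city per day (mean of that day's two readings)
  group_by(Week, Date, City) %>%
  summarise(Value = mean(Value), .groups = "drop") %>%
  # 2. one value per city per week (mean of that city's daily values)
  group_by(Week, City) %>%
  summarise(Value = mean(Value), .groups = "drop") %>%
  # 3. one value per week, with each city weighted equally
  group_by(Week) %>%
  summarise(weekly_avg = mean(Value), .groups = "drop")

weekly_all_cities
```

The result has one row per week, so it can be joined to the other weekly dataset by `Week` (for example with `dplyr::left_join()`), provided that dataset labels its weeks the same way. Note that week 2016-52 contains only 01-01-2017 within this data range, so its average rests on a single day.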
