I have 4 data frames, each data frame corresponding to a single year. Each data frame contains daily rainfall for five locations.
Generate sample data
location <- c("A","B","C","D","E")
mat <- round(as.data.frame(matrix(runif(1825),nrow=5,ncol=365)), digits=2)
dat.1981 <-as.data.frame(cbind(location,mat)) # rainfall for 1981
dat.1981$year <- 1981
mat <- round(as.data.frame(matrix(runif(1825),nrow=5,ncol=365)), digits = 2)
dat.1982 <- as.data.frame(cbind(location,mat)) # rainfall for 1982
dat.1982$year <- 1982
mat <- round(as.data.frame(matrix(runif(1825),nrow=5,ncol=365)), digits = 2)
dat.1983 <-as.data.frame(cbind(location,mat)) # rainfall for 1983
dat.1983$year <- 1983
mat <- round(as.data.frame(matrix(runif(1825),nrow=5,ncol=365)), digits = 2)
dat.1984 <-as.data.frame(cbind(location,mat)) # rainfall for 1984
dat.1984$year <- 1984
dat <- as.data.frame(rbind(dat.1981,dat.1982,dat.1983,dat.1984))
For each year, I want to classify whether a day was an extreme wet day or not
Here's how I do my calculation:
1) For each location, generate the mean and sd of rainfall for every week for the period 1981 to 1984. For example, in location A the mean rainfall for the first week will be:
(First week rain 1981 in A + First week rain 1982 in A + First week rain 1983 in A + First week rain 1984 in A)/4
which can be written in R as
mean.week1.loc1 <- mean(rowSums(dat[dat$location=="A",2:8])) # 2:8 selects the first 7 days in each year
sd.week1.loc1 <- sd(rowSums(dat[dat$location=="A",2:8]))
wet.cr <- mean.week1 + sd.week1 # this is my threshold for defining a wet day
If daily rainfall in week 1 for the years 1981 to 1984 in location A is greater than wet.cr
, that day is a wet day and hence gets a value of 1
As an example, to examine whether rainfall of week 1 for location A for 1981 to 1984 is a wet day or not I can do the following:
lapply(dat[, 2:8], function(x) ifelse(x > wet.cr, 1, 0))
I want to repeat this for each week and each location.
However, I am unable to stitch these individual functions together and also my final results should be a dataframe same as dat
but instead of rainfall values, I will have 1 or 0 defining whether it is a wet day or not.
EDIT
The solutions below gives me the following:
mean(c(rainfall 1981 day 1 week 1, ...., rainfall 1981 day 7 week 1, rainfall 1982 day 1 week 1,....,rainfall 1982 day 7 week 1,....,rainfall 1984 day 1 week 1,....,rainfall 1984 day 7 week 1))
WHAT I WANT IS:
mean(c(mean(total rainfall week 1 1981), mean(total rainfall week 1 1982), mean(total rainfall week 1 1983), mean(total rainfall week 1 1984)))
I hope this is clear now.
A tidyverse
solution
library(magrittr)
library(tidyverse)
dat_m <- gather(dat, day, rainfall, -location, -year)
str(dat_m)
dat_m %<>%
mutate(day = gsub("V", "", day)) %>%
mutate(day = as.numeric(day)) %>%
mutate(week = as.integer(ceiling(day/7))) %>%
group_by(location, week) %>%
mutate(wet.cr = mean(rainfall, na.rm = TRUE) + sd(rainfall, na.rm = TRUE) ) %>%
mutate(indication = ifelse(rainfall > wet.cr, 1, 0)) %>%
ungroup()
dat_m
# A tibble: 7,300 x 7
location year day rainfall week wet.cr indication
<fctr> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
1 A 1981 1.00 0.880 1 0.845 1.00
2 B 1981 1.00 0.850 1 0.829 1.00
3 C 1981 1.00 1.00 1 0.877 1.00
4 D 1981 1.00 0.100 1 0.755 0
5 E 1981 1.00 0.190 1 0.750 0
6 A 1982 1.00 0.380 1 0.845 0
7 B 1982 1.00 0.760 1 0.829 0
8 C 1982 1.00 0.940 1 0.877 1.00
9 D 1982 1.00 0.900 1 0.755 1.00
10 E 1982 1.00 0.600 1 0.750 0
# ... with 7,290 more rows
Edit : For rainfall, I think it's better to use sum
(total) than mean
So we first calculate the total weekly rainfall for every year. Then we calculate the long-term average & stdev of the total weekly rainfall.
dat_m %<>%
mutate(day = as.numeric(gsub("V", "", day)),
week = as.integer(ceiling(day/7))) %>%
group_by(location, week, year) %>%
mutate(total_weekly_rainfall = sum(rainfall, na.rm = TRUE)) %>%
ungroup() %>%
group_by(location, week) %>%
mutate(mean_weekly_rainfall = sum(rainfall, na.rm = TRUE)/length(unique(year)),
stddev_weekly_rainfall = sd(rainfall, na.rm = TRUE),
wet.cr = mean_weekly_rainfall + stddev_weekly_rainfall,
indication = ifelse(total_weekly_rainfall > wet.cr, 1, 0)) %>%
arrange(location, year, day) %>%
ungroup() %>%
distinct(location, year, week, .keep_all = TRUE)
dat_m
# A tibble: 1,060 x 10
location year day rainfall week total_wee~ mean_wee~ stddev_w~ wet.~ indic~
<fctr> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 1981 1.00 0.880 1 0.880 0.630 0.277 0.907 0
2 A 1981 8.00 0.190 2 0.190 0.330 0.431 0.761 0
3 A 1981 15.0 0.630 3 0.630 0.548 0.331 0.878 0
4 A 1981 22.0 0.0300 4 0.0300 0.290 0.259 0.549 0
5 A 1981 29.0 0.360 5 0.360 0.308 0.196 0.504 0
6 A 1981 36.0 0.540 6 0.540 0.500 0.225 0.725 0
7 A 1981 43.0 0.0300 7 0.0300 0.375 0.289 0.664 0
8 A 1981 50.0 0.170 8 0.170 0.332 0.375 0.708 0
9 A 1981 57.0 0.260 9 0.260 0.652 0.272 0.924 0
10 A 1981 64.0 0.590 10 0.590 0.512 0.202 0.715 0
# ... with 1,050 more rows
using data.table :
library(data.table)
dat <- setDT(dat)
newdat <- melt(dat, measure.vars = patterns("^V"),variable.name = "day",value.name = "rain")
newdat[,day := as.character(day)]
newdat[,day := as.numeric(unlist(lapply(newdat$day,function(x){strsplit(x,"V")[[1]][2]})))]
newdat[,Week := day %/% 7]
newdat[,threshold := mean(rain) + sd(rain), by = .(location,Week)]
newdat[,wet := ifelse(rain > threshold,1,0)]
print(newdat,topn = 100)
location year day rain Week threshold wet
1: A 1981 1 0.73 0 0.7630065 0
2: B 1981 1 0.69 0 0.8599243 0
3: C 1981 1 0.45 0 0.8145956 0
4: D 1981 1 0.51 0 0.8935058 0
5: E 1981 1 0.77 0 0.6992752 1
6: A 1982 1 0.47 0 0.7630065 0
7: B 1982 1 0.70 0 0.8599243 0
8: C 1982 1 0.48 0 0.8145956 0
9: D 1982 1 0.92 0 0.8935058 1
a step by step explanation: first you need to change format of your data to ease the calculus. The long format is more appropriate, as each column V## is actually a variable that is the number day. This is done using melt
melt(dat, measure.vars = patterns("^V"),variable.name = "day",value.name = "rain")
location year day rain
1: A 1981 V1 0.73
2: B 1981 V1 0.69
3: C 1981 V1 0.45
4: D 1981 V1 0.51
5: E 1981 V1 0.77
---
7296: A 1984 V365 0.31
7297: B 1984 V365 0.99
7298: C 1984 V365 0.25
7299: D 1984 V365 0.24
7300: E 1984 V365 0.87
Then you tranform your day to a real number, in order to be able to calculate the week
newdat[,day := as.character(day)]
newdat[,day := as.numeric(unlist(lapply(newdat$day,function(x){strsplit(x,"V")[[1]][2]})))]
> newdat[,.(day,year)]
day year
1: 1 1981
2: 1 1981
3: 1 1981
4: 1 1981
5: 1 1981
Then calculate the week number the same as you do
newdat[,Week := day %/% 7]
The statistic for the calculus of the sthreshold is done by grouping on the weeks and places (so statistic over the year for each place)
newdat[,threshold := mean(rain) + sd(rain), by = .(location,Week)]
and define your wet day as the day when rain is higher than the threshold
newdat[,wet := ifelse(rain > threshold,1,0)]
but I agree on the comment that the initial data was surely in a better format than what you are presenting.
For both the data.table and the tidyverse solutions you may be well served to treat this as a scaling exercise (az score in many disciplines), since the mean + n standard deviation is a well known benchmark.
For the data.table solution you would:
newdat[,zrain := scale(rain), by = .(location,Week)]
newdat[,zwet := ifelse(zrain > 1.0,1,0)]
where you rely on scale
from base and compare to 1.0
For the tidyverse that becomes:
mutate(zrain = scale(rainfall)) %>%
mutate(indication = ifelse(zrain > 1.0, 1, 0)) %>%
That way in the future if your standard for what "wet" is changes you only have to change one number in your code
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.