I have a data table with multiple soil measurements per day. Soil moisture ranges from 0 to 0.8 and there are some NAs as well:
set.seed(24)
df1 <- data.frame(date = sample(seq(as.Date("2015-01-01"),
                                    length.out = 365, by = "1 day"), 5e1, replace = TRUE),
                  sm = sample(c(NA, runif(10, min=0, max=0.8)), 5e1, replace = TRUE))
I am trying to calculate, for each month, the following statistics:
1. the percentage of NA values
2. the percentage of values in each class (0 to 0.2, 0.2 to 0.4, 0.4 to 0.6 and 0.6 to 0.8).
In the provided example df1, there are five measurements for January. One out of five is NA, hence NA should total 20%. There is also 0.13, which fits the 0-0.2 class, hence 20%. There are two 0.23 values, which fall in the 0.2-0.4 class, hence 40%. The final 0.68 value goes to the 0.6-0.8 class, which is 20% of the total for January.
This is the expected result:
month  NA   0-0.2  0.2-0.4  0.4-0.6  0.6-0.8
1      20%  20%    40%      0%       20%
2      0%   0%     50%      25%      25%
3      0%   0%     16.6%    16.6%    66.8%
...
My unsuccessful attempt to calculate 1. was the following:
DT[, .(percentage = 100 * sum(is.na(.SD))/length(.SD)), by=month(DT$date)]
but it yields nonsensical percentage values.
Any ideas on how to get there? Thanks!
We can try with tidyverse. Convert the 'date' to Date class (if not already), extract the month from 'date', and create a grouping variable with cut based on the 'sm' column. Then, grouped by 'month' and 'grp', get the number of elements of each group (n()), divide by the total number of rows for each month, and spread it to 'wide' format:
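library(tidyverse)
library(lubridate)  # provides month(); attached explicitly in case library(tidyverse) does not load it

df1 %>%
  group_by(month = month(date)) %>%
  mutate(n = n()) %>%                          # total number of rows per month
  # .add = TRUE keeps the month grouping (dplyr >= 1.0.0; older versions used add = TRUE)
  group_by(grp = cut(sm, breaks = seq(0, 0.8, by = 0.2)), .add = TRUE) %>%
  summarise(perc = 100 * n() / first(n)) %>%   # share of each class within the month
  spread(grp, perc, fill = 0)                  # one column per class, 0 where a class is absent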
# A tibble: 12 x 6
# Groups: month [12]
# month `(0,0.2]` `(0.2,0.4]` `(0.4,0.6]` `(0.6,0.8]` `<NA>`
# * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1.00 20.0 40.0 0 20.0 20.0
# 2 2.00 0 50.0 25.0 25.0 0
# 3 3.00 0 16.7 16.7 66.7 0
# 4 4.00 14.3 42.9 42.9 0 0
# 5 5.00 33.3 16.7 0 50.0 0
# 6 6.00 0 100 0 0 0
# 7 7.00 0 66.7 0 0 33.3
# 8 8.00 20.0 60.0 20.0 0 0
# 9 9.00 14.3 28.6 28.6 14.3 14.3
#10 10.0 50.0 50.0 0 0 0
#11 11.0 0 100 0 0 0
#12 12.0 0 33.3 66.7 0 0
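One detail worth checking is how cut treats the edges of the range. By default its intervals are right-closed, as the `(0,0.2]` column names above show, so a reading of exactly 0 would fall outside every class and be counted with the NAs. The sketch below illustrates this on a few made-up values (the vector `x` is hypothetical, not taken from df1) and shows the `include.lowest = TRUE` argument of cut, which closes the first interval on the left. Whether that is wanted depends on whether exact zeros can occur in the data.

```r
# Toy values (hypothetical, not taken from df1), including the lower endpoint 0
x <- c(0, 0.13, 0.23, 0.68, NA)
brks <- seq(0, 0.8, by = 0.2)

# Default: right-closed intervals, so 0 matches no class and becomes NA
default_grp <- cut(x, breaks = brks)

# include.lowest = TRUE closes the first interval on the left, so 0 is kept
closed_grp <- cut(x, breaks = brks, include.lowest = TRUE)

# Counts per class; useNA = "ifany" reports NA as its own category
print(table(default_grp, useNA = "ifany"))
print(table(closed_grp, useNA = "ifany"))
```

With the default call, the NA count contains both the genuine NA and the zero reading. With `include.lowest = TRUE`, only the genuine NA remains.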
Or using data.table
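library(data.table)
library(lubridate)  # provides ymd()

# n = total rows per month; then the percentage of rows per month and class
tmp <- setDT(df1)[, n := .N, month(ymd(date))][
  , .(perc = 100 * .N / n[1]),
  by = .(month = month(ymd(date)),
         grp = cut(sm, breaks = seq(0, 0.8, by = 0.2),
                   labels = c('0 - 0.2', '0.2 - 0.4', '0.4 - 0.6', '0.6 - 0.8')))]
# reshape to wide format: one column per class
dcast(tmp, month ~ grp, value.var = 'perc')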
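As for why the attempt in the question gives odd numbers: inside `DT[...]`, `.SD` is the whole per-group subset of columns, so `length(.SD)` returns the number of columns, not the number of rows, and `is.na(.SD)` counts NAs across every column. If only the NA percentage is needed, a minimal sketch is to take the mean of `is.na(sm)` per month. The small table below is hypothetical (it is not the df1 from the question), and `na_pct` is just an illustrative name.

```r
library(data.table)

# Small hypothetical table (not the df1 from the question)
DT <- data.table(
  date = as.Date(c("2015-01-03", "2015-01-10", "2015-01-20", "2015-01-25",
                   "2015-02-01", "2015-02-14")),
  sm   = c(NA, 0.13, 0.23, 0.68, 0.30, NA)
)

# mean() of a logical vector is the share of TRUE values,
# so this gives the percentage of NA readings in each month
na_pct <- DT[, .(percentage = 100 * mean(is.na(sm))), by = .(month = month(date))]
print(na_pct)
```

Here January has one NA out of four readings and February one out of two, so the two percentages are 25 and 50.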
set.seed(24)
df1 <- data.frame(date = sample(seq(as.Date("2015-01-01"),
                                    length.out = 365, by = "1 day"), 3e4, replace = TRUE),
                  sm = sample(c(NA, rnorm(10)), 3e4, replace = TRUE))