How to create a new variable in different groups with different conditions in R with dplyr
I want to add a new variable to a data frame, where the condition is different for each group. My data look like this:
library(dplyr)

test <- data.frame(country = rep(letters[1:5], each = 10),
                   time = seq(from = as.Date('2020-01-01'), to = as.Date('2020-02-19'), by = 'day')) %>%
  mutate(time = as.Date(time))

lockdown_time <- data.frame(country = letters[1:4],
                            start_time = c('2020-01-06', '2020-01-16', '2020-01-26', '2020-02-05'),
                            end_time = c('2020-01-08', '2020-01-18', '2020-01-28', '2020-02-07'))
I'll use country == 'a' as an example:
# use country a as an example
test_a <- test %>% filter(country == 'a')
start_time_a <- lockdown_time[1,2] %>% as.Date()
end_time_a <- lockdown_time[1,3] %>% as.Date()
test_a %>% mutate(lockdown = case_when(between(time, start_time_a, end_time_a) ~ 1, TRUE ~ 0))
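I know how to add the new variable lockdown one country at a time, but I would like to know whether there is an efficient way to do it for all countries at once. Note that there is no country == 'e' in the lockdown_time data frame, so the lockdown variable created for country == 'e' should be all NA.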
You need a left_join. I'm also using the lubridate package to make it easy to test whether a date falls between two dates.
library(tidyverse)
library(lubridate)

test <- data.frame(
  country = rep(letters[1:5], each = 10),
  time = seq(from = as.Date('2020-01-01'), to = as.Date('2020-02-19'), by = 'day'),
  stringsAsFactors = FALSE
) %>%
  mutate(time = lubridate::as_date(time))

lockdown_time <- data.frame(
  country = letters[1:4],
  start_time = c('2020-01-06', '2020-01-16', '2020-01-26', '2020-02-05'),
  end_time = c('2020-01-08', '2020-01-18', '2020-01-28', '2020-02-07'),
  stringsAsFactors = FALSE
) %>%
  mutate(
    start_time = as_date(start_time),
    end_time = as_date(end_time))

# Country "e" has no row in lockdown_time, so its interval is NA and lockdown is NA
test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(lockdown = as.integer(time %within% interval(start_time, end_time)))
You can use >= and <= to determine whether a date falls within the specified range.
library(dplyr)

test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(start_time = as.Date(start_time), end_time = as.Date(end_time),
         # unary + converts the logical TRUE/FALSE/NA into 1/0/NA
         lockdown = + (time >= start_time & time <= end_time)) %>%
  select(-ends_with("_time"))
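The same comparison can also be written without dplyr. This is a minimal base R sketch (a swapped-in technique, not part of the answer above), assuming the dates are stored as Date from the start: match() looks up each row's lockdown window, and the NA it returns for country "e" propagates through >=, <= and &, so those rows end up NA. The helper names idx, start and end are only illustrative.

```r
# Base R sketch: no packages needed
test <- data.frame(
  country = rep(letters[1:5], each = 10),
  time = seq(from = as.Date("2020-01-01"), to = as.Date("2020-02-19"), by = "day"),
  stringsAsFactors = FALSE
)

lockdown_time <- data.frame(
  country = letters[1:4],
  start_time = as.Date(c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05")),
  end_time = as.Date(c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07")),
  stringsAsFactors = FALSE
)

# Row of lockdown_time matching each row of test (NA for country "e")
idx <- match(test$country, lockdown_time$country)
start <- lockdown_time$start_time[idx]
end <- lockdown_time$end_time[idx]

# NA bounds give NA comparisons, so country "e" gets NA rather than 0
test$lockdown <- as.integer(test$time >= start & test$time <= end)
```

Because match() is vectorised, this does one lookup for all rows instead of filtering country by country.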
Or use between() together with rowwise():
test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(start_time = as.Date(start_time), end_time = as.Date(end_time)) %>%
  # rowwise() makes between() compare each row against that row's own bounds
  rowwise() %>%
  mutate(lockdown = + between(time, start_time, end_time)) %>%
  select(-ends_with("_time")) %>%
  ungroup()
Output
# A tibble: 50 x 3
country time lockdown
<chr> <date> <int>
1 a 2020-01-01 0
2 a 2020-01-02 0
3 a 2020-01-03 0
4 a 2020-01-04 0
5 a 2020-01-05 0
6 a 2020-01-06 1
7 a 2020-01-07 1
8 a 2020-01-08 1
9 a 2020-01-09 0
10 a 2020-01-10 0
11 b 2020-01-11 0
12 b 2020-01-12 0
13 b 2020-01-13 0
14 b 2020-01-14 0
15 b 2020-01-15 0
16 b 2020-01-16 1
17 b 2020-01-17 1
18 b 2020-01-18 1
19 b 2020-01-19 0
20 b 2020-01-20 0
⠇
46 e 2020-02-15 NA
47 e 2020-02-16 NA
48 e 2020-02-17 NA
49 e 2020-02-18 NA
50 e 2020-02-19 NA
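As of dplyr 1.1.0, between() accepts vectors for left and right (one bound per element of x), so the rowwise() step should no longer be necessary. The sketch below assumes dplyr 1.1.0 or later; with older versions between() expects single-value bounds and this will not work. The name result is only illustrative.

```r
# Sketch assuming dplyr >= 1.1.0, where between() is vectorised over left/right
library(dplyr)

test <- data.frame(
  country = rep(letters[1:5], each = 10),
  time = seq(from = as.Date("2020-01-01"), to = as.Date("2020-02-19"), by = "day"),
  stringsAsFactors = FALSE
)

lockdown_time <- data.frame(
  country = letters[1:4],
  start_time = as.Date(c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05")),
  end_time = as.Date(c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07")),
  stringsAsFactors = FALSE
)

result <- test %>%
  # country "e" has no match, so its start_time/end_time are NA
  left_join(lockdown_time, by = "country") %>%
  # compared row by row without rowwise(); NA bounds yield NA
  mutate(lockdown = as.integer(between(time, start_time, end_time))) %>%
  select(-ends_with("_time"))
```

Dropping rowwise() keeps the whole mutate() vectorised, which matters on larger data.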