
How to create a new variable in different groups with different conditions in R with dplyr

I want to add a new variable to a data frame, with a different condition in each group. My data looks like this:

library(dplyr)

test <- data.frame(country = rep(letters[1:5], each = 10),
                   time = seq(from = as.Date('2020-01-01'), to = as.Date('2020-02-19'), by = 'day')) %>%
  mutate(time = as.Date(time))

lockdown_time <- data.frame(country = letters[1:4],
                            start_time = c('2020-01-06', '2020-01-16', '2020-01-26', '2020-02-05'),
                            end_time = c('2020-01-08','2020-01-18','2020-01-28','2020-02-07')) 

I'll use country == 'a' as an example:

# use country a as an example 

test_a <- test  %>%  filter(country == 'a')

start_time_a <- lockdown_time[1,2] %>% as.Date()

end_time_a <- lockdown_time[1,3] %>% as.Date()


test_a %>% mutate(lockdown = case_when(between(time, start_time_a, end_time_a) ~ 1, TRUE ~ 0))

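I know how to add the new variable lockdown one country at a time, but I'd like to know whether there is an efficient way to do it for all countries at once. Note that there is no country == 'e' in the lockdown_time data frame, so the lockdown variable created for country == 'e' should be all NA.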

You need a left_join; I'm also using the lubridate package to make it easy to test whether a date falls between two dates.

library(tidyverse)
library(lubridate)

test <- data.frame(
  country = rep(letters[1:5], each = 10),
  time = seq(from = as.Date('2020-01-01'), to = as.Date('2020-02-19'), by = 'day'),
  stringsAsFactors = FALSE
  ) %>%
  mutate(time = lubridate::as_date(time))

lockdown_time <- data.frame(
  country = letters[1:4],
  start_time = c('2020-01-06', '2020-01-16', '2020-01-26', '2020-02-05'),
  end_time = c('2020-01-08', '2020-01-18', '2020-01-28', '2020-02-07'),
  stringsAsFactors = FALSE
  ) %>% 
  mutate(
    start_time = as_date(start_time),
    end_time = as_date(end_time))

test %>% 
  left_join(lockdown_time, by = "country") %>%   # rows for country "e" have no match, so start_time/end_time are NA
  mutate(lockdown = as.integer(time %within% interval(start_time, end_time)))
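If you would rather not depend on dplyr or lubridate, the same "left join, then compare" idea can be sketched in base R. This is a swapped-in alternative rather than part of the answer: merge(..., all.x = TRUE) stands in for left_join(), plain >= / <= comparisons stand in for %within%, and the name res is just illustrative.

```r
# Base-R sketch of the left-join approach (no dplyr or lubridate needed).
test <- data.frame(
  country = rep(letters[1:5], each = 10),
  time = seq(as.Date("2020-01-01"), as.Date("2020-02-19"), by = "day"),
  stringsAsFactors = FALSE
)

lockdown_time <- data.frame(
  country = letters[1:4],
  start_time = as.Date(c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05")),
  end_time = as.Date(c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07")),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of `test` (a left join); country "e" has no
# match in lockdown_time, so its start_time/end_time become NA.
res <- merge(test, lockdown_time, by = "country", all.x = TRUE)

# Logical -> integer (1/0); comparisons against NA bounds stay NA.
res$lockdown <- as.integer(res$time >= res$start_time & res$time <= res$end_time)

# Restore the original row order and drop the helper columns.
res <- res[order(res$country, res$time), c("country", "time", "lockdown")]
rownames(res) <- NULL
```

Each of countries a to d should end up with three lockdown days, and all ten rows for country e should be NA.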

You can use >= and <= to determine whether a date falls within the specified range.

library(dplyr)

test %>% 
  left_join(lockdown_time, by = "country") %>% 
  mutate(start_time = as.Date(start_time), end_time = as.Date(end_time),
         lockdown = + (time >= start_time & time <= end_time)) %>%
  select(-ends_with("_time"))
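The pipeline above relies on two small R behaviours: unary + turns a logical vector into an integer one (TRUE becomes 1, FALSE becomes 0), and a comparison against an NA bound stays NA, which is what gives country "e" its NA values. A quick standalone check, using made-up dates that are only illustrative:

```r
# Illustrative dates: the third element mimics an unmatched country (NA bounds).
time  <- as.Date(c("2020-01-05", "2020-01-07", "2020-02-15"))
start <- as.Date(c("2020-01-06", "2020-01-06", NA))
end   <- as.Date(c("2020-01-08", "2020-01-08", NA))

# Unary + coerces the logical result to integer; NA survives the coercion.
lockdown <- +(time >= start & time <= end)
print(lockdown)
#> [1]  0  1 NA
print(class(lockdown))
#> [1] "integer"
```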

Or use between() together with rowwise():

test %>% 
  left_join(lockdown_time, by = "country") %>% 
  mutate(start_time = as.Date(start_time), end_time = as.Date(end_time)) %>%
  rowwise() %>% 
  mutate(lockdown = + between(time, start_time, end_time)) %>%
  select(-ends_with("_time")) %>%
  ungroup()

Output

# A tibble: 50 x 3
   country time       lockdown
   <chr>   <date>        <int>
 1 a       2020-01-01        0
 2 a       2020-01-02        0
 3 a       2020-01-03        0
 4 a       2020-01-04        0
 5 a       2020-01-05        0
 6 a       2020-01-06        1
 7 a       2020-01-07        1
 8 a       2020-01-08        1
 9 a       2020-01-09        0
10 a       2020-01-10        0
11 b       2020-01-11        0
12 b       2020-01-12        0
13 b       2020-01-13        0
14 b       2020-01-14        0
15 b       2020-01-15        0
16 b       2020-01-16        1
17 b       2020-01-17        1
18 b       2020-01-18        1
19 b       2020-01-19        0
20 b       2020-01-20        0
⠇
46 e       2020-02-15       NA
47 e       2020-02-16       NA
48 e       2020-02-17       NA
49 e       2020-02-18       NA
50 e       2020-02-19       NA
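A note on rowwise(): as far as I know, dplyr 1.1.0 rewrote between() so that left and right may be vectors (recycled against x) instead of single values; older versions required length-one bounds, which is why the answer above goes row by row. Assuming dplyr 1.1.0 or later, a sketch of the vectorised version (the name res is illustrative):

```r
library(dplyr)

test <- data.frame(
  country = rep(letters[1:5], each = 10),
  time = seq(as.Date("2020-01-01"), as.Date("2020-02-19"), by = "day"),
  stringsAsFactors = FALSE
)

lockdown_time <- data.frame(
  country = letters[1:4],
  start_time = as.Date(c("2020-01-06", "2020-01-16", "2020-01-26", "2020-02-05")),
  end_time = as.Date(c("2020-01-08", "2020-01-18", "2020-01-28", "2020-02-07")),
  stringsAsFactors = FALSE
)

# Assumes dplyr >= 1.1.0, where between() accepts per-row (vector) bounds,
# so no rowwise() is needed. Country "e" gets NA bounds and hence NA.
res <- test %>%
  left_join(lockdown_time, by = "country") %>%
  mutate(lockdown = as.integer(between(time, start_time, end_time))) %>%
  select(country, time, lockdown)
```

Avoiding rowwise() keeps the comparison a single vectorised call, which matters more as the number of rows grows.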


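Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.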

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM