New column values based on matching previous values in a different column

Question

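I can't work out how to create a new column in my data frame whose values depend on matching dates in a different column.

df looks like this: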

ID        date  booked.date   weather
 1  2016-12-01           NA    clouds
 1  2016-12-02   2014-10-24     sunny           
 1  2016-12-03           NA  overcast         
 2  2016-12-01   2015-12-24    clouds           
 2  2016-12-02   2016-12-01     sunny
 2  2016-12-03   2016-12-01  overcast
 2  2016-12-04   2016-01-13     sunny

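date is the date on which the apartment is occupied, and booked.date tells us when the apartment was booked. Where that booking date also appears in the date column of df, I would like to add a booked_weather column giving the weather on the day of booking. The output would look like this: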

ID        date  booked.date   weather booked_weather
 1  2016-12-01           NA    clouds             NA
 1  2016-12-02   2014-10-24     sunny             NA
 1  2016-12-03           NA  overcast             NA
 2  2016-12-01   2015-12-24    clouds             NA
 2  2016-12-02   2016-12-01     sunny         clouds
 2  2016-12-03   2016-12-01  overcast         clouds
 2  2016-12-04   2016-01-13     sunny             NA

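Note that there are readings for several apartment IDs, so dates are repeated, each time with the same weather.

This is what I tried, which doesn't quite do what I need: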

df %>%
  mutate(weather_booked = case_when(
    booked.date %in% date ~ weather[booked.date]
  ))

I understand why this doesn't give me the right result, but I don't know how to fix it.

Answer 1

library(tidyverse)

df <- read_table("ID  date  booked.date   weather
 1  2016-12-01           NA    clouds
 1  2016-12-02   2014-10-24     sunny           
 1  2016-12-03           NA  overcast         
 2  2016-12-01   2015-12-24    clouds           
 2  2016-12-02   2016-12-01     sunny
 2  2016-12-03   2016-12-01  overcast
 2  2016-12-04   2016-01-13     sunny") 


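# For each booked.date, find the position of its first occurrence in date,
# then take the weather at that position (NA where there is no match)
df %>%
  mutate(weather_booked = weather[match(booked.date, date)])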


#> # A tibble: 7 x 5
#>      ID date       booked.date weather  weather_booked
#>   <dbl> <date>     <date>      <chr>    <chr>         
#> 1     1 2016-12-01 NA          clouds   <NA>          
#> 2     1 2016-12-02 2014-10-24  sunny    <NA>          
#> 3     1 2016-12-03 NA          overcast <NA>          
#> 4     2 2016-12-01 2015-12-24  clouds   <NA>          
#> 5     2 2016-12-02 2016-12-01  sunny    clouds        
#> 6     2 2016-12-03 2016-12-01  overcast clouds        
#> 7     2 2016-12-04 2016-01-13  sunny    <NA>

Created on 2022-06-29 by the reprex package (v2.0.1)
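To see why the match() lookup works, it may help to look at the intermediate index vector on its own. The following is a minimal base R sketch on made-up vectors (the names booked, idx and booked_weather are illustrative assumptions, not part of the data frame above):

```r
# Toy vectors (hypothetical values, not the asker's data frame)
date    <- as.Date(c("2016-12-01", "2016-12-02", "2016-12-03"))
booked  <- as.Date(c(NA, "2016-12-01", "2015-12-24"))
weather <- c("clouds", "sunny", "overcast")

# match() gives, for each booked date, the position of its first
# occurrence in `date`, or NA when the date does not occur
idx <- match(booked, date)

# Indexing with NA yields NA, so unmatched rows stay missing
booked_weather <- weather[idx]
print(booked_weather)
```

Because match() only returns the first matching position, this relies on the point made in the question that a repeated date always carries the same weather. If the weather could differ between IDs, doing the same lookup inside group_by(ID) would presumably be needed instead.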

Answer 2

You can do this with a "self join", joining a modified subset of the data back onto itself.

# Lookup table: one weather value per date, keyed as booked.date
df %>%
  left_join(
    df %>%
      distinct(date, weather) %>%
      rename(booked.date = date, booked_weather = weather),
    by = "booked.date"
  )
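One thing worth checking with any join-based lookup is that the lookup side has a single row per key, since repeated dates (several IDs sharing a date) would otherwise multiply rows in the result. The sketch below illustrates this on hypothetical toy data; df_toy, lookup_dup, lookup_unique and the two result names are assumptions made up for the illustration.

```r
library(dplyr)

# Hypothetical toy data: two apartments share the same dates
df_toy <- tibble(
  ID          = c(1, 1, 2, 2),
  date        = as.Date(c("2016-12-01", "2016-12-02", "2016-12-01", "2016-12-02")),
  booked.date = as.Date(c(NA, "2016-12-01", "2015-12-24", "2016-12-01")),
  weather     = c("clouds", "sunny", "clouds", "sunny")
)

# Lookup that still contains each date once per ID (duplicated keys)
lookup_dup <- df_toy %>%
  select(date, weather) %>%
  rename(booked.date = date, booked_weather = weather)

# Lookup reduced to one row per date
lookup_unique <- lookup_dup %>%
  distinct(booked.date, booked_weather)

# Duplicated keys multiply the matching rows; newer dplyr versions warn
# about this, so the warning is silenced here for the demonstration
joined_dup <- suppressWarnings(left_join(df_toy, lookup_dup, by = "booked.date"))

# Unique keys keep exactly one output row per input row
joined_unique <- left_join(df_toy, lookup_unique, by = "booked.date")

print(nrow(joined_dup))
print(nrow(joined_unique))
```

With unique keys, the left join keeps the original row count and leaves booked_weather missing wherever the booking date does not occur in date.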
