R: recoding within a group ID

Question

I want to (1) create a unique group ID and (2) recode a variable if it meets a condition within its group. I have the following ATM location data:

data <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, 
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
  
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

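After 2018, the location_type_description variable was incorrectly coded as "Financial Institution".

Condition: if, within an address and terminal_id group, location_type_description was not "Financial Institution" before 2019, then location_type_description should be recoded to whatever it was before 2019. If location_type_description is "Financial Institution" in every year (starting in 2017), then we know it is coded correctly. In our example, because it was "Gas Station" in 2017 and 2018, we know that everything after 2018 is actually a gas station. This is the desired output for the toy data: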

data_clean <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, ~group_identifier, ~location_corrected, ~location_changed,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution", 1, "Gas Station", "yes"
  
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

Answer 1

How about this:

  library(dplyr)
  data <- tibble::tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, 
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
  
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

data %>% 
  group_by(address) %>% 
  mutate(id = cur_group_id(), 
         location_type_description = location_type_description[1])
#> # A tibble: 5 × 5
#> # Groups:   address [1]
#>   address               date       terminal_id location_type_description    id
#>   <chr>                 <date>     <chr>       <chr>                     <int>
#> 1 1 GATEWAY DR OROMOCTO 2017-01-01 NC79        Gas Station                   1
#> 2 1 GATEWAY DR OROMOCTO 2018-01-01 NC79        Gas Station                   1
#> 3 1 GATEWAY DR OROMOCTO 2019-11-01 NC79        Gas Station                   1
#> 4 1 GATEWAY DR OROMOCTO 2020-01-01 NC79        Gas Station                   1
#> 5 1 GATEWAY DR OROMOCTO 2020-12-01 NC79        Gas Station                   1

Created on 2022-06-29 by the reprex package (v2.0.1)
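
Note that the snippet above takes the first row of each address group, so it relies on the rows already being in date order. A minimal sketch of a variant that makes this explicit is shown here: it groups by both address and terminal_id, as the question describes, and sorts by date before taking the first value. The rows are a deliberately shuffled subset of the question's data, and the column names group_identifier and location_corrected are borrowed from the desired output. This is one possible approach under those assumptions, not the answerer's code.

```r
library(dplyr)

# Subset of the question's toy data, deliberately out of date order
data <- tibble::tribble(
  ~address, ~date, ~terminal_id, ~location_type_description,
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station"
) %>%
  mutate(date = as.Date(date))

result <- data %>%
  # Sort first so that first() really picks the earliest record per group
  arrange(address, terminal_id, date) %>%
  group_by(address, terminal_id) %>%
  mutate(
    group_identifier = cur_group_id(),
    # Assumption: the earliest record in a group carries the correct type
    location_corrected = first(location_type_description)
  ) %>%
  ungroup()

print(result)
```

Keeping the original location_type_description column and writing the correction to a new column makes it easier to audit which rows were changed.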

Answer 2

I added some extra ATM locations to make sure this works under a variety of conditions.

library(magrittr)
library(dplyr)

data <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, 
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
  "4 PRIVET DR LITTLE WHINGING", "2017-01-01", "AB123", "Gas Station",
  "4 PRIVET DR LITTLE WHINGING", "2018-01-01", "AB123", "Gas Station",
  "4 PRIVET DR LITTLE WHINGING", "2019-11-01", "AB123", "Gas Station",
  "4 PRIVET DR LITTLE WHINGING", "2020-01-01", "AB123", "Gas Station",
  "4 PRIVET DR LITTLE WHINGING", "2020-12-01", "AB123", "Gas Station",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2017-01-01", "XY10", "Other",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2018-01-01", "XY10", "Other",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2019-11-01", "XY10", "Financial Institution",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2020-01-01", "XY10", "Financial Institution",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2020-12-01", "XY10", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2017-01-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2018-01-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2019-11-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2020-01-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2020-12-01", "4227", "Financial Institution",
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

data_clean <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, ~group_identifier, ~location_corrected, ~location_changed,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution", 1, "Gas Station", "yes"
  
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

# data frame mapping each terminal_id to a group identifier
groupID <- data.frame(terminal_id = unique(data$terminal_id), group_identifier = seq_along(unique(data$terminal_id)))
# data frame of the original (pre-2019) location type for each terminal_id
OGloctype <- data %>%
  filter(date < as.Date('2019-01-01')) %>%
  rename(location_corrected = location_type_description) %>%
  select(c(terminal_id, location_corrected)) %>%
  distinct()

data %>%
  full_join(groupID, by = 'terminal_id') %>%
  full_join(OGloctype, by = 'terminal_id') %>%
  group_by(terminal_id) %>%
  # any() looks for any matches within the group
  mutate(location_changed = ifelse(any(location_corrected != location_type_description),
                                   'yes', 'no')) %>%
  ungroup()
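
The same three columns can also be computed in a single grouped mutate(), without the helper data frames and joins. The following is a minimal sketch under some assumptions: the cutoff variable is a name introduced here for illustration, the data is a subset of the rows above, and each group is assumed to have at least one record before the cutoff (otherwise location_corrected and location_changed come out as NA for that group).

```r
library(dplyr)

# Hypothetical name for the date after which the coding is considered unreliable
cutoff <- as.Date("2019-01-01")

# Subset of the toy data: one miscoded terminal and one correctly coded terminal
data <- tibble::tribble(
  ~address, ~date, ~terminal_id, ~location_type_description,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2017-01-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2019-11-01", "4227", "Financial Institution"
) %>%
  mutate(date = as.Date(date))

result <- data %>%
  arrange(address, terminal_id, date) %>%
  group_by(address, terminal_id) %>%
  mutate(
    group_identifier = cur_group_id(),
    # Earliest location type recorded before the cutoff within the group
    location_corrected = first(location_type_description[date < cutoff]),
    # "yes" for every row of a group in which at least one row was recoded
    location_changed = if_else(
      any(location_corrected != location_type_description), "yes", "no"
    )
  ) %>%
  ungroup()

print(result)
```

As in the join-based version, location_changed is a group-level flag, which matches the desired output in the question, where every row of the recoded group is marked "yes".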

Solution 1
1 2022-06-29 17:06:18

Solution 2
0 Accepted 2022-06-29 17:19:49