
R recode within group ID

I want to (1) create a unique group ID, and (2) recode one variable if it meets a condition within the group. I have the following data of ATM locations:

library(dplyr)

data <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, 
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
  
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

After 2018, the location_type_description variable was incorrectly coded as "Financial Institution".

Condition: if the location_type_description within an address and terminal_id is anything other than "Financial Institution" before the year 2019, then we recode the location_type_description to be whatever it was before 2019. But if the location_type_description is "Financial Institution" for all years (2017 onwards), then we know it was coded correctly. In our example, since it was "Gas Station" in 2017 and 2018, we know that anything after 2018 is actually a gas station. Here is what the output would look like in the toy data:

data_clean <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, ~group_identifier, ~location_corrected, ~location_changed,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution", 1, "Gas Station", "yes"
  
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

How about this:

library(dplyr)
data <- tibble::tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, 
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
  
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

data %>% 
  group_by(address) %>% 
  mutate(id = cur_group_id(), 
         location_type_description = location_type_description[1])
#> # A tibble: 5 × 5
#> # Groups:   address [1]
#>   address               date       terminal_id location_type_description    id
#>   <chr>                 <date>     <chr>       <chr>                     <int>
#> 1 1 GATEWAY DR OROMOCTO 2017-01-01 NC79        Gas Station                   1
#> 2 1 GATEWAY DR OROMOCTO 2018-01-01 NC79        Gas Station                   1
#> 3 1 GATEWAY DR OROMOCTO 2019-11-01 NC79        Gas Station                   1
#> 4 1 GATEWAY DR OROMOCTO 2020-01-01 NC79        Gas Station                   1
#> 5 1 GATEWAY DR OROMOCTO 2020-12-01 NC79        Gas Station                   1

Created on 2022-06-29 by the reprex package (v2.0.1)
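
One caveat with the snippet above: it groups by address only and takes the first row of each group, so it depends on the rows being sorted by date and does not produce the location_corrected and location_changed columns from the question. A minimal sketch of a variant that groups by both address and terminal_id and picks the pre-2019 value explicitly might look like this. The names atm, atm_clean, cutoff and pre_2019 are made up for illustration, and it assumes each group has a single location type before 2019:

```r
library(dplyr)

# Toy data: one mis-coded terminal and one genuine financial institution
atm <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2017-01-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2019-11-01", "4227", "Financial Institution"
) %>%
  mutate(date = as.Date(date))

# Assumption: records dated before this cutoff are coded correctly
cutoff <- as.Date("2019-01-01")

atm_clean <- atm %>%
  arrange(address, terminal_id, date) %>%
  group_by(address, terminal_id) %>%
  mutate(
    group_identifier = cur_group_id(),
    # earliest description recorded before the cutoff (NA if the group has none)
    pre_2019 = first(location_type_description[date < cutoff]),
    # fall back to the recorded value when there is no pre-2019 record
    location_corrected = coalesce(pre_2019, location_type_description),
    # flag the whole group if any row had to be corrected
    location_changed = if_else(
      any(location_corrected != location_type_description), "yes", "no"
    )
  ) %>%
  ungroup() %>%
  select(-pre_2019)

atm_clean
```

Groups that were "Financial Institution" all along keep their value and get location_changed = "no", because the pre-2019 value already matches every row.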

I added a few extra ATM locations to make sure it would work for various conditions.

library(magrittr)
library(dplyr)

data <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, 
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
  "4 PRIVET DR LITTLE WHINGING", "2017-01-01", "AB123", "Gas Station",
  "4 PRIVET DR LITTLE WHINGING", "2018-01-01", "AB123", "Gas Station",
  "4 PRIVET DR LITTLE WHINGING", "2019-11-01", "AB123", "Gas Station",
  "4 PRIVET DR LITTLE WHINGING", "2020-01-01", "AB123", "Gas Station",
  "4 PRIVET DR LITTLE WHINGING", "2020-12-01", "AB123", "Gas Station",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2017-01-01", "XY10", "Other",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2018-01-01", "XY10", "Other",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2019-11-01", "XY10", "Financial Institution",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2020-01-01", "XY10", "Financial Institution",
  "42 WALLABY WAY SYDNEY AUSTRALIA", "2020-12-01", "XY10", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2017-01-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2018-01-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2019-11-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2020-01-01", "4227", "Financial Institution",
  "742 EVERGREEN TERRACE SPRINGFIELD", "2020-12-01", "4227", "Financial Institution",
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

data_clean <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description, ~group_identifier, ~location_corrected, ~location_changed,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution", 1, "Gas Station", "yes"
  
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )

# data frame mapping each terminal_id to a group identifier
groupID <- data.frame(terminal_id = unique(data$terminal_id),
                      group_identifier = seq_along(unique(data$terminal_id)))
# data frame of the original (pre-2019) location type for each terminal_id
OGloctype <- data %>%
  filter(date < as.Date('2019-01-01')) %>%
  rename(location_corrected = location_type_description) %>%
  select(terminal_id, location_corrected) %>%
  distinct()

data %>%
  # attach the group identifier and the pre-2019 location type to every row
  left_join(groupID, by = 'terminal_id') %>%
  left_join(OGloctype, by = 'terminal_id') %>%
  group_by(terminal_id) %>%
  # any() is TRUE if at least one row in the group differs from the corrected value
  mutate(location_changed = ifelse(any(location_corrected != location_type_description),
                                   'yes', 'no')) %>%
  ungroup()
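
Note that the join on OGloctype assumes each terminal_id has exactly one location type before 2019. If a terminal has more than one, the join returns one row per combination and the result contains duplicated rows. A quick check for that, sketched on hypothetical toy data (the terminal "ZZ01" and the names atm, pre_2019_types and ambiguous are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data: terminal "ZZ01" has two different pre-2019 types
atm <- tribble(
  ~terminal_id, ~date, ~location_type_description,
  "NC79", "2017-01-01", "Gas Station",
  "NC79", "2018-01-01", "Gas Station",
  "NC79", "2019-11-01", "Financial Institution",
  "ZZ01", "2017-01-01", "Gas Station",
  "ZZ01", "2018-01-01", "Other",
  "ZZ01", "2019-11-01", "Financial Institution"
) %>%
  mutate(date = as.Date(date))

# distinct pre-2019 location types per terminal
pre_2019_types <- atm %>%
  filter(date < as.Date("2019-01-01")) %>%
  distinct(terminal_id, location_type_description)

# terminals with more than one distinct pre-2019 type
ambiguous <- pre_2019_types %>%
  count(terminal_id, name = "n_types") %>%
  filter(n_types > 1)

ambiguous
```

If ambiguous has any rows, you would need to decide which pre-2019 value to trust (for example the earliest one) before joining.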
