R recode within group ID
I want to (1) create a unique group ID and (2) recode a variable if it meets a condition within the group. I have the following ATM location data:
library(dplyr)
data <- tribble(
~address, ~date, ~terminal_id, ~location_type_description,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
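After 2018, the location_type_description variable was incorrectly coded as "Financial Institution".
Condition: if, within a group defined by address and terminal_id, location_type_description was not "Financial Institution" before 2019, then we recode location_type_description to whatever it was before 2019. But if location_type_description is "Financial Institution" in all years (starting in 2017), then we know it is coded correctly. In our example, since it was "Gas Station" in 2017 and 2018, we know that anything after 2018 is actually a gas station. Here is the desired output for the toy data: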
data_clean <- tribble(
~address, ~date, ~terminal_id, ~location_type_description, ~group_identifier, ~location_corrected, ~location_changed,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution", 1, "Gas Station", "yes"
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
How about this:
library(dplyr)
data <- tibble::tribble(
~address, ~date, ~terminal_id, ~location_type_description,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
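# Group by address, give each group an ID, and overwrite the description
# with the first value in the group. This assumes the rows are already
# sorted by date within each address.
data %>%
  group_by(address) %>%
  mutate(id = cur_group_id(),
         location_type_description = location_type_description[1])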
#> # A tibble: 5 × 5
#> # Groups: address [1]
#> address date terminal_id location_type_description id
#> <chr> <date> <chr> <chr> <int>
#> 1 1 GATEWAY DR OROMOCTO 2017-01-01 NC79 Gas Station 1
#> 2 1 GATEWAY DR OROMOCTO 2018-01-01 NC79 Gas Station 1
#> 3 1 GATEWAY DR OROMOCTO 2019-11-01 NC79 Gas Station 1
#> 4 1 GATEWAY DR OROMOCTO 2020-01-01 NC79 Gas Station 1
#> 5 1 GATEWAY DR OROMOCTO 2020-12-01 NC79 Gas Station 1
Created on 2022-06-29 by the reprex package (v2.0.1)
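A note on the snippet above: it groups by address only and takes the first row of each group, so it depends on the rows already being in date order. The following is a minimal sketch, not the answerer's code, that groups by both address and terminal_id, sorts by date, and uses the first description recorded before 2019. The names `atm`, `cutoff`, and `atm_fixed` are made up for illustration, and the 2019-01-01 cutoff is an assumption based on the question's wording.

```r
library(dplyr)

# Toy data from the question (a single terminal).
atm <- tribble(
  ~address,                ~date,        ~terminal_id, ~location_type_description,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79",       "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79",       "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79",       "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79",       "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79",       "Financial Institution"
) %>%
  mutate(date = as.Date(date))

# Assumption: records dated before this cutoff are trusted.
cutoff <- as.Date("2019-01-01")

atm_fixed <- atm %>%
  arrange(address, terminal_id, date) %>%
  group_by(address, terminal_id) %>%
  mutate(
    group_identifier = cur_group_id(),
    # First description recorded before the cutoff (NA if the group has none).
    location_corrected = first(location_type_description[date < cutoff]),
    # Fall back to the recorded value when there is no pre-cutoff record.
    location_corrected = coalesce(location_corrected, location_type_description),
    # Flag the whole group if any row differs from the corrected value.
    location_changed = if_else(
      any(location_corrected != location_type_description), "yes", "no"
    )
  ) %>%
  ungroup()
```

If a terminal can have more than one description before the cutoff, taking the first one is only one possible rule; the most frequent pre-cutoff value might suit some data better.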
I added some extra ATM locations to make sure it works under a variety of conditions.
library(magrittr)
library(dplyr)
data <- tribble(
~address, ~date, ~terminal_id, ~location_type_description,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
"4 PRIVET DR LITTLE WHINGING", "2017-01-01", "AB123", "Gas Station",
"4 PRIVET DR LITTLE WHINGING", "2018-01-01", "AB123", "Gas Station",
"4 PRIVET DR LITTLE WHINGING", "2019-11-01", "AB123", "Gas Station",
"4 PRIVET DR LITTLE WHINGING", "2020-01-01", "AB123", "Gas Station",
"4 PRIVET DR LITTLE WHINGING", "2020-12-01", "AB123", "Gas Station",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2017-01-01", "XY10", "Other",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2018-01-01", "XY10", "Other",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2019-11-01", "XY10", "Financial Institution",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2020-01-01", "XY10", "Financial Institution",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2020-12-01", "XY10", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2017-01-01", "4227", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2018-01-01", "4227", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2019-11-01", "4227", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2020-01-01", "4227", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2020-12-01", "4227", "Financial Institution",
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
data_clean <- tribble(
~address, ~date, ~terminal_id, ~location_type_description, ~group_identifier, ~location_corrected, ~location_changed,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution", 1, "Gas Station", "yes"
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
# data frame mapping each terminal_id to a group identifier
groupID <- data.frame(terminal_id = unique(data$terminal_id), group_identifier = seq_along(unique(data$terminal_id)))
# dataframe of original location_types
OGloctype <- data %>%
filter(date < as.Date('2019-01-01')) %>%
rename(location_corrected = location_type_description) %>%
select(c(terminal_id, location_corrected)) %>%
distinct()
data %>%
full_join(groupID, by = 'terminal_id') %>%
full_join(OGloctype, by = 'terminal_id') %>%
group_by(terminal_id) %>%
# any() looks for any matches within the group
mutate(location_changed = ifelse(any(location_corrected != location_type_description),
'yes', 'no')) %>%
ungroup()
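One caveat with the join approach above: if a terminal has more than one distinct description before 2019, the lookup table has several rows for that terminal and the join duplicates its records. The following is a hedged sketch of one way around that, keeping only the earliest pre-2019 record per terminal. The mini data set and the names `atm`, `lookup`, and `result` are hypothetical, and "earliest record wins" is an assumption, not something stated in the question.

```r
library(dplyr)

# Hypothetical data: terminal "T1" has two different pre-2019 descriptions.
atm <- tribble(
  ~terminal_id, ~date,        ~location_type_description,
  "T1",         "2017-01-01", "Gas Station",
  "T1",         "2018-01-01", "Other",
  "T1",         "2019-11-01", "Financial Institution",
  "T2",         "2017-01-01", "Financial Institution",
  "T2",         "2019-11-01", "Financial Institution"
) %>%
  mutate(date = as.Date(date))

# One lookup row per terminal: the earliest record before 2019.
lookup <- atm %>%
  filter(date < as.Date("2019-01-01")) %>%
  group_by(terminal_id) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(terminal_id, location_corrected = location_type_description)

result <- atm %>%
  left_join(lookup, by = "terminal_id") %>%
  group_by(terminal_id) %>%
  mutate(
    group_identifier = cur_group_id(),
    # "yes" for the whole terminal if any row differs from the corrected value.
    location_changed = if_else(
      any(location_corrected != location_type_description), "yes", "no"
    )
  ) %>%
  ungroup()
```

Because the lookup has exactly one row per terminal, `result` keeps the same number of rows as `atm`.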