R: recode within group ID
I want to (1) create a unique group ID, and (2) recode one variable if it meets a condition within the group. I have the following data of ATM locations:
library(dplyr)
data <- tribble(
  ~address, ~date, ~terminal_id, ~location_type_description,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
) %>%
  dplyr::mutate(
    dplyr::across(date, as.Date)
  )
After 2018, the location_type_description variable was incorrectly coded as "Financial Institution".
Condition: within each address and terminal_id, if location_type_description is anything other than "Financial Institution" before 2019, then we recode location_type_description to whatever it was before 2019. But if location_type_description is "Financial Institution" for all years (2017 onwards), then we know it was coded correctly. In our example, since it was "Gas Station" in 2017 and 2018, we know that everything after 2018 is actually a gas station. Here is what the output should look like for the toy data:
data_clean <- tribble(
~address, ~date, ~terminal_id, ~location_type_description, ~group_identifier, ~location_corrected, ~location_changed,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution", 1, "Gas Station", "yes"
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
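A minimal sketch of the rule described above in a single grouped mutate(), assuming that "before 2019" means dates before 2019-01-01, that only rows from 2019 onwards coded as "Financial Institution" should be recoded, and that the most recent pre-2019 value is the one to carry forward. The names cutoff, pre_2019_type and result are hypothetical helpers, not part of the question.

```r
library(dplyr)

# Toy data from the question, with dates converted to Date
data <- tibble::tribble(
  ~address, ~date, ~terminal_id, ~location_type_description,
  "1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
  "1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
  "1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution"
) %>%
  mutate(date = as.Date(date))

# Assumption: "before 2019" means strictly before 2019-01-01
cutoff <- as.Date("2019-01-01")

result <- data %>%
  arrange(address, terminal_id, date) %>%   # so last() below is the latest pre-2019 row
  group_by(address, terminal_id) %>%
  mutate(
    # (1) one integer per address/terminal_id combination
    group_identifier = cur_group_id(),
    # latest pre-2019 type in the group (NA if the group has no pre-2019 rows)
    pre_2019_type = last(location_type_description[date < cutoff]),
    # (2) recode only 2019+ rows coded as "Financial Institution" when the
    # pre-2019 type was something else
    location_corrected = if_else(
      date >= cutoff &
        location_type_description == "Financial Institution" &
        !is.na(pre_2019_type) & pre_2019_type != "Financial Institution",
      pre_2019_type,
      location_type_description
    ),
    # flag every row of a group in which at least one row was recoded
    location_changed = if_else(any(location_corrected != location_type_description), "yes", "no")
  ) %>%
  ungroup() %>%
  select(-pre_2019_type)

result
```

For this group every row ends up as "Gas Station" with group_identifier 1 and location_changed "yes", which matches data_clean. A group that is "Financial Institution" in every year keeps its values and gets "no".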
How about this:
library(dplyr)
data <- tibble::tribble(
~address, ~date, ~terminal_id, ~location_type_description,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
data %>%
group_by(address) %>%
mutate(id = cur_group_id(),
location_type_description = location_type_description[1])
#> # A tibble: 5 × 5
#> # Groups: address [1]
#> address date terminal_id location_type_description id
#> <chr> <date> <chr> <chr> <int>
#> 1 1 GATEWAY DR OROMOCTO 2017-01-01 NC79 Gas Station 1
#> 2 1 GATEWAY DR OROMOCTO 2018-01-01 NC79 Gas Station 1
#> 3 1 GATEWAY DR OROMOCTO 2019-11-01 NC79 Gas Station 1
#> 4 1 GATEWAY DR OROMOCTO 2020-01-01 NC79 Gas Station 1
#> 5 1 GATEWAY DR OROMOCTO 2020-12-01 NC79 Gas Station 1
Created on 2022-06-29 by the reprex package (v2.0.1)
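Two details of the answer above are worth checking on other data. cur_group_id() numbers groups in the sorted order of the grouping keys, not in the order they first appear. location_type_description[1] takes whichever row is currently first in the group, so the data should be sorted by date beforehand for it to mean "the earliest record". A small sketch on hypothetical data (atm and ids are illustrative names):

```r
library(dplyr)

# Hypothetical addresses, deliberately listed out of alphabetical order
atm <- tibble::tibble(
  address = c("Z STREET", "Z STREET", "A STREET"),
  location_type_description = c("Gas Station", "Financial Institution", "Other")
)

ids <- atm %>%
  group_by(address) %>%
  mutate(
    id = cur_group_id(),                        # "A STREET" sorts first, so it gets 1
    first_type = location_type_description[1]   # first row in current row order
  ) %>%
  ungroup()

ids$id
```

Here ids$id is 2, 2, 1: "Z STREET" appears first in the data but receives id 2.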
I added a few extra ATM locations to make sure it would work for various conditions.
library(magrittr)
library(dplyr)
data <- tribble(
~address, ~date, ~terminal_id, ~location_type_description,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution",
"4 PRIVET DR LITTLE WHINGING", "2017-01-01", "AB123", "Gas Station",
"4 PRIVET DR LITTLE WHINGING", "2018-01-01", "AB123", "Gas Station",
"4 PRIVET DR LITTLE WHINGING", "2019-11-01", "AB123", "Gas Station",
"4 PRIVET DR LITTLE WHINGING", "2020-01-01", "AB123", "Gas Station",
"4 PRIVET DR LITTLE WHINGING", "2020-12-01", "AB123", "Gas Station",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2017-01-01", "XY10", "Other",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2018-01-01", "XY10", "Other",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2019-11-01", "XY10", "Financial Institution",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2020-01-01", "XY10", "Financial Institution",
"42 WALLABY WAY SYDNEY AUSTRALIA", "2020-12-01", "XY10", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2017-01-01", "4227", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2018-01-01", "4227", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2019-11-01", "4227", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2020-01-01", "4227", "Financial Institution",
"742 EVERGREEN TERRACE SPRINGFIELD", "2020-12-01", "4227", "Financial Institution",
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
data_clean <- tribble(
~address, ~date, ~terminal_id, ~location_type_description, ~group_identifier, ~location_corrected, ~location_changed,
"1 GATEWAY DR OROMOCTO", "2017-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2018-01-01", "NC79", "Gas Station", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2019-11-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2020-01-01", "NC79", "Financial Institution", 1, "Gas Station", "yes",
"1 GATEWAY DR OROMOCTO", "2020-12-01", "NC79", "Financial Institution", 1, "Gas Station", "yes"
) %>%
dplyr::mutate(
dplyr::across(date, as.Date)
)
# data frame of terminal IDs and their group identifiers (numbered in order of first appearance)
groupID <- data.frame(terminal_id = unique(data$terminal_id),
                      group_identifier = seq_along(unique(data$terminal_id)))
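The groupID lookup table numbers terminals in order of first appearance. The same numbers can be computed directly with base R's match(), which avoids the extra join. A sketch on a hypothetical vector of terminal IDs:

```r
# Hypothetical terminal IDs in the order they appear in the data
terminal_id <- c("NC79", "NC79", "AB123", "XY10", "AB123", "4227")

# Position of each ID among the unique IDs: first appearance gets 1, and so on,
# which is the same numbering the groupID table assigns
group_identifier <- match(terminal_id, unique(terminal_id))

group_identifier
```

This returns 1 1 2 3 2 4. Note that it differs from cur_group_id(), which numbers groups in sorted key order.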
# data frame of each terminal's original (pre-2019) location type
OGloctype <- data %>%
filter(date < as.Date('2019-01-01')) %>%
rename(location_corrected = location_type_description) %>%
select(c(terminal_id, location_corrected)) %>%
distinct()
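OGloctype assumes each terminal has a single location type before 2019. If a terminal had two different types (say, "Gas Station" in 2017 and "Other" in 2018), distinct() keeps both rows, and the join would duplicate that terminal's rows. A quick check for such terminals, sketched on hypothetical data (pre_2019, ambiguous and the IDs T1/T2 are illustrative):

```r
library(dplyr)

# Hypothetical pre-2019 records: T1 changed type between 2017 and 2018
pre_2019 <- tibble::tribble(
  ~terminal_id, ~location_corrected,
  "T1", "Gas Station",
  "T1", "Other",
  "T2", "Gas Station"
)

# Terminals with more than one distinct pre-2019 type; each of these would
# match several OGloctype rows and multiply its rows in the join
ambiguous <- pre_2019 %>%
  distinct() %>%
  count(terminal_id) %>%
  filter(n > 1)

ambiguous$terminal_id
```

If this returns anything, decide on a rule (for example, keep only the latest pre-2019 row) before joining.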
data %>%
full_join(groupID, by = 'terminal_id') %>%
full_join(OGloctype, by = 'terminal_id') %>%
group_by(terminal_id) %>%
# any() is TRUE if at least one row in the group differs from its original (pre-2019) type
mutate(location_changed = ifelse(any(location_corrected != location_type_description),
'yes', 'no')) %>%
ungroup()
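One edge case the pipeline above does not cover: a terminal that only appears from 2019 onwards has no row in OGloctype, so location_corrected is NA. any(NA != x) is then NA, and location_changed becomes NA rather than "no". A sketch of one possible fallback, assuming that such terminals should keep their recorded type. It uses coalesce() and a left_join(), which gives the same rows as full_join() here because OGloctype is derived from data. OLD1 and NEW1 are hypothetical terminals.

```r
library(dplyr)

# Hypothetical data: NEW1 only appears from 2019 onwards
data <- tibble::tribble(
  ~terminal_id, ~date, ~location_type_description,
  "OLD1", "2018-01-01", "Gas Station",
  "OLD1", "2020-01-01", "Financial Institution",
  "NEW1", "2020-01-01", "Financial Institution"
) %>%
  mutate(date = as.Date(date))

OGloctype <- data %>%
  filter(date < as.Date("2019-01-01")) %>%
  select(terminal_id, location_corrected = location_type_description) %>%
  distinct()

out <- data %>%
  left_join(OGloctype, by = "terminal_id") %>%
  # assumption: with no pre-2019 row, fall back to the recorded value
  mutate(location_corrected = coalesce(location_corrected, location_type_description)) %>%
  group_by(terminal_id) %>%
  mutate(location_changed = if_else(any(location_corrected != location_type_description), "yes", "no")) %>%
  ungroup()

out
```

OLD1 is flagged "yes" on both rows, while NEW1 gets "no" instead of NA.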