The quickest way to replace a chain of nested ifelse() statements

I have this series of nested ifelse() statements:

ifelse(data$Country == 1, "Brazil",
ifelse(data$Country == 2, "Canada",
ifelse(data$Country == 3, "China",
ifelse(data$Country == 4, "Ecuador",
ifelse(data$Country == 5, "France",
ifelse(data$Country == 6, "Germany",
ifelse(data$Country == 7, "India",
ifelse(data$Country == 8, "Italy",
ifelse(data$Country == 9, "Mexico",
ifelse(data$Country == 10, "Nigeria",
ifelse(data$Country == 11, "Poland",
ifelse(data$Country == 12, "Russia",
ifelse(data$Country == 13, "South Africa",
ifelse(data$Country == 14, "South Korea",
ifelse(data$Country == 15, "Singapore",
ifelse(data$Country == 16, "Spain",
ifelse(data$Country == 17, "Sweden",
ifelse(data$Country == 18, "United Kingdom",
ifelse(data$Country == 19, "United States", "l"
)))))))))))))))))))

I am looking for the quickest way to convert an encoded variable into the corresponding country name. Is there a better way to do this?

Thank you so much

I am not sure of the intended use, but you could try a named vector. It is not the most elegant solution, but it does away with the ifelse mess ;)

An example with 4 countries, keyed here by the codes 2 to 5, so that China = "4":

countrys <- c("Brazil", "Canada",
              "China",
              "Ecuador")
names(countrys) <- c(2:5)

# Test data.frame
data <- data.frame(country = 4)

# Now we can get the country directly from data$country.
# Careful! 4 is not '4': countrys[4] indexes by position ("Ecuador"),
# while countrys["4"] looks up by name ("China").
unname(countrys[as.character(data$country)])
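The same idea can be applied to a whole column at once. The following is a minimal sketch, assuming the 19 codes from the question map to the countries in the order listed there; the names `country_names`, `lookup` and `CountryName`, as well as the sample data, are made up for illustration.

```r
# Lookup table: the names are the codes (as strings), the values are the countries.
country_names <- c("Brazil", "Canada", "China", "Ecuador", "France",
                   "Germany", "India", "Italy", "Mexico", "Nigeria",
                   "Poland", "Russia", "South Africa", "South Korea",
                   "Singapore", "Spain", "Sweden", "United Kingdom",
                   "United States")
lookup <- setNames(country_names, seq_along(country_names))

# Hypothetical sample data; 99 is a code with no entry in the table.
data <- data.frame(Country = c(1, 13, 19, 99))

# Index by name (character), not by position; unknown codes come back as NA.
data$CountryName <- unname(lookup[as.character(data$Country)])
data$CountryName
# "Brazil" "South Africa" "United States" NA
```

Codes that are missing from the table yield NA rather than an error, so they can be checked afterwards with `is.na()`.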

There are 2 options:

1: case_when from dplyr

library(dplyr)
data.frame(info = letters[1:5],
           country_id = 1:5) %>% 
  mutate(country_name = case_when(country_id == 1 ~ "Brazil",
                                  country_id == 2 ~ "Canada",
                                  country_id == 3 ~ "China",
                                  country_id == 4 ~ "Ecuador",
                                  country_id == 5 ~ "France",
                                  TRUE ~ "Unknown"))

  info country_id country_name
1    a          1       Brazil
2    b          2       Canada
3    c          3        China
4    d          4      Ecuador
5    e          5       France

2: merge or join the info from a country table:

# country table
countries <- data.frame(country_id = 1:5, 
                        country_name = c("Brazil", "Canada", "China", "Ecuador", "France"))

data.frame(info = letters[1:5],
           country_id = 1:5) %>% 
  left_join(countries, by = "country_id")

  info country_id country_name
1    a          1       Brazil
2    b          2       Canada
3    c          3        China
4    d          4      Ecuador
5    e          5       France

My preference would be option 2: less coding and less chance of a mistake. You can keep the country table in your database or in a file somewhere and maintain it there without needing to change the code.
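As a rough sketch of keeping the table in a file: the example below writes a small CSV to a temporary path (a stand-in for a maintained file, which is an assumption here) and reads it back. To stay free of package dependencies it uses base R `match()` instead of `left_join()`; the effect for a single lookup column is the same.

```r
# Hypothetical file standing in for a country table maintained outside the code.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(country_id = 1:5,
                     country_name = c("Brazil", "Canada", "China",
                                      "Ecuador", "France")),
          path, row.names = FALSE)

countries <- read.csv(path, stringsAsFactors = FALSE)

# Sample data; id 42 has no row in the country table.
data <- data.frame(info = letters[1:6], country_id = c(1:5, 42))

# match() gives the row of `countries` for each id, or NA when there is none.
data$country_name <- countries$country_name[match(data$country_id,
                                                  countries$country_id)]
data
```

Adding a country then only means adding a row to the file; the lookup code stays unchanged.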

This is a very nice case for a switch statement, which in my opinion makes for more readable code than dplyr::case_when or a series of ifelse calls, and is easily extended if, for example, there are further criteria such as regions or cities.

get_country <- Vectorize(function(x){
  switch(as.character(x),
         "1" = "Brazil", "2" = "Canada", "3" = "China",
         "4" = "Ecuador", "5" = "France", "6" = "Germany",
         "7" = "India", "8" = "Italy", "9" = "Mexico",
         "10" = "Nigeria", "11" = "Poland", "12" = "Russia",
         "13" = "South Africa", "14" = "South Korea", "15" = "Singapore",
         "16" = "Spain", "17" = "Sweden", "18" = "United Kingdom",
         "19" = "United States", NA
  )
})

library(dplyr)  # for %>% and mutate()

data.frame(info = letters[1:5],
           country_id = 1:5) %>%
  mutate(country = get_country(country_id))

  info country_id  country
1    a          1  Brazil
2    b          2  Canada
3    c          3   China
4    d          4 Ecuador
5    e          5  France

Typing out a long statement like that is a lot of work, though. A more dynamic alternative is to generate the switch statement with a constructor function that takes vectors of ids and names. Here I use the rworldmap::countryExData data set to build a switch expression over its 149 countries. The same approach extends to, for example, cities and regions.

constructor <- function(ids, names){
  purrr::imap_chr(as.character(ids), ~paste(paste0("\"", .x ,"\""),
                                            paste0("\"", names[.y], "\""),
                                            sep = "=")) %>%
    paste0(collapse = ", ") %>%
    paste0("Vectorize(function(x) switch(as.character(x), ", ., ", NA))", collapse = "") %>%
    str2expression()
}
get_country <- eval(constructor(1:149, trimws(rworldmap::countryExData$Country)))

set.seed(1)
data.frame(info = sample(letters, size = 5, replace = TRUE),
           country_id = sample.int(149, 5, replace = TRUE)) %>%
  mutate(country = get_country(country_id))

  info country_id           country
1    y        122      Sierra Leone
2    h         39           Algeria
3    l         42           Eritrea
4    y        134 Trinidad & Tobago
5    w         24             Chile
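For trying the generated-switch idea without purrr or rworldmap installed, here is a small base R sketch of the same technique. `build_switch` is a hypothetical name, and the three-country table is only for illustration; it assumes the names contain no double quotes, since they are pasted into code as string literals.

```r
# Build "id" = "name" cases as text, wrap them in a vectorized switch(), then
# parse and evaluate the result. Unknown ids fall through to NA.
build_switch <- function(ids, names) {
  cases <- paste(sprintf('"%s" = "%s"', ids, names), collapse = ", ")
  code <- paste0("Vectorize(function(x) switch(as.character(x), ",
                 cases, ", NA))")
  eval(str2expression(code))
}

get_country <- build_switch(1:3, c("Brazil", "Canada", "China"))
get_country(c(3, 1, 7))
# "China" "Brazil" NA
```

Because the function is built from text, it is worth validating the input vectors first if they come from an untrusted source.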
