
Recode related values in an efficient way

I have a data frame df with a single variable, var, whose values fall into groups of related values.

library(dplyr)

df <- data.frame(var = c(rep('AUS',12), rep('NZ',12), rep('ENG',7), rep('SOC',12),
                         rep('PAK',11), rep('SRI',17), rep('IND',15)),
                 stringsAsFactors = TRUE)  # var is a factor, as in the output below

df %>% count(var)
# A tibble: 7 x 2
#      var     n
#   <fctr> <int>
# 1    AUS    12
# 2    ENG     7
# 3    IND    15
# 4     NZ    12
# 5    PAK    11
# 6    SOC    12
# 7    SRI    17

Based on some relations, some values should be recoded with a new value.

df %>% mutate(var = recode(var, 'AUS' = 'A', 'NZ' = 'A', 'ENG' = 'A', 
                           'SOC' = 'A', 'PAK' = 'B', 'SRI' = 'B')) %>% count(var)
# A tibble: 3 x 2
#      var     n
#   <fctr> <int>
# 1      A    43
# 2    IND    15
# 3      B    28

Here A replaces four values and B replaces two. This gives the expected result, but is there a more efficient way to do it, without repeating the new value once for every old value it replaces (four times for A, twice for B)?

One way to do this is to use a named vector as a lookup table: the names are the old values and the elements are the new codes.

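# Lookup table: each element is a new code, each name is the old value it replaces
Codes <- c(rep('A', 4), rep('B', 2), 'IND')
names(Codes) <- c('AUS', 'NZ', 'ENG', 'SOC', 'PAK', 'SRI', 'IND')
# Index by name; as.character() is needed because indexing by a factor would use its integer codes.
# 'IND' maps to itself; any value missing from names(Codes) would become NA.
df$var <- unname(Codes[as.character(df$var)])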
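A possible refinement, sketched here under the assumption that the groups are kept in a hypothetical list named groups (not part of the original answer): the lookup vector can be generated from that list, so each new code is written only once. Values with no entry in the lookup, such as 'IND', are kept as they are instead of needing an entry of their own. Only base R functions are used (setNames, rep, lengths, unlist, ifelse).

```r
# Hypothetical group definition: each new code lists the old values it replaces,
# so every new code is written only once.
groups <- list(A = c('AUS', 'NZ', 'ENG', 'SOC'),
               B = c('PAK', 'SRI'))

# Invert the list into a named lookup vector: names are old values, elements are new codes.
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

df <- data.frame(var = c(rep('AUS',12), rep('NZ',12), rep('ENG',7), rep('SOC',12),
                         rep('PAK',11), rep('SRI',17), rep('IND',15)),
                 stringsAsFactors = TRUE)

old <- as.character(df$var)
new <- unname(lookup[old])
# Old values without an entry (here 'IND') come back as NA; keep the original value for those.
df$var <- ifelse(is.na(new), old, new)

table(df$var)
```

With this layout, adding another group only means adding one element to groups; the lookup vector is rebuilt from it.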
