I have a data frame df with a single variable, var, that holds some related values.
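library(dplyr)  # provides %>%, count(), mutate() and recode()
# stringsAsFactors = TRUE makes var a factor, as in the output below
# (this was the default before R 4.0.0)
df <- data.frame(var = c(rep('AUS',12), rep('NZ',12), rep('ENG',7), rep('SOC',12),
                         rep('PAK',11), rep('SRI',17), rep('IND',15)),
                 stringsAsFactors = TRUE)
df %>% count(var)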
# # A tibble: 7 x 2
#   var        n
#   <fctr> <int>
# 1 AUS       12
# 2 ENG        7
# 3 IND       15
# 4 NZ        12
# 5 PAK       11
# 6 SOC       12
# 7 SRI       17
Based on these relations, several of the values should be recoded to a shared new value.
df %>% mutate(var = recode(var, 'AUS' = 'A', 'NZ' = 'A', 'ENG' = 'A',
'SOC' = 'A', 'PAK' = 'B', 'SRI' = 'B')) %>% count(var)
# # A tibble: 3 x 2
#   var        n
#   <fctr> <int>
# 1 A         43
# 2 IND       15
# 3 B         28
As the output shows, A replaces 4 values and B replaces 2. The code above already gives the expected result. However, is there a more efficient way to do this, without writing out each new value once per old value (4 times for A, 2 times for B)?
One way to do this is to use a vector with named entries as a lookup table.
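# Lookup table: the names are the old values, the entries are the new codes
Codes <- c(rep('A', 4), rep('B', 2), 'IND')
names(Codes) <- c('AUS', 'NZ', 'ENG', 'SOC', 'PAK', 'SRI', 'IND')
# Index by name (as.character so factor levels are not used as positions)
df$var <- Codes[as.character(df$var)]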
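A possible variation on the same lookup-table idea, sketched in base R: list the old values once per group and build the named vector from that list, so each new code is written only once. The names groups, lookup and codes are illustrative and not part of the original answer. The sketch assumes that values not listed in any group (here IND) should be kept unchanged.

```r
# Same data as in the question
df <- data.frame(var = c(rep('AUS',12), rep('NZ',12), rep('ENG',7), rep('SOC',12),
                         rep('PAK',11), rep('SRI',17), rep('IND',15)))

# Hypothetical grouping: each new code is written once, with its old values
groups <- list(A = c('AUS', 'NZ', 'ENG', 'SOC'),
               B = c('PAK', 'SRI'))

# Build the lookup: names are the old values, entries are the new codes
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

# Values missing from the lookup come back as NA; keep the original for those
old   <- as.character(df$var)
codes <- unname(lookup[old])
df$var <- ifelse(is.na(codes), old, codes)

table(df$var)
```

With this layout, adding another old value to a group only means extending one vector in groups. Note that the result is a character column; wrap it in factor() if a factor is needed.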