
generate labels for variables in R

I'm looking for a better, faster way than the following to generate labels for a variable:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)
df[sapply(df$a,function(x) !(x %in% pick)),"a"] <- "a"
df[sapply(df$a,function(x) x==0),"a"] <- "b"
df[sapply(df$a,function(x) x==1 | x==2 | x==3),"a"] <- "c"
df[sapply(df$a,function(x) x==10),"a"] <- "d"

df$a
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c" 

For simplicity, this example has only one variable. My real dataset has more, but I only want to relabel one specific variable.

You don't need sapply; comparison operators and %in% are already vectorised:

df$a[!df$a %in% pick] <- "a"
df$a[df$a==0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a==10] <- "d"
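
One detail worth knowing about the lines above: the first assignment turns the numeric column into a character column, and the later comparisons still succeed because R coerces the number to a string before comparing. A minimal sketch of that behaviour, using a throwaway vector x and a helper variable is_ten that are not part of the original example:

```r
x <- c(0, 7, 10)             # hypothetical toy vector, numeric at first
x[!x %in% c(0, 10)] <- "a"   # assigning a string coerces all of x to character
class(x)                     # "character"
x                            # "0" "a" "10"

is_ten <- x == 10            # 10 is converted to "10" for the comparison
is_ten                       # FALSE FALSE TRUE
```

Comparing against the original numeric values and writing the labels into a separate result vector avoids depending on this coercion.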

You could also produce the same result with factors:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)

# the above method
a <- df$a
a[!df$a %in% pick] <- "a"
a[df$a==0] <- "b"
a[df$a %in% 1:3] <- "c"
a[df$a==10] <- "d"

# one way (may give a warning about duplicated labels, depending on the R version)
b1 <- factor(df$a, levels=0:10, labels=c("b",rep("c",3),rep("a",6),"d"))

# another way that won't give a warning
b2 <- factor(df$a)
levels(b2) <- c("b",rep("c",3),rep("a",4),"d")
b2 <- as.character(b2)

# a third strategy using `library(car)`
b3 <- car::recode(df$a,"0='b';1:3='c';10='d';else='a'")

# check that all strategies are the same
all.equal(a,as.character(b1))
# [1] TRUE
all.equal(as.character(b1),as.character(b2))
# [1] TRUE
all.equal(as.character(b1),as.character(b3))
# [1] TRUE
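
Another factor-based variant, offered as a sketch rather than as part of the answer above: the levels<- replacement function also accepts a named list that maps each new label to the old levels it should absorb. Old levels left out of the list become NA, so the default label "a" is filled in afterwards. The names f and lab are introduced here only for illustration.

```r
a <- c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2)

f <- factor(a)  # levels are the sorted unique values, stored as strings
# Map new labels to old levels; old levels not listed here become NA.
levels(f) <- list(b = "0", c = c("1", "2", "3"), d = "10")

lab <- as.character(f)
lab[is.na(lab)] <- "a"  # everything that was not picked gets the default label
lab
```

The list form keeps the mapping readable when many old values collapse into a few labels, at the cost of the extra step for the default.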

You might also consider mapvalues or revalue from the plyr package, particularly if you're dealing with more labels:

df$a <- plyr::mapvalues(df$a, c(0, 1, 2, 3, 10), c("b", "c", "c", "c", "d"))
df$a[!df$a %in% c("b", "c", "d")] <- "a" # everything not in pick

Here is another fairly straightforward solution:

names(pick) <- c("b", "c", "c", "c", "d")
x <- names(pick[match(df$a, pick)])
x[is.na(x)] <- "a"
x
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

It is even more straightforward if you include an NA in your "pick" object.

pick <- c(NA, 0, 1, 2, 3, 10)
names(pick) <- c("a", "b", "c", "c", "c", "d")
names(pick[match(df$a, pick, nomatch = 1)])
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

If you use this second alternative, note that nomatch takes an integer: the position in the lookup vector to return when no match is found. Here, nomatch = 1 points to the NA entry, which sits in the first position of the "pick" vector and carries the name "a". If the NA were in the last position, you would use nomatch = 6 instead.
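
To illustrate the last point, here is a sketch with the NA placed at the end of the lookup vector. The names pick_last, fallback, and labels_last are made up for this example; using length(pick_last) instead of a hard-coded 6 is just one way to keep the fallback position in sync with the vector.

```r
a <- c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2)

# Same lookup as before, but the NA placeholder (named "a") comes last.
pick_last <- c(0, 1, 2, 3, 10, NA)
names(pick_last) <- c("b", "c", "c", "c", "d", "a")

fallback <- length(pick_last)  # position of the NA entry, here 6
labels_last <- names(pick_last)[match(a, pick_last, nomatch = fallback)]
labels_last
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
```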

You can also use the ifelse function:

with(df,ifelse(a==0,"b",ifelse(a %in% c(1,2,3),"c",ifelse(a==10,"d","a"))))
 [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
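
Since the question mentions a dataset with several variables of which only one should change, the nested ifelse can be wrapped in a small function and applied to that one column. This is only a sketch: label_a is a hypothetical helper name, and the column other is invented to stand in for the remaining variables.

```r
# Hypothetical helper: the same nested ifelse, first matching condition wins.
label_a <- function(a) {
  ifelse(a == 0, "b",
         ifelse(a %in% 1:3, "c",
                ifelse(a == 10, "d", "a")))
}

# Toy data frame; "other" is a made-up column that must stay untouched.
df <- data.frame(a = c(0, 7, 1, 10), other = c(1.5, 2.5, 3.5, 4.5))
df$a <- label_a(df$a)
df$a
# [1] "b" "a" "c" "d"
```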


 