generate labels for variables in R

Question

I'm looking for a better or faster way than the following to generate labels for a variable:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)
df[sapply(df$a,function(x) !(x %in% pick)),"a"] <- "a"
df[sapply(df$a,function(x) x==0),"a"] <- "b"
df[sapply(df$a,function(x) x==1 | x==2 | x==3),"a"] <- "c"
df[sapply(df$a,function(x) x==10),"a"] <- "d"

df$a
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

For simplicity, this example has only one variable. My real dataset has more, but I only want to change one specific variable.

Answer 1

You don't need sapply. The comparison operators and %in% are already vectorized, so you can index the column directly:

df$a[!df$a %in% pick] <- "a"
df$a[df$a==0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a==10] <- "d"
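Since the question asks for something faster, a rough timing comparison can make the difference visible. This is only a sketch on made-up data: the vector x, its size, and the seed are assumptions, not part of the question, and the timings will vary by machine.

```r
set.seed(1)                                  # arbitrary seed, for repeatability only
x    <- sample(0:10, 1e5, replace = TRUE)    # hypothetical test vector
pick <- c(0, 1, 2, 3, 10)

# element-by-element test, as in the question
t_sapply <- system.time(
  slow <- sapply(x, function(v) !(v %in% pick))
)

# vectorized test, as in this answer
t_vec <- system.time(
  fast <- !x %in% pick
)

identical(slow, fast)                        # both give the same logical index
c(sapply = t_sapply[["elapsed"]], vectorized = t_vec[["elapsed"]])
```

Both versions build the same logical index; the vectorized one avoids calling an R function once per element.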

You could also produce the same result with factors:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))

# the above method
a <- df$a
a[!df$a %in% pick] <- "a"
a[df$a==0] <- "b"
a[df$a %in% 1:3] <- "c"
a[df$a==10] <- "d"

# one way that gives a warning about duplicated labels (R 3.4.0 and later accept them silently)
b1 <- factor(df$a, levels=0:10, labels=c("b",rep("c",3),rep("a",6),"d"))

# another way that won't give a warning
b2 <- factor(df$a)
levels(b2) <- c("b",rep("c",3),rep("a",4),"d")
b2 <- as.character(b2)

# a third strategy, using recode() from the car package
b3 <- car::recode(df$a,"0='b';1:3='c';10='d';else='a'")

# check that all strategies are the same
all.equal(a,as.character(b1))
# [1] TRUE
all.equal(as.character(b1),as.character(b2))
# [1] TRUE
all.equal(as.character(b1),as.character(b3))
# [1] TRUE
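The label vector assigned to levels(b2) above is counted by hand, so it only fits the values that happen to occur in df$a. A minimal sketch that derives one label per level instead could look like this; the names f, lv, new_labels and b4 are illustrative and not part of the answer.

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

f  <- factor(df$a)
lv <- as.numeric(levels(f))            # the distinct values, in sorted order

# one label per level, computed rather than counted by hand
new_labels <- ifelse(lv == 0, "b",
              ifelse(lv %in% 1:3, "c",
              ifelse(lv == 10, "d", "a")))

levels(f) <- new_labels                # repeated labels merge the levels
b4 <- as.character(f)
b4
```

Because the labels are computed from levels(f), the code keeps working if a value such as 6 or 9 appears in the data later.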

Answer 2

You might also consider mapvalues or revalue from the plyr package, particularly if you're dealing with more labels:

library(plyr)
df$a <- mapvalues(df$a, c(0, 1, 2, 3, 10), c("b", "c", "c", "c", "d"))
df$a[!df$a %in% c("b", "c", "d")] <- "a" # everything that was not in pick
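With many labels it may be easier to keep the mapping in a small table and pass its columns to mapvalues. The following is a sketch under that assumption: the lookup data frame and the label column are hypothetical names, and the code only runs if plyr is installed.

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

# hypothetical lookup table: one row per value that has an explicit label
lookup <- data.frame(from = c(0, 1, 2, 3, 10),
                     to   = c("b", "c", "c", "c", "d"),
                     stringsAsFactors = FALSE)

if (requireNamespace("plyr", quietly = TRUE)) {
  lab <- plyr::mapvalues(df$a, from = lookup$from, to = lookup$to)
  lab[!lab %in% lookup$to] <- "a"      # fallback label for unmapped values
  df$label <- lab                      # the original column stays numeric
}
```

Writing the result to a new column leaves df$a untouched, so the mapping can be rerun or checked against the original values.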

Answer 3

Here is another fairly straightforward solution:

names(pick) <- c("b", "c", "c", "c", "d")
x <- names(pick[match(df$a, pick)])
x[is.na(x)] <- "a"
x
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

It is even more straightforward if you include an NA in your "pick" object.

pick <- c(NA, 0, 1, 2, 3, 10)
names(pick) <- c("a", "b", "c", "c", "c", "d")
names(pick[match(df$a, pick, nomatch = 1)])
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

If you use this second alternative, note that nomatch takes an integer: the position, in the vector you are matching against, to return for values that have no match. Here nomatch = 1 points at the NA in the first position of pick, whose name is "a". If the NA were in the last position, you would use nomatch = 6 instead.
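The last-position variant described above can be sketched as follows. The names keys, key_labels, idx and res are illustrative, and the labels are kept in a separate vector rather than as names, which is a small departure from the answer's code.

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

keys       <- c(0, 1, 2, 3, 10, NA)            # NA placeholder in the last position
key_labels <- c("b", "c", "c", "c", "d", "a")  # "a" is the fallback label

# unmatched values are sent to the last position, here 6
idx <- match(df$a, keys, nomatch = length(keys))
res <- key_labels[idx]
res
```

Using length(keys) instead of a literal 6 keeps the fallback position correct when more keys are added before the NA. Note that a genuine NA in df$a would also match the placeholder and receive the fallback label.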

Answer 4

You can also use the ifelse function:

with(df,ifelse(a==0,"b",ifelse(a %in% c(1,2,3),"c",ifelse(a==10,"d","a"))))
 [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"

4 answers

solution1
2 ACCEPTED 2013-08-13 12:38:36

solution2
2 2013-08-13 12:48:12

solution3
2 2013-08-13 16:46:38

solution4
0 2013-08-13 13:27:31
