
R apply conversion to multiple columns of data.frame

I want to convert several columns in a data.frame from chr to numeric, and I would like to do it in a single line. Here is what I am trying to do:

items[,2:4] <- as.numeric(sub("\\$","",items[,2:4]))

But I get a warning saying:

Warning message:
NAs introduced by coercion

If I do it column by column though it works:

items[,2:2] <- as.numeric(sub("\\$","",items[,2:2]))
items[,3:3] <- as.numeric(sub("\\$","",items[,3:3]))
items[,4:4] <- as.numeric(sub("\\$","",items[,4:4]))

What am I missing here? Why can't I specify this command for multiple columns? Is this some odd R idiosyncrasy that I am not aware of?

Example Data:

Name, Cost1,  Cost2,  Cost3,  Cost4
A,    $10.00, $15.50, $13.20, $45.45
B,    $45.23, $34.23, $34.24, $23.34
C,    $23.43, $45.23, $65.23, $34.23
D,    $76.34, $98.34, $90.34, $45.09

Your problem is that sub and gsub convert their x argument to character. If a list (a data.frame is in fact a list) is converted to character, something weird happens:

as.character(list(a=c("1", "1"), b="1"))
# "c(\"1\", \"1\")" "1"

# and "c(\"1\", \"1\")" cannot be converted to a numeric
as.numeric("c(\"1\", \"1\")")
# NA
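To see the same effect on a data.frame, here is a small sketch. The data frame `df` and its values are made up for illustration, and `stringsAsFactors = FALSE` is set so that the columns are character regardless of the R version:

```r
df <- data.frame(a = c("$1", "$2"), b = c("$3", "$4"),
                 stringsAsFactors = FALSE)

# sub() coerces the whole data.frame with as.character(),
# which yields one deparsed string per column, not one per cell
res <- sub("\\$", "", df)
length(res)   # one element per column

# none of these strings is a number, so every element becomes NA
suppressWarnings(as.numeric(res))
```

This is why the multi-column call produces the coercion warning while the single-column calls, which pass a plain character vector, do not.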

A one-line solution would be to unlist the x argument:

items[, 2:5] <- as.numeric(gsub("\\$", "", unlist(items[, 2:5])))
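As a rough illustration of why this works, the sketch below uses a cut-down, made-up version of the example data (two rows, two cost columns). unlist flattens the selected columns into one character vector, column by column, and the assignment fills the columns back in the same order:

```r
items <- data.frame(Name  = c("A", "B"),
                    Cost1 = c("$10.00", "$45.23"),
                    Cost2 = c("$15.50", "$34.23"),
                    stringsAsFactors = FALSE)

# unlist() gives a single character vector: all of Cost1, then all of Cost2
flat <- unlist(items[, 2:3])
length(flat)

# strip the dollar sign and convert; the vector is still in column order
cleaned <- as.numeric(gsub("\\$", "", flat))

# assigning the vector back fills the selected columns one after another
items[, 2:3] <- cleaned
str(items)
```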

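Yes, there is: apply is the function you are looking for. Note that 2:4, as in the question, covers only Cost1 to Cost3, so Cost4 is left unconverted in the output below; use 2:5 to convert all four cost columns: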

items<-read.table(text="Name, Cost1,  Cost2,  Cost3,  Cost4
A,    $10.00, $15.50, $13.20, $45.45
B,    $45.23, $34.23, $34.24, $23.34
C,    $23.43, $45.23, $65.23, $34.23
D,    $76.34, $98.34, $90.34, $45.09", header=TRUE,sep=",")

items[,2:4]<-apply(items[,2:4],2,function(x){as.numeric(gsub("\\$","",x))})
items
  Name Cost1 Cost2 Cost3   Cost4
1    A 10.00 15.50 13.20  $45.45
2    B 45.23 34.23 34.24  $23.34
3    C 23.43 45.23 65.23  $34.23
4    D 76.34 98.34 90.34  $45.09
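A variant covering all four cost columns could look like the following sketch. The shortened data, the `strip.white = TRUE` argument and the names `cost_cols` and `converted` are illustrative choices, not part of the original answer. It also shows that apply works on a matrix internally and returns one:

```r
items <- read.table(text = "Name, Cost1, Cost2, Cost3, Cost4
A, $10.00, $15.50, $13.20, $45.45
B, $45.23, $34.23, $34.24, $23.34",
                    header = TRUE, sep = ",",
                    strip.white = TRUE, stringsAsFactors = FALSE)

cost_cols <- 2:5   # Cost1 to Cost4

# apply() converts the selected columns to a character matrix,
# then calls the function once per column (MARGIN = 2)
converted <- apply(items[, cost_cols], 2,
                   function(x) as.numeric(gsub("\\$", "", x)))
is.matrix(converted)

# the numeric matrix is written back column by column
items[, cost_cols] <- converted
str(items)
```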

A more efficient approach would be:

items[-1] <- lapply(items[-1], function(x) as.numeric(gsub("$", "", x, fixed = TRUE)))
items
#   Name Cost1 Cost2 Cost3 Cost4
# 1    A 10.00 15.50 13.20 45.45
# 2    B 45.23 34.23 34.24 23.34
# 3    C 23.43 45.23 65.23 34.23
# 4    D 76.34 98.34 90.34 45.09
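Note the `fixed = TRUE` argument: it makes gsub treat the pattern as a literal string rather than a regular expression. A short sketch of the difference, using a made-up vector `x`:

```r
x <- c("$10.00", "$45.23")

# As a regular expression, "$" is the end-of-string anchor,
# so this call leaves the strings unchanged
no_change <- gsub("$", "", x)

# Escaping the dollar sign, or using fixed = TRUE, removes the literal "$"
escaped <- gsub("\\$", "", x)
literal <- gsub("$", "", x, fixed = TRUE)

identical(escaped, literal)
```

Without either the escape or `fixed = TRUE`, as.numeric would still see the dollar sign and return NA.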

Some benchmarks of the answers so far:

fun1 <- function() {
  A[-1] <- lapply(A[-1], function(x) as.numeric(gsub("$", "", x, fixed=TRUE)))
  A
}
fun2 <- function() {
  A[, 2:ncol(A)] <- as.numeric(gsub("\\$", "", unlist(A[, 2:ncol(A)])))
  A
}
fun3 <- function() {
  A[, 2:ncol(A)] <- apply(A[,2:ncol(A)], 2, function(x) { as.numeric(gsub("\\$","",x)) })
  A
}

Here are some sample data and the processing times:

set.seed(1)
A <- data.frame(Name = sample(LETTERS, 10000, TRUE),
                matrix(paste0("$", sample(99, 10000*100, TRUE)), 
                       ncol = 100))
system.time(fun1())
#    user  system elapsed 
#    0.72    0.00    0.72 
system.time(fun2())
#    user  system elapsed 
#    5.84    0.00    5.85 
system.time(fun3())
#    user  system elapsed 
#    4.14    0.00    4.14 
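Before comparing timings, it may be worth checking that the three approaches return the same result. The sketch below does that on a much smaller data set; the sizes and the names `r1`, `r2`, `r3` and `cols` are arbitrary choices for illustration, and `stringsAsFactors = FALSE` is an added assumption so the columns are character in any R version:

```r
set.seed(1)
A <- data.frame(Name = sample(LETTERS, 20, TRUE),
                matrix(paste0("$", sample(99, 20 * 3, TRUE)), ncol = 3),
                stringsAsFactors = FALSE)
cols <- 2:ncol(A)

# lapply over the columns
r1 <- A
r1[-1] <- lapply(A[-1], function(x) as.numeric(gsub("$", "", x, fixed = TRUE)))

# unlist, convert, assign back
r2 <- A
r2[, cols] <- as.numeric(gsub("\\$", "", unlist(A[, cols])))

# apply over the columns
r3 <- A
r3[, cols] <- apply(A[, cols], 2, function(x) as.numeric(gsub("\\$", "", x)))

all.equal(r1, r2)
all.equal(r1, r3)
```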


 