R apply conversion to multiple columns of data.frame
I want to convert several columns of a data.frame from chr to numeric, and I would like to do it in one line. Here is what I am trying to do:
items[,2:4] <- as.numeric(sub("\\$","",items[,2:4]))
But I get a warning message:
Warning message:
NAs introduced by coercion
If I do it column by column, though, it works:
items[,2:2] <- as.numeric(sub("\\$","",items[,2:2]))
items[,3:3] <- as.numeric(sub("\\$","",items[,3:3]))
items[,4:4] <- as.numeric(sub("\\$","",items[,4:4]))
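What am I missing here? Why can't I specify this command for multiple columns at once? Is this some odd R idiosyncrasy that I am not aware of?
Example data: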
Name, Cost1, Cost2, Cost3, Cost4
A, $10.00, $15.50, $13.20, $45.45
B, $45.23, $34.23, $34.24, $23.34
C, $23.43, $45.23, $65.23, $34.23
D, $76.34, $98.34, $90.34, $45.09
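Your problem is that gsub (like sub) converts its x argument to character. If a list (a data.frame is in fact a list) is converted to character, something odd happens: each element is deparsed into a single string.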
as.character(list(a=c("1", "1"), b="1"))
# "c(\"1\", \"1\")" "1"
# and "c(\"1\", \"1\")" cannot be converted to numeric
as.numeric("c(\"1\", \"1\")")
# NA
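This also suggests why the column-by-column version works: a single column index such as items[, 2] drops to a plain character vector, whereas items[, 2:4] stays a data.frame, that is, a list. A minimal sketch on a small made-up data frame (the name df and its values are illustrative only, not the asker's data):

```r
# Illustrative data only; df is a hypothetical stand-in for items.
df <- data.frame(Name  = c("A", "B"),
                 Cost1 = c("$10.00", "$45.23"),
                 Cost2 = c("$15.50", "$34.23"),
                 stringsAsFactors = FALSE)

# A single column index drops to a character vector,
# so sub() is applied to each cell.
one_col <- df[, 2]
is.character(one_col)
single <- as.numeric(sub("\\$", "", one_col))

# Several columns stay a data.frame (a list). sub() coerces it with
# as.character(), which yields one deparsed string per column.
many_cols <- df[, 2:3]
is.list(many_cols)
coerced <- sub("\\$", "", many_cols)
length(coerced)   # one string per column, not one per cell
```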
One solution is to unlist the x argument:
items[, 2:5] <- as.numeric(gsub("\\$", "", unlist(items[, 2:5])))
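The unlist approach relies on ordering: unlist flattens the selected columns one column after another, and assigning a plain vector to a block of data.frame columns fills that block in the same column-by-column order. A small sketch on made-up data (df and its values are assumptions for illustration):

```r
# Illustrative data only.
df <- data.frame(Name  = c("A", "B"),
                 Cost1 = c("$1", "$2"),
                 Cost2 = c("$3", "$4"),
                 stringsAsFactors = FALSE)

# unlist() walks the columns in order: all of Cost1, then all of Cost2.
flat <- unlist(df[, 2:3])
unname(flat)

# The numeric vector is written back column by column,
# so every value lands in its original cell.
df[, 2:3] <- as.numeric(gsub("\\$", "", flat))
df
```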
Yes, there is: apply is the command you are looking for:
items<-read.table(text="Name, Cost1, Cost2, Cost3, Cost4
A, $10.00, $15.50, $13.20, $45.45
B, $45.23, $34.23, $34.24, $23.34
C, $23.43, $45.23, $65.23, $34.23
D, $76.34, $98.34, $90.34, $45.09", header=TRUE,sep=",")
items[,2:4]<-apply(items[,2:4],2,function(x){as.numeric(gsub("\\$","",x))})
items
Name Cost1 Cost2 Cost3 Cost4
1 A 10.00 15.50 13.20 $45.45
2 B 45.23 34.23 34.24 $23.34
3 C 23.43 45.23 65.23 $34.23
4 D 76.34 98.34 90.34 $45.09
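Note that the index 2:4 covers only Cost1 through Cost3, which is why Cost4 still carries the dollar sign in the output above. A sketch that converts every column except the first, assuming all of them hold dollar amounts (items2 and cost_cols are names introduced here for illustration):

```r
# A two-row copy of the example data, built directly to keep the sketch short.
items2 <- data.frame(Name  = c("A", "B"),
                     Cost1 = c("$10.00", "$45.23"),
                     Cost2 = c("$15.50", "$34.23"),
                     Cost3 = c("$13.20", "$34.24"),
                     Cost4 = c("$45.45", "$23.34"),
                     stringsAsFactors = FALSE)

# Derive the column range from the data instead of hard-coding 2:4.
cost_cols <- 2:ncol(items2)

# apply() converts the selected columns to a character matrix and
# calls the function once per column (MARGIN = 2).
items2[, cost_cols] <- apply(items2[, cost_cols], 2,
                             function(x) as.numeric(gsub("\\$", "", x)))
sapply(items2[cost_cols], is.numeric)
```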
A more efficient approach would be:
items[-1] <- lapply(items[-1], function(x) as.numeric(gsub("$", "", x, fixed = TRUE)))
items
# Name Cost1 Cost2 Cost3 Cost4
# 1 A 10.00 15.50 13.20 45.45
# 2 B 45.23 34.23 34.24 23.34
# 3 C 23.43 45.23 65.23 34.23
# 4 D 76.34 98.34 90.34 45.09
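The same lapply idea can be wrapped in a small helper and applied to columns selected by name rather than by position. The names strip_dollar and cost_cols, and the "^Cost" naming pattern, are hypothetical and introduced only for this sketch; trimws is used on the assumption that values may carry stray spaces, as they do when a file is read with sep=",":

```r
# Hypothetical helper: drop the literal "$" and surrounding whitespace,
# then convert to numeric.
strip_dollar <- function(x) {
  as.numeric(gsub("$", "", trimws(x), fixed = TRUE))
}

# Illustrative data only.
df <- data.frame(Name  = c("A", "B"),
                 Cost1 = c(" $10.00", " $45.23"),
                 Cost2 = c(" $15.50", " $34.23"),
                 stringsAsFactors = FALSE)

# Select the columns to convert by name (assumes they all start with "Cost").
cost_cols <- grep("^Cost", names(df), value = TRUE)

# lapply() handles each column as an ordinary vector; assigning the
# resulting list back with df[cost_cols] keeps df a data.frame.
df[cost_cols] <- lapply(df[cost_cols], strip_dollar)
str(df)
```

Because lapply works on each column as it is, nothing has to be flattened or converted to a matrix first, which is one plausible reason for the speed difference.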
# lapply over the columns (each function works on a local copy of the global A)
fun1 <- function() {
  A[-1] <- lapply(A[-1], function(x) as.numeric(gsub("$", "", x, fixed=TRUE)))
  A
}
# unlist the columns, convert, and assign back
fun2 <- function() {
  A[, 2:ncol(A)] <- as.numeric(gsub("\\$", "", unlist(A[, 2:ncol(A)])))
  A
}
# apply over the columns
fun3 <- function() {
  A[, 2:ncol(A)] <- apply(A[,2:ncol(A)], 2, function(x) { as.numeric(gsub("\\$","",x)) })
  A
}
Here is some sample data and the processing times:
set.seed(1)
A <- data.frame(Name = sample(LETTERS, 10000, TRUE),
                matrix(paste0("$", sample(99, 10000*100, TRUE)),
                       ncol = 100))
system.time(fun1())
# user system elapsed
# 0.72 0.00 0.72
system.time(fun2())
# user system elapsed
# 5.84 0.00 5.85
system.time(fun3())
# user system elapsed
# 4.14 0.00 4.14
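The timings only matter if the three approaches return the same result. A small self-contained check on made-up data follows; the functions mirror fun1 to fun3 but take the data frame as an argument, a change made here purely for illustration, and the names via_lapply, via_unlist, via_apply and small are hypothetical:

```r
# Same three techniques as above, parameterised on the data frame d.
via_lapply <- function(d) {
  d[-1] <- lapply(d[-1], function(x) as.numeric(gsub("$", "", x, fixed = TRUE)))
  d
}
via_unlist <- function(d) {
  d[, 2:ncol(d)] <- as.numeric(gsub("\\$", "", unlist(d[, 2:ncol(d)])))
  d
}
via_apply <- function(d) {
  d[, 2:ncol(d)] <- apply(d[, 2:ncol(d)], 2,
                          function(x) as.numeric(gsub("\\$", "", x)))
  d
}

# A much smaller version of the benchmark data.
set.seed(1)
small <- data.frame(Name = sample(LETTERS, 20, TRUE),
                    matrix(paste0("$", sample(99, 20 * 3, TRUE)), ncol = 3),
                    stringsAsFactors = FALSE)

r1 <- via_lapply(small)
r2 <- via_unlist(small)
r3 <- via_apply(small)

# all.equal() compares values and structure with a numeric tolerance.
all.equal(r1, r2)
all.equal(r1, r3)
```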