How to apply a function to each element of a data.frame?
I want to convert a numeric value into a factor: if the value is below -2 the level should be "down", if it is above 2 it should be "up", and anything in between should be "no_change".
So far I have thought about creating a function:
classifier <- function(x) {
  if (x >= 2) {
    return(as.factor("up"))
  } else if (x <= -2) {
    return(as.factor("down"))
  } else {
    return(as.factor("no_change"))
  }
}
I could make it iterate (with a for loop) over the input and return a list, so I could use it with apply.
I want to apply this function to all cells of the data.frame. How can I do it?
Mock data (runif(15, min=-5, max=5)):
c(1.11004611710086, -1.86842617811635, 1.72159335808828, -2.68788822228089,
2.72551498375833, 3.67290901951492, -4.00984475389123, -2.39582793787122,
4.22395745059475, -0.360892189200968, 1.35027756914496, 2.89919016882777,
-0.158692332915962, -0.950306688901037, 3.39141107397154)
Using DF <- iris[-5] as sample data, you can use cut, as I suggested in the comments.
Try:
DF[] <- lapply(DF, cut, c(-Inf, -2, 2, Inf), c("down", "no_change", "up"))
head(DF)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1           up          up    no_change   no_change
## 2           up          up    no_change   no_change
## 3           up          up    no_change   no_change
## 4           up          up    no_change   no_change
## 5           up          up    no_change   no_change
## 6           up          up    no_change   no_change
tail(DF)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width
## 145           up          up           up          up
## 146           up          up           up          up
## 147           up          up           up   no_change
## 148           up          up           up   no_change
## 149           up          up           up          up
## 150           up          up           up   no_change
Or, with RHertel's "mock_data":
cut(mock_data, c(-Inf, -2, 2, Inf), c("down", "no_change", "up"))
## [1] no_change no_change no_change down up up down
## [8] down up no_change no_change up no_change no_change
## [15] up
## Levels: down no_change up
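One detail worth knowing about the cut call above: by default its intervals are closed on the right (right = TRUE), so a value of exactly -2 falls into "down" and a value of exactly 2 falls into "no_change". The classifier in the question uses >= 2 and <= -2, so it differs from this at exactly 2. A small sketch of the boundary behaviour follows; the vector edge_vals is made up for illustration.

```r
# Hypothetical values sitting exactly on and around the break points
edge_vals <- c(-2.5, -2, 0, 2, 2.5)

# Default right = TRUE: intervals are (-Inf,-2], (-2,2], (2,Inf]
res_right <- cut(edge_vals, c(-Inf, -2, 2, Inf), c("down", "no_change", "up"))
as.character(res_right)

# right = FALSE: intervals are [-Inf,-2), [-2,2), [2,Inf)
res_left <- cut(edge_vals, c(-Inf, -2, 2, Inf), c("down", "no_change", "up"),
                right = FALSE)
as.character(res_left)
```

If the boundary values matter for your data, pick the variant that matches the rule you want.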
Benchmarks
As I suggested in the comments, RHertel's approach is likely to be the most efficient. It relies on plain logical subsetting (which is fast) and factor (which is also generally fast).
On data of the size you describe, you will definitely notice the difference:
set.seed(1)
nrow = 20000
ncol = 1000
x <- as.data.frame(matrix(runif(nrow * ncol, min=-5, max=5), ncol = ncol))
factorize <- function(invec) {
  factorized <- rep("no_change", length(invec))
  factorized[invec > 2] <- "up"
  factorized[invec < -2] <- "down"
  factor(factorized, c("down", "no_change", "up"))
}
RHfun <- function(indf = x) {
  indf[] <- lapply(indf, factorize)
  indf
}
AMfun <- function(DF = x) {
  DF[] <- lapply(DF, cut, c(-Inf, -2, 2, Inf), c("down", "no_change", "up"))
  DF
}
library(microbenchmark)
microbenchmark(AMfun(), RHfun(), times = 10)
# Unit: seconds
# expr min lq mean median uq max neval
# AMfun() 7.501814 8.015532 8.852863 8.731638 9.660191 10.198983 10
# RHfun() 1.437696 1.485791 1.723402 1.574507 1.637139 2.528574 10
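Before reading too much into the timings, it may be worth checking that the two approaches give the same result. Here is a minimal sketch on a much smaller data set. The names small, by_cut and by_subset are illustrative only, and factorize is repeated so that the snippet runs on its own.

```r
set.seed(1)
small <- as.data.frame(matrix(runif(200 * 5, min = -5, max = 5), ncol = 5))

# Same helper as in the benchmark, repeated so this snippet is self-contained
factorize <- function(invec) {
  factorized <- rep("no_change", length(invec))
  factorized[invec > 2] <- "up"
  factorized[invec < -2] <- "down"
  factor(factorized, c("down", "no_change", "up"))
}

# Approach 1: cut() applied column by column
by_cut <- small
by_cut[] <- lapply(small, cut, c(-Inf, -2, 2, Inf), c("down", "no_change", "up"))

# Approach 2: logical subsetting plus factor()
by_subset <- small
by_subset[] <- lapply(small, factorize)

# The two rules only differ for values of exactly -2 or 2
identical(by_cut, by_subset)
```

With continuous random data the boundary values essentially never occur, so the comparison is expected to come out TRUE; on real data with exact -2 or 2 entries the two functions can disagree.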
I'm generally not fond of ifelse(), so I'd probably introduce a new vector and treat the problem differently.
factorized <- rep("no_change", length(mock_data))
factorized[mock_data > 2] <- "up"
factorized[mock_data < -2] <- "down"
factorized <- as.factor(factorized)
#> factorized
#[1] no_change no_change no_change down up up down down up no_change no_change up no_change no_change up
#Levels: down no_change up
The data in this example is taken from the OP:
mock_data <- c(1.11004611710086, -1.86842617811635, 1.72159335808828, -2.68788822228089,
2.72551498375833, 3.67290901951492, -4.00984475389123, -2.39582793787122,
4.22395745059475, -0.360892189200968, 1.35027756914496, 2.89919016882777,
-0.158692332915962, -0.950306688901037, 3.39141107397154)
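One thing to keep in mind with as.factor in this approach: it derives the levels from the labels that actually occur, so a category that never shows up in the data is missing from the result. A sketch of the difference, using a made-up vector only_positive that contains no value below -2:

```r
# Hypothetical input with no value below -2, so "down" never occurs
only_positive <- c(0.5, 2.5, 3.1, 1.9)

labelled <- rep("no_change", length(only_positive))
labelled[only_positive > 2] <- "up"
labelled[only_positive < -2] <- "down"

# as.factor() keeps only the labels that are present
levels(as.factor(labelled))

# factor() with explicit levels keeps all three categories
levels(factor(labelled, levels = c("down", "no_change", "up")))
```

Passing the levels explicitly is probably the safer choice when several columns need to share the same set of categories.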
Thanks to @docendo discimus for improving this answer with a helpful comment.
Use apply with both margins, c(1, 2), so that the function is called for every row and column combination.
apply(yourDF, c(1, 2), classifier)
This applies a function to every cell of a data.frame. It will not work on a plain vector, because apply needs an object with dimensions (a matrix or data.frame).
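As a sketch of what this call returns, assuming the classifier function from the question and a small made-up data frame toy_df: apply coerces its result to a basic vector type, so the factors come back as a character matrix rather than as a data.frame of factors. Converting back is an extra step.

```r
# classifier as defined in the question, repeated so the snippet runs on its own
classifier <- function(x) {
  if (x >= 2) {
    return(as.factor("up"))
  } else if (x <= -2) {
    return(as.factor("down"))
  } else {
    return(as.factor("no_change"))
  }
}

# Hypothetical toy data
toy_df <- data.frame(a = c(-3, 0, 3), b = c(2.5, -2.5, 1))

res <- apply(toy_df, c(1, 2), classifier)
is.matrix(res)  # a matrix, not a data.frame
typeof(res)     # the factor results were coerced to character

# Rebuild a data.frame of factors with a common set of levels
res_df <- as.data.frame(res, stringsAsFactors = FALSE)
res_df[] <- lapply(res_df, factor, levels = c("down", "no_change", "up"))
str(res_df)
```

Because the function is called once per cell, this is likely to be much slower than the column-wise approaches on large data.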