Recode values in R
I want to recode the values in a column: if x is > 1 but < 2, it should be recoded as 1.
Here's my code:
neu$b <- lapply(neu$swl.y, function(x) ifelse(x>1 & x<=2, 1, x))
Is there something wrong?
swl.y
2.2
1.2
3.4
5.6
I actually need to recode all the values:
neu$c <- with(neu, ifelse(swl.y>1 & swl.y <=2, 1, swl.y))
neu$c <- with(neu, ifelse(swl.y>2 & swl.y <=3, 2, swl.y))
neu$c <- with(neu, ifelse(swl.y>3 & swl.y <=4, 3, swl.y))
neu$c <- with(neu, ifelse(swl.y>4 & swl.y <=5, 4, swl.y))
neu$c <- with(neu, ifelse(swl.y>5 & swl.y <=6, 5, swl.y))
neu$c <- with(neu, ifelse(swl.y>6 & swl.y <=7, 6, swl.y))
I think I know where the problem is. When R runs the second line of code, the values recoded by the first line revert to their previous values.
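The diagnosis above can be checked with a minimal sketch. It assumes the four example values shown in the question; the columns `c` and `d` and the loop-based fix are only illustrations, not necessarily the approach an answer would recommend.

```r
# Example values taken from the question
neu <- data.frame(swl.y = c(2.2, 1.2, 3.4, 5.6))

# Each statement recomputes from swl.y, so it overwrites the previous result
neu$c <- with(neu, ifelse(swl.y > 1 & swl.y <= 2, 1, swl.y))
neu$c <- with(neu, ifelse(swl.y > 2 & swl.y <= 3, 2, swl.y))
neu$c  # 1.2 is no longer recoded to 1

# Sketch of a fix: copy the column once, then update only the matching rows,
# always testing the conditions against the original swl.y
neu$d <- neu$swl.y
for (k in 1:6) {
  in_bin <- neu$swl.y > k & neu$swl.y <= k + 1
  neu$d[in_bin] <- k
}
neu$d
```

Because every condition is tested against the untouched `swl.y`, the order of the updates does not matter in the second variant.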
We don't need to loop over a single column. With lapply(neu$swl.y, ...), each element of the column is returned as a separate list element, which is not what we need here. The function ifelse is vectorized and can be applied directly to the column 'swl.y' with the logical condition from the OP's post.
neu$b <- with(neu, ifelse(swl.y>1 & swl.y <=2, 1, swl.y))
Alternatively, we create the column 'b' as a copy of 'swl.y' and then change the values of 'b' based on the logical condition.
neu$b <- neu$swl.y
neu$b[with(neu, swl.y>1 & swl.y <=2)] <- 1
To better understand the problem with the OP's code, we can check the output from lapply:
lapply(neu$swl.y, function(x) x) #similar to `as.list(neu$swl.y)`
#[[1]]
#[1] 3
#[[2]]
#[1] 0
#[[3]]
#[1] 0
#[[4]]
#[1] 2
#[[5]]
#[1] 1
The output is a list with each element of the column as a list element. Calling ifelse once per element inside lapply is not optimal, because ifelse is already vectorized (as mentioned above). But suppose we do use ifelse inside lapply:
lapply(neu$swl.y, function(x) ifelse(x>1 & x<=2, 1, x))
#[[1]]
#[1] 3
#[[2]]
#[1] 0
#[[3]]
#[1] 0
#[[4]]
#[1] 1
#[[5]]
#[1] 1
A data.frame can be considered a list whose elements all have the same length. So, based on the above output, this would correspond to a data.frame with 5 columns and 1 row. By assigning it to a single column 'b', we are instead creating a list column with 5 list elements.
neu$b <- lapply(neu$swl.y, function(x) ifelse(x>1 & x<=2, 1, x))
str(neu)
#'data.frame': 5 obs. of 2 variables:
#$ swl.y: int 3 0 0 2 1
#$ b :List of 5
# ..$ : int 3
# ..$ : int 0
# ..$ : int 0
# ..$ : num 1
# ..$ : int 1
But this is not what we wanted. What is the remedy? One way is to use sapply/vapply instead of lapply; these return a vector here because every result has the same length. Alternatively, we can unlist the lapply output to create a vector.
neu$b <- sapply(neu$swl.y, function(x) ifelse(x>1 & x<=2, 1, x))
str(neu)
#'data.frame': 5 obs. of 2 variables:
# $ swl.y: int 3 0 0 2 1
# $ b : num 3 0 0 1 1
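The vapply and unlist alternatives mentioned above are not shown in the answer; a minimal sketch of both follows. The column values are assumed to match the 'neu' example used here, and `recode_one`, `b_vapply`, and `b_unlist` are hypothetical names introduced only for illustration.

```r
# Example column, assumed to match the 'neu' data used in this answer
neu <- data.frame(swl.y = c(3L, 0L, 0L, 2L, 1L))

# Hypothetical helper wrapping the OP's condition
recode_one <- function(x) ifelse(x > 1 & x <= 2, 1, x)

# vapply: like sapply, but the type and length of each result are declared,
# so an unexpected result raises an error instead of silently changing shape
b_vapply <- vapply(neu$swl.y, recode_one, FUN.VALUE = numeric(1))

# unlist: flatten the list returned by lapply into a plain vector
b_unlist <- unlist(lapply(neu$swl.y, recode_one))

neu$b <- b_vapply
str(neu)
```

Both variants give an ordinary numeric column rather than a list column, although the vectorized ifelse shown earlier remains the simpler choice for a single column.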
Based on the OP's edited post, if we need multiple recodes, use either cut or findInterval. In cut, we can specify the breaks, and the labels argument controls whether the default interval labels are returned; with labels=FALSE, integer codes are returned instead.
# 'neu1' is defined in the data at the end of this answer
with(neu1, cut(swl.y, breaks=c(-Inf,1,2,3,4,5,6,Inf), labels=FALSE)-1)
#[1] 2 1 3 5
set.seed(48)
neu <- data.frame(swl.y=sample(0:5, 5, replace=TRUE))
#newdata
neu1 <- structure(list(swl.y = c(2.2, 1.2, 3.4, 5.6)),
.Names = "swl.y", class = "data.frame", row.names = c(NA, -4L))
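The answer names findInterval as an alternative to cut without showing it; a minimal sketch could look like this. It assumes the 'neu1' data above and an R version in which findInterval supports the left.open argument (R 3.3.0 or later); `b_fi` and `b_cut` are hypothetical names.

```r
# Same example data as 'neu1' above
neu1 <- data.frame(swl.y = c(2.2, 1.2, 3.4, 5.6))

# left.open = TRUE makes the intervals (1, 2], (2, 3], ..., which matches
# the right-closed intervals that cut uses by default
b_fi <- findInterval(neu1$swl.y, 1:6, left.open = TRUE)

# The cut-based recoding from the answer, for comparison
b_cut <- cut(neu1$swl.y, breaks = c(-Inf, 1, 2, 3, 4, 5, 6, Inf),
             labels = FALSE) - 1

b_fi
b_cut
```

With findInterval the result is already the index of the lower break, so no subtraction is needed; values at or below 1 map to 0 in both versions.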