Recode values in R

I want to recode the values in a column: if x is > 1 but <= 2, it should be recoded as 1.

Here's my code:

neu$b <- lapply(neu$swl.y, function(x) ifelse(x>1 & x<=2, 1, x))

Is there something wrong with it?

 swl.y

  2.2
  1.2
  3.4
  5.6

Actually, I need to recode all the values:

  neu$c <- with(neu, ifelse(swl.y>1 & swl.y <=2, 1, swl.y))
  neu$c <- with(neu, ifelse(swl.y>2 & swl.y <=3, 2, swl.y))
  neu$c <- with(neu, ifelse(swl.y>3 & swl.y <=4, 3, swl.y))
  neu$c <- with(neu, ifelse(swl.y>4 & swl.y <=5, 4, swl.y))
  neu$c <- with(neu, ifelse(swl.y>5 & swl.y <=6, 5, swl.y))
  neu$c <- with(neu, ifelse(swl.y>6 & swl.y <=7, 6, swl.y))

I think I know where the problem is. Each line falls back to the original swl.y in its "no" branch, so when R runs the second line of code, the values recoded by the first line are set back to their previous values.
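One way to keep the earlier recodes, sketched here assuming the four-value sample data shown above is stored in neu: start 'c' as a copy of 'swl.y', keep testing the original 'swl.y', and fall back to the current 'c' (not 'swl.y') so that values recoded in an earlier step survive. The loop over k simply replaces the six copy-pasted lines.

```r
# Sample data shown in the question
neu <- data.frame(swl.y = c(2.2, 1.2, 3.4, 5.6))

# Start from a copy so that each step can build on the previous one
neu$c <- neu$swl.y

# Recode (k, k+1] to k for k = 1..6; the condition always tests the
# original swl.y, and the "no" branch keeps the current value of c
for (k in 1:6) {
  neu$c <- ifelse(neu$swl.y > k & neu$swl.y <= k + 1, k, neu$c)
}

neu$c
#[1] 2 1 3 5
```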

We don't need to loop over a single column. With lapply(neu$swl.y, ...), each element of the column becomes a separate list element, which we don't need. The function ifelse is vectorized, so it can be used directly on the column 'swl.y' with the logical condition from the OP's post:

 neu$b <- with(neu, ifelse(swl.y>1 & swl.y <=2, 1, swl.y))

Alternatively, we can create column 'b' as a copy of 'swl.y' and then change the values of 'b' that meet the logical condition:

 neu$b <- neu$swl.y
 neu$b[with(neu, swl.y>1 & swl.y <=2)] <- 1
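As a quick check, assuming the four-value neu1 sample data from the data section at the end of the post (recreated here with data.frame()), both vectorized versions give the same column, with only 1.2 falling in (1, 2]:

```r
neu1 <- data.frame(swl.y = c(2.2, 1.2, 3.4, 5.6))

# Version 1: vectorized ifelse() over the whole column
b1 <- with(neu1, ifelse(swl.y > 1 & swl.y <= 2, 1, swl.y))

# Version 2: copy the column, then overwrite only the matching positions
b2 <- neu1$swl.y
b2[with(neu1, swl.y > 1 & swl.y <= 2)] <- 1

identical(b1, b2)
#[1] TRUE
b1
#[1] 2.2 1.0 3.4 5.6
```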

To better understand the problem with the OP's code, we can check the output of lapply:

 lapply(neu$swl.y, function(x) x) #similar to `as.list(neu$swl.y)`
 #[[1]]
 #[1] 3

 #[[2]]
 #[1] 0

 #[[3]]
 #[1] 0

 #[[4]]
 #[1] 2

 #[[5]]
 #[1] 1

The output is a list in which each element of the column is a separate list element. Since ifelse is already vectorized (as mentioned above), there is no need to apply it to a list one element at a time. But suppose we do use ifelse inside lapply:

lapply(neu$swl.y, function(x) ifelse(x>1 & x<=2, 1, x))
#[[1]]
#[1] 3

#[[2]]
#[1] 0

#[[3]]
#[1] 0

#[[4]]
#[1] 1

#[[5]]
#[1] 1

A data.frame can be thought of as a list whose elements all have the same length. Viewed that way, the list above corresponds to a data.frame with 5 columns and 1 row. By assigning it to a single column 'b', we instead create a list column with 5 list elements:

 neu$b <- lapply(neu$swl.y, function(x) ifelse(x>1 & x<=2, 1, x))
 str(neu)
 #'data.frame': 5 obs. of  2 variables:
 #$ swl.y: int  3 0 0 2 1
 #$ b    :List of 5
 # ..$ : int 3
 # ..$ : int 0
 # ..$ : int 0
 # ..$ : num 1
 # ..$ : int 1

But this is not what we wanted. What is the remedy? One way is to use sapply/vapply instead of lapply: because every call returns a result of length 1, they simplify the output to a vector. Alternatively, we can unlist the lapply output to create a vector.

 neu$b <- sapply(neu$swl.y, function(x) ifelse(x>1 & x<=2, 1, x))
 str(neu) 
 #'data.frame': 5 obs. of  2 variables:
 # $ swl.y: int  3 0 0 2 1
 # $ b    : num  3 0 0 1 1
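The unlist and vapply remedies can be sketched the same way. This assumes the neu1 sample data (all doubles), and recode1 is a hypothetical helper name used only for illustration; vapply additionally checks that each call returns exactly one number via FUN.VALUE = numeric(1).

```r
neu1 <- data.frame(swl.y = c(2.2, 1.2, 3.4, 5.6))

# Hypothetical helper: the OP's per-element recode
recode1 <- function(x) ifelse(x > 1 & x <= 2, 1, x)

# unlist() flattens the list returned by lapply() into a plain vector
b_unlist <- unlist(lapply(neu1$swl.y, recode1))

# vapply() is like sapply() but with a declared result type and length
b_vapply <- vapply(neu1$swl.y, recode1, FUN.VALUE = numeric(1))

neu1$b <- b_vapply
is.list(neu1$b)
#[1] FALSE
```

Compared with sapply, vapply fails loudly if some call ever returns something other than a single number, instead of silently returning a list.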

Update

Based on the OP's edited post, if we need multiple recodes, use either cut or findInterval. In cut, we specify the breaks; the labels argument controls the labels of the returned factor, and labels = FALSE returns simple integer codes instead.

 with(neu1, cut(swl.y, breaks=c(-Inf,1,2,3,4,5,6,Inf), labels=FALSE)-1)
 #[1] 2 1 3 5
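A minimal findInterval sketch, assuming the neu1 sample data and that the goal is to map each interval (k, k+1] to k for k = 1..6. With left.open = TRUE, findInterval returns i such that brks[i] < x <= brks[i + 1]. Unlike the cut(...) - 1 line, which turns values <= 1 into 0, the ifelse wrapper here leaves values outside (1, 7] unchanged, as the OP's ifelse chain intends.

```r
neu1 <- data.frame(swl.y = c(2.2, 1.2, 3.4, 5.6))

# Break points 1, 2, ..., 7; with left.open = TRUE the intervals are
# (1,2], (2,3], ..., (6,7] and they map to 1, 2, ..., 6
brks <- 1:7
codes <- findInterval(neu1$swl.y, brks, left.open = TRUE)

# Keep values outside (1, 7] unchanged
neu1$c <- ifelse(neu1$swl.y > 1 & neu1$swl.y <= 7, codes, neu1$swl.y)
neu1$c
#[1] 2 1 3 5
```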

data

set.seed(48)
neu <- data.frame(swl.y=sample(0:5, 5, replace=TRUE))

#newdata 
neu1 <- structure(list(swl.y = c(2.2, 1.2, 3.4, 5.6)), 
.Names = "swl.y", class = "data.frame", row.names = c(NA, -4L))

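Notice: technical posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.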

 