
For Loop Function in R

I have been struggling to figure out why my function is not returning the correct values to my data frame. I want to loop through a column (vector) of my data frame and create a new column by doing a calculation on that vector's elements. Here's what I have:

# x will be the data frame's vector
y <- function(x){
 new <- c()
 for (i in x){
  new <- c(new, x[i] - x[i+1])
 }
 return (new)
}

So here I want to create a new vector that holds the next element subtracted from the current element. Now, when I apply it to my data frame:

df$new <- lapply(df$I, y)

I get all NAs. I know I'm missing something completely obvious...

Also, how would I make the function reset itself whenever df$ID changes, so that I am not subtracting elements that belong to two different df$IDs? For example, my data frame will have:

  ID I Order new
1001 5     1   1
1001 6     2  -2
1001 4     3  -2
1001 2     4  NA
1005 2     1   6
1005 8     2   0
1005 8     3  -2
1005 6     4  NA

Thanks!

Avoid the loop and use diff. Everything here is vectorized, so it's easy.

df$new <- c(diff(df$I), NA)
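This is a minimal sketch, for comparison with the question's loop. It shows why the loop returns all NAs and how a loop version could be fixed.

Two problems combine. First, for (i in x) makes i take the values stored in x, not their positions. Second, lapply() calls y() once per element. Inside y(), x therefore has length 1, and x[i + 1] lies past its end.

Some details of the sketch are assumptions:

* The sample values are the I column for ID 1001 from the question's table.
* next_minus_current is a hypothetical helper name.
* The helper computes next minus current, which matches the example table and diff(). The question's code has the opposite sign, so swap the two terms if that is what you want.

```r
# Sample values: the I column for ID 1001 in the question's table (assumed data)
I <- c(5, 6, 4, 2)

# The question's original function, unchanged
y <- function(x) {
  new <- c()
  for (i in x) {                  # i takes the VALUES of x (5, 6, 4, 2), not positions
    new <- c(new, x[i] - x[i + 1])
  }
  return(new)
}

# lapply() calls y() on one element at a time, so x has length 1 inside y()
# and, for positive values, x[i + 1] lies past its end, which yields NA
all_na <- unlist(lapply(I, y))

# A fixed loop: iterate over positions and pass the whole column in one call.
# next_minus_current is a hypothetical name chosen for this sketch.
next_minus_current <- function(x) {
  n <- length(x)
  out <- rep(NA_real_, n)                # preallocate; the last slot stays NA
  for (i in seq_len(max(n - 1, 0))) {    # positions 1 .. n-1
    out[i] <- x[i + 1] - x[i]            # next element minus current element
  }
  out
}

loop_result <- next_minus_current(I)    # 1 -2 -2 NA
diff_result <- c(diff(I), NA)           # vectorized equivalent: 1 -2 -2 NA
```

Preallocating out avoids growing a vector with c() on every iteration. Even so, the diff() one-liner remains the simpler choice.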

But I don't understand your example result. Why are some 0 values changed to NA and others are not? And shouldn't 8 - 2 be 6, not -6? I think that needs to be clarified.

If the 0 values need to be changed to NA, just run the following after the code above.

df$new[df$new == 0] <- NA

A one-liner for the complete process, which returns the new data frame, could be:

within(df, { new <- c(diff(I), NA); new[new == 0] <- NA })
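Here is the one-liner run on the question's example table, rebuilt below as sample data. It shows what the zero-to-NA step does when the column is not grouped.

The 0 in row 4 is the cross-ID difference 2 - 2. The 0 in row 6 is a genuine within-ID difference, 8 - 8. Both end up as NA, so this step only approximates the per-ID reset the question asks for.

```r
# The question's example data (without the expected "new" column)
df <- data.frame(
  ID    = c(1001, 1001, 1001, 1001, 1005, 1005, 1005, 1005),
  I     = c(5, 6, 4, 2, 2, 8, 8, 6),
  Order = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# Ungrouped next-minus-current differences, then every 0 is blanked to NA
res <- within(df, { new <- c(diff(I), NA); new[new == 0] <- NA })
res$new
# 1 -2 -2 NA 6 NA -2 NA
```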

Update: In response to your comments below, here is my updated answer.

> M <- do.call(rbind, Map(function(x) { x$z <- c(diff(x$I), NA); x }, 
                          split(dat, dat$ID)))
> rownames(M) <- NULL
> M
    ID I Order  z
1 1001 5     1  1
2 1001 6     2 -2
3 1001 4     3 -2
4 1001 2     4 NA
5 1005 2     1  6
6 1005 8     2  0
7 1005 8     3 -2
8 1005 6     4 NA
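Base R's ave() can do the same per-ID computation without split() and rbind(). This is an alternative that the original answer does not use.

ave() applies FUN to each group of I and writes the results back into the original row positions, so no rownames cleanup is needed. Like the split() version, it assumes the rows within each ID are already sorted by Order. The data frame below rebuilds the question's example table.

```r
df <- data.frame(
  ID    = c(1001, 1001, 1001, 1001, 1005, 1005, 1005, 1005),
  I     = c(5, 6, 4, 2, 2, 8, 8, 6),
  Order = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# ave() splits I by ID, applies FUN to each piece, and puts the results
# back in the original row order; FUN must return a vector of the same length
df$z <- ave(df$I, df$ID, FUN = function(v) c(diff(v), NA))
df$z
# 1 -2 -2 NA 6 0 -2 NA
```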

Rather than a loop, you would be better off using a vectorized version of the math. The exact indices will depend on what you want to do with the last value. (Note that this line does not go inside your for loop; it produces the result on its own.)

df$new = c(df$I[-1],NA) - df$I

Here you subtract the original df$I from a shifted version that omits the first value ([-1]) and appends an NA at the end.
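The shift can be shown step by step on a small vector. The values below are assumed sample data, taken from the ID 1001 rows of the question's table.

```r
I <- c(5, 6, 4, 2)           # sample values (ID 1001 rows from the question)
shifted <- c(I[-1], NA)      # drop the first value, pad the end: 6 4 2 NA
new <- shifted - I           # element-wise: 6-5, 4-6, 2-4, NA-2
new
# 1 -2 -2 NA
```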

EDIT per comments: If you don't want to subtract across different df$ID values, you can blank out that subset of cells after the subtraction:

 df$new[df$ID != c(df$ID[-1],NA)] = NA
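Here are both steps of this answer applied to the question's example table, rebuilt below as sample data. The comparison against the shifted ID is TRUE wherever the next row belongs to a different ID. Only row 4, where 2 - 2 crossed from ID 1001 into 1005, gets blanked.

The last comparison is NA because it involves the padding NA. R ignores NA subscripts in an assignment when the replacement is a single value, so that position is left alone.

```r
df <- data.frame(
  ID    = c(1001, 1001, 1001, 1001, 1005, 1005, 1005, 1005),
  I     = c(5, 6, 4, 2, 2, 8, 8, 6),
  Order = c(1, 2, 3, 4, 1, 2, 3, 4)
)

df$new <- c(df$I[-1], NA) - df$I          # ungrouped next-minus-current
boundary <- df$ID != c(df$ID[-1], NA)     # TRUE where the next row has another ID
df$new[boundary] <- NA                    # NA entries in boundary are skipped
df$new
# 1 -2 -2 NA 6 0 -2 NA
```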

The dplyr library makes it very easy to do things separately for each level of a grouping variable, in your case ID. We can use diff as @Richard Scriven recommends, and use dplyr::mutate to add a new column.

> library(dplyr)
> df %>% group_by(ID) %>% mutate(new2 = c(diff(I), NA))
Source: local data frame [8 x 5]
Groups: ID

    ID I Order new new2
1 1001 5     1   1    1
2 1001 6     2  -2   -2
3 1001 4     3  -2   -2
4 1001 2     4  NA   NA
5 1005 2     1   6    6
6 1005 8     2   0    0
7 1005 8     3  -2   -2
8 1005 6     4  NA   NA
