R: Using a for loop on a list of data frames

Question

I have a loop that recodes the values of a column and breaks when a condition is met. I would like to use this loop, or its basic concept, on a list of data frames that all have the same format.

Sample data:

Id <- as.factor(c(rep("01001", 11), rep("01043", 11), rep("01065", 11), rep("01069", 11)))
YearCode <- as.numeric(rep(1:11, 4))
Type <- c(NA,NA,NA,NA,NA,NA,NA,2,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2,NA)
test <- NA
sample_df <- data.frame(Id, YearCode, Type, test)

# A part of sample_df
one_df <- subset(sample_df, sample_df$Id=="01069")

This for loop works fine for one data frame:

# example for loop using example data frame "one_df"
for (i in seq_along(one_df$Id)) {
  if (is.na(one_df$Type[i])) {  # if Type is NA, recode to 0
    one_df$test[i] <- 0
  } else {  # stop when Type is not NA, and leave remaining NAs that come after
    break
  }
}
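
For reference, the same recoding can probably be written without an explicit loop. The following is only a sketch, not part of the original post: it rebuilds the 01069 rows inline so it runs on its own, and the helper name before_first is made up for illustration. It relies on cumsum(!is.na(Type)) staying at 0 until the first non-NA value appears.

```r
# Self-contained copy of the one-Id example (Type is non-NA only in row 10)
one_df <- data.frame(
  Id       = factor(rep("01069", 11)),
  YearCode = as.numeric(1:11),
  Type     = c(rep(NA, 9), 2, NA),
  test     = NA
)

# TRUE for every row that comes before the first non-NA Type
# (before_first is a hypothetical helper name)
before_first <- cumsum(!is.na(one_df$Type)) == 0

# Recode only those leading rows; all other rows keep their NA
one_df$test[before_first] <- 0

one_df$test
```

Like the loop, this leaves test as NA at the first non-NA row and in every row after it.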

However, I have many data frames with this same format in a list. I would like to keep them in the list and apply this loop over the whole list.

# example list : split data frame into list by Id
sample_list <- split(sample_df, sample_df$Id, drop = TRUE)

I've looked at other posts on this, but I get stuck when trying to loop over each data frame in the list or to write a similar function using lapply. How can I modify this loop to work on the list (sample_list), using either a for loop, lapply, or something else?

Any tips would be greatly appreciated. Let me know if I need to clarify anything. Thanks!

Answer 1

I think the following does what you described. I first create a new column called test with if_else(): where complete.cases(Type) is TRUE, test takes the value from Type; otherwise it is 0. The next step replaces some of those 0s with NA, because you do not want 0s in rows that come after the first non-NA value in Type. For instance, for Id == 01069 you do not want 0s after the 10th row. So I use the condition row_number() > which(complete.cases(Type))[1], which reads as "is this row number larger than the row number of the first non-NA value?" Where the condition is TRUE, the 0 is replaced with NA. A part of the result for sample_df is shown below the code. I hope this helps.

library(dplyr)

out <- sample_df %>%
  group_by(Id) %>%
  mutate(test = if_else(complete.cases(Type), Type, 0),
         test = if_else(row_number() > which(complete.cases(Type))[1],
                        NA_real_, test))

#       Id YearCode  Type  test
#   <fctr>    <dbl> <dbl> <dbl>
#1   01001        1    NA     0
#2   01001        2    NA     0
#3   01001        3    NA     0
#4   01001        4    NA     0
#5   01001        5    NA     0
#6   01001        6    NA     0
#7   01001        7    NA     0
#8   01001        8     2     2
#9   01001        9    NA    NA
#10  01001       10    NA    NA
#11  01001       11    NA    NA
#------------------------------
#34  01069        1    NA     0
#35  01069        2    NA     0
#36  01069        3    NA     0
#37  01069        4    NA     0
#38  01069        5    NA     0
#39  01069        6    NA     0
#40  01069        7    NA     0
#41  01069        8    NA     0
#42  01069        9    NA     0
#43  01069       10     2     2
#44  01069       11    NA    NA
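
One thing that may be worth checking with the condition above is what happens for an Id whose Type is entirely NA (01043 and 01065 in the sample data). The sketch below is an illustration only, in base R: it uses ifelse() as a stand-in for if_else(), on the assumption that both return NA where the condition is NA. first_hit and cond are hypothetical names.

```r
# A group whose Type is entirely NA, as for Id 01043 and 01065 in the sample data
Type <- c(NA_real_, NA_real_, NA_real_)

# There is no non-NA row, so which() is empty and its first element is NA
first_hit <- which(complete.cases(Type))[1]

# Comparing row numbers with NA yields NA for every row
cond <- seq_along(Type) > first_hit

# With an NA condition, the result is NA rather than 0
ifelse(cond, NA_real_, 0)
```

So for such groups test would come out as NA in every row instead of 0.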

EDIT

According to the OP's comment, test should be 0 throughout when Type contains only NAs. The following handles that case.

out <- sample_df %>%
  group_by(Id) %>%
  mutate(test = if_else(complete.cases(Type), Type, 0),
         test = if_else(row_number() > which(complete.cases(Type))[1],
                        NA_real_, test),
         # foo is the per-Id sum of Type; it is 0 when Type is all NA for that Id
         foo = sum(Type, na.rm = TRUE),
         test = replace(test, which(foo == 0), 0)) %>%
  select(-foo)

# A part of the result
#       Id YearCode  Type  test
#   <fctr>    <dbl> <dbl> <dbl>
#1   01001        1    NA     0
#2   01001        2    NA     0
#3   01001        3    NA     0
#4   01001        4    NA     0
#5   01001        5    NA     0
#6   01001        6    NA     0
#7   01001        7    NA     0
#8   01001        8     2     2
#9   01001        9    NA    NA
#10  01001       10    NA    NA
#11  01001       11    NA    NA
#12  01043        1    NA     0
#13  01043        2    NA     0
#14  01043        3    NA     0
#15  01043        4    NA     0
#16  01043        5    NA     0
#17  01043        6    NA     0
#18  01043        7    NA     0
#19  01043        8    NA     0
#20  01043        9    NA     0
#21  01043       10    NA     0
#22  01043       11    NA     0
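
The code in this answer runs on the combined sample_df rather than on the list. If the data only exist as a list, one possible route is to stack the list into one data frame, do the grouped recoding there, and split the result back into a list. This is a base R sketch on a small made-up list (df, df_list, combined and back are hypothetical names), not something from the original answer.

```r
# Hypothetical small list with the same columns as the question's data
df <- data.frame(
  Id       = factor(rep(c("A", "B"), each = 3)),
  YearCode = as.numeric(rep(1:3, 2)),
  Type     = c(NA, 2, NA, NA, NA, NA),
  test     = NA
)
df_list <- split(df, df$Id, drop = TRUE)

# Stack the list into one data frame so a grouped operation can run on it
combined <- do.call(rbind, df_list)
rownames(combined) <- NULL

# ... the grouped recoding on `combined` would go here ...

# Split the result back into a list keyed by Id
back <- split(combined, combined$Id, drop = TRUE)
names(back)
```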

Answer 2

Is there an issue with creating a function and using lapply? It seems to work.

#rm(list=ls())
Id <- as.factor(c(rep("01001", 11), rep("01043", 11), rep("01065", 11), rep("01069", 11)))
YearCode <- as.numeric(rep(1:11, 4))
Type <- c(NA,NA,NA,NA,NA,NA,NA,2,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2,NA)
test <- NA
sample_df <- data.frame(Id, YearCode, Type, test)

# A part of sample_df
one_df <- subset(sample_df, sample_df$Id=="01069")

sample_list <- split(sample_df, sample_df$Id, drop = TRUE)

####################################

# for loop as function
fnX <- function(myDF) {
  for (i in seq_along(myDF$Id)) {
    if (is.na(myDF$Type[i])) {  # if Type is NA, recode to 0
      myDF$test[i] <- 0
    } else {  # stop and leave remaining NAs that come after
      break
    }
  }
  myDF
}

# apply function to one data frame
fnX(sample_list$`01069`)

# apply function to every data frame in the list
lapply(sample_list, fnX)
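
Note that lapply returns a new list and does not change the list it was given, so the result has to be assigned to keep it. The sketch below shows that, plus what should be an equivalent for loop over the list names. It uses a small made-up list so it runs on its own; recode_leading_na, df_list, res_lapply and res_loop are hypothetical names, and the function body follows the same logic as the loop in the question.

```r
# Small hypothetical list in the same format as the question's data
df <- data.frame(
  Id       = factor(rep(c("A", "B"), each = 4)),
  YearCode = as.numeric(rep(1:4, 2)),
  Type     = c(NA, NA, 2, NA, NA, NA, NA, NA),
  test     = NA
)
df_list <- split(df, df$Id, drop = TRUE)

# Same logic as the loop in the question, wrapped in a function
recode_leading_na <- function(d) {
  for (i in seq_along(d$Type)) {
    if (is.na(d$Type[i])) {
      d$test[i] <- 0   # leading NA in Type: recode test to 0
    } else {
      break            # first non-NA Type: stop, leave the rest as NA
    }
  }
  d
}

# Option 1: lapply returns a new list, so assign the result
res_lapply <- lapply(df_list, recode_leading_na)

# Option 2: a for loop that writes each element back into a copy of the list
res_loop <- df_list
for (nm in names(res_loop)) {
  res_loop[[nm]] <- recode_leading_na(res_loop[[nm]])
}

res_lapply$A$test
res_lapply$B$test
```

Both options keep the data frames in a list named by Id; which one to use is mostly a matter of taste.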

Answer 1
2 2016-12-02 03:19:23

Answer 2
0 Accepted 2016-12-02 04:10:59