
Creating a loop in R to clean multiple datasets

Suppose I have the following data frames (essentially 4 similar datasets):

# "NULL" is quoted: a bare NULL inside c() is dropped, which would leave columns of unequal length
q1_hosp <- data.frame(dxe1 = c(1, "NULL", NA, "NULL", 1), dxe2 = c(1, "NULL", "NULL", "NULL", 1))
q2_hosp <- data.frame(dxe1 = c("NULL", 1, 1, NA, 1), dxe2 = c(1, 1, 1, "NULL", 1))
q3_hosp <- data.frame(dxe1 = c(NA, "NULL", 1, 1, 1), dxe2 = c(1, NA, NA, "NULL", 1))
q4_hosp <- data.frame(dxe1 = c(1, 1, 1, "NULL", 1), dxe2 = c(1, "NULL", 1, 1, 1))

What I would like is to turn the NULL entries into missing values (NA) in R. I can do this one column at a time, but the code is tedious:

library(dplyr) # provides na_if()

any(is.na(q1_hosp$dxe1)) # there are missing rows, keep NAs
q1_hosp$dxe1 <- na_if(q1_hosp$dxe1, "NULL") # convert "NULL" to NA
any(is.na(q1_hosp$dxe2)) # there are missing rows, keep NAs
q1_hosp$dxe2 <- na_if(q1_hosp$dxe2, "NULL") # convert "NULL" to NA

any(is.na(q2_hosp$dxe1)) # there are missing rows, keep NAs
q2_hosp$dxe1 <- na_if(q2_hosp$dxe1, "NULL") # convert "NULL" to NA
any(is.na(q2_hosp$dxe2)) # there are missing rows, keep NAs
q2_hosp$dxe2 <- na_if(q2_hosp$dxe2, "NULL") # convert "NULL" to NA

any(is.na(q3_hosp$dxe1)) # there are missing rows, keep NAs
q3_hosp$dxe1 <- na_if(q3_hosp$dxe1, "NULL") # convert "NULL" to NA
any(is.na(q3_hosp$dxe2)) # there are missing rows, keep NAs
q3_hosp$dxe2 <- na_if(q3_hosp$dxe2, "NULL") # convert "NULL" to NA

any(is.na(q4_hosp$dxe1)) # there are missing rows, keep NAs
q4_hosp$dxe1 <- na_if(q4_hosp$dxe1, "NULL") # convert "NULL" to NA
any(is.na(q4_hosp$dxe2)) # there are missing rows, keep NAs
q4_hosp$dxe2 <- na_if(q4_hosp$dxe2, "NULL") # convert "NULL" to NA

What I would like is to create a loop so that the process is less tedious.

This is what I have tried so far:

for(i in 1:4) {
  ## Clean the dxe variables
  any(is.na(q[i]_hosp$dxe1)) # there are missing rows, keep NAs
  q[i]_hosp$dxe1 <- na_if(q[i]_hosp$dxe1, "NULL") # convert "NULL" to NA
  any(is.na(q[i]_hosp$dxe2)) # there are missing rows, keep NAs
  q[i]_hosp$dxe1 <- na_if(q[i]_hosp$dxe2, "NULL") # convert "NULL" to NA
}

But my code doesn't work and I am getting errors. Can anyone help me write a for loop that does what I want?

Collect the data frames in a list and use lapply to change 'NULL' to NA in each of them.

result <- lapply(mget(sprintf('q%d_hosp', 1:4)), function(x) {x[x == 'NULL'] <- NA;x})
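To see what each piece of that one-liner does, here is a step-by-step sketch on a cut-down version of the data. The two three-row data frames are illustrative only (with "NULL" stored as a string), and `nms` and `dfs` are names introduced just for this example.

```r
# Illustrative data modelled on the question: "NULL" is a string,
# genuine missing values are NA.
q1_hosp <- data.frame(dxe1 = c("1", "NULL", NA), dxe2 = c("1", "NULL", "NULL"))
q2_hosp <- data.frame(dxe1 = c("NULL", "1", NA), dxe2 = c("1", "1", "NULL"))

# Step 1: build the object names as strings.
nms <- sprintf("q%d_hosp", 1:2)

# Step 2: fetch the objects with those names into a named list.
dfs <- mget(nms)

# Step 3: replace "NULL" with NA in every data frame; existing NAs stay NA.
result <- lapply(dfs, function(x) {
  x[x == "NULL"] <- NA
  x
})

names(result)                               # list elements keep the object names
sapply(result, function(x) sum(is.na(x)))   # NA count per data frame
```

Because `mget()` returns a named list and `lapply()` preserves names, each cleaned data frame can be matched back to its original object.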

If you want the changes written back to the individual data frames:

list2env(result, .GlobalEnv)
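Since the question asks for a for loop specifically, the same idea can also be sketched with `get()` and `assign()`. The attempt in the question fails because `q[i]_hosp` is not valid R: an index cannot be spliced into a variable name, so the name has to be built as a string first. The two small data frames and the helper names `nm` and `df` below are assumptions made for illustration.

```r
# Illustrative data modelled on the question: "NULL" is a string.
q1_hosp <- data.frame(dxe1 = c("1", "NULL", NA), dxe2 = c("1", "NULL", "NULL"))
q2_hosp <- data.frame(dxe1 = c("NULL", "1", NA), dxe2 = c("1", "1", "NULL"))

for (i in 1:2) {
  nm <- sprintf("q%d_hosp", i)  # object name as a string, e.g. "q1_hosp"
  df <- get(nm)                 # look the data frame up by name
  df[df == "NULL"] <- NA        # all columns at once; existing NAs stay NA
  assign(nm, df)                # store the cleaned copy under the same name
}
```

For the four data frames in the question the loop would run over `1:4`. The list-based version is usually easier to extend, but the loop maps more directly onto the original attempt.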

Try this.

  1. The names of all datasets are stored in dl, and each dataset is fetched by name with the get function.
  2. The RMNULL function replaces NULL with NA.
  3. The lapply function loops over all datasets.

# dataset name list 
dl <- sprintf("%s%s%s","q",1:2,"_hosp")
df.list <- lapply(dl,get)

# replace NULL with NA
## remove NULL function
RMNULL <- function(df){
    is.na(df) <- df == "NULL"
    return(df)
}
lapply(df.list, RMNULL)

Example data:

q1_hosp <- structure(list(x = 1:3, y = list(NULL, NULL, NULL)), .Names = c("x",  "y"), row.names = c(NA, -3L), class = "data.frame")

q2_hosp <- structure(list(x = 4:6, y = list(1, NULL, NULL)), .Names = c("x",  "y"), row.names = c(NA, -3L), class = "data.frame")
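Note that `lapply(df.list, RMNULL)` as written only prints the cleaned data frames, and `df.list` carries no names. A minimal sketch of keeping the result follows, assuming string "NULL" values as in the question rather than the list columns of the example data above. Here `mget()` is swapped in for `lapply(dl, get)` because it returns a named list, and `cleaned` is a name introduced for illustration.

```r
# Illustrative data with "NULL" stored as a string (an assumption).
q1_hosp <- data.frame(dxe1 = c("1", "NULL", NA), dxe2 = c("1", "NULL", "NULL"))
q2_hosp <- data.frame(dxe1 = c("NULL", "1", NA), dxe2 = c("1", "1", "NULL"))

# Dataset names and a named list of the data frames.
dl <- sprintf("q%d_hosp", 1:2)
df.list <- mget(dl)

# Mark every "NULL" cell as missing.
RMNULL <- function(df) {
  is.na(df) <- df == "NULL"
  df
}

# Keep the cleaned data frames instead of only printing them.
cleaned <- lapply(df.list, RMNULL)

# Optionally overwrite the original objects in the global environment.
invisible(list2env(cleaned, .GlobalEnv))
```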
