Creating a loop in R to clean multiple datasets
Suppose I have the following data frames (essentially 4 similar datasets):
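# "NULL" is quoted because the entries are stored as the string "NULL";
# a bare NULL inside c() is silently dropped and the columns end up with unequal lengths
q1_hosp <- data.frame(dxe1 = c(1, "NULL", NA, "NULL", 1), dxe2 = c(1, "NULL", "NULL", "NULL", 1))
q2_hosp <- data.frame(dxe1 = c("NULL", 1, 1, NA, 1), dxe2 = c(1, 1, 1, "NULL", 1))
q3_hosp <- data.frame(dxe1 = c(NA, "NULL", 1, 1, 1), dxe2 = c(1, NA, NA, "NULL", 1))
q4_hosp <- data.frame(dxe1 = c(1, 1, 1, "NULL", 1), dxe2 = c(1, "NULL", 1, 1, 1))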
What I would like is to turn the NULL entries into missing values in R. I can do this by writing out the following code, which is tedious:
library(dplyr) # provides na_if()
any(is.na(q1_hosp$dxe1)) # there are missing rows, keep NAs
q1_hosp$dxe1 <- na_if(q1_hosp$dxe1, "NULL") # convert NULL to NA
any(is.na(q1_hosp$dxe2)) # there are missing rows, keep NAs
q1_hosp$dxe2 <- na_if(q1_hosp$dxe2, "NULL") # convert NULL to NA
any(is.na(q2_hosp$dxe1)) # there are missing rows, keep NAs
q2_hosp$dxe1 <- na_if(q2_hosp$dxe1, "NULL") # convert NULL to NA
any(is.na(q2_hosp$dxe2)) # there are missing rows, keep NAs
q2_hosp$dxe2 <- na_if(q2_hosp$dxe2, "NULL") # convert NULL to NA
any(is.na(q3_hosp$dxe1)) # there are missing rows, keep NAs
q3_hosp$dxe1 <- na_if(q3_hosp$dxe1, "NULL") # convert NULL to NA
any(is.na(q3_hosp$dxe2)) # there are missing rows, keep NAs
q3_hosp$dxe2 <- na_if(q3_hosp$dxe2, "NULL") # convert NULL to NA
any(is.na(q4_hosp$dxe1)) # there are missing rows, keep NAs
q4_hosp$dxe1 <- na_if(q4_hosp$dxe1, "NULL") # convert NULL to NA
any(is.na(q4_hosp$dxe2)) # there are missing rows, keep NAs
q4_hosp$dxe2 <- na_if(q4_hosp$dxe2, "NULL") # convert NULL to NA
What I would like is to create a loop so that the process is less tedious.
I have been trying with this:
for(i in 1:4) {
##Clean the dxe variables
any(is.na(q[i]_hosp$dxe1)) # there are missing rows, keep NAs
q[i]_hosp$dxe1 <- na_if(q[i]_hosp$dxe1, "NULL") #convert NULLto NA
any(is.na(q[i]_hosp$dxe2)) # there are missing rows, keep NAs
q[i]_hosp$dxe1 <- na_if(q[i]_hosp$dxe2, "NULL") #convert NULLto NA
}
But my code doesn't work and I am getting errors. Can anyone help write the for loop for what I want to achieve?
Get the data in a list and use lapply to change 'NULL' to NA in each of the data frames.
result <- lapply(mget(sprintf('q%d_hosp', 1:4)), function(x) {x[x == 'NULL'] <- NA;x})
If you want the changes written back to the individual data frames:
list2env(result, .GlobalEnv)
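Since the question asked for a for loop specifically, the same idea can be sketched with get and assign, which look up and store objects by name (R cannot parse q[i]_hosp as a variable name). This is only a minimal sketch: the two small data frames are hypothetical sample data, and it assumes "NULL" is stored as a character string.

```r
# Hypothetical sample data: "NULL" is a string, NA is a true missing value
q1_hosp <- data.frame(dxe1 = c("1", "NULL", NA), dxe2 = c("1", "NULL", "1"))
q2_hosp <- data.frame(dxe1 = c("NULL", "1", "1"), dxe2 = c("1", "1", "NULL"))

for (i in 1:2) {
  nm <- sprintf("q%d_hosp", i)  # build the object name, e.g. "q1_hosp"
  df <- get(nm)                 # fetch the data frame by its name
  df[df == "NULL"] <- NA        # replace the string "NULL" with NA; existing NAs stay NA
  assign(nm, df)                # store the cleaned copy under the same name
}
```

The lapply version above is usually preferred because it keeps the data frames together in one list, but the loop may be easier to follow when coming from other languages.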
Try this.
The names of all datasets are stored in dl, and each dataset can be accessed with the get function.
The RMNULL function is used to replace NULL with NA.
The lapply function is used to loop over all datasets.
# dataset name list
dl <- sprintf("%s%s%s","q",1:2,"_hosp")
df.list <- lapply(dl,get)
# replace NULL with NA
## remove NULL function
RMNULL <- function(df){
is.na(df) <- df == "NULL"
return(df)
}
lapply(df.list, RMNULL)
Example data:
q1_hosp <- structure(list(x = 1:3, y = list(NULL, NULL, NULL)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame")
q2_hosp <- structure(list(x = 4:6, y = list(1, NULL, NULL)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame")
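One detail worth noting: lapply returns new data frames rather than modifying the originals, so the result has to be assigned to keep it. A minimal sketch of that, assuming "NULL" is stored as a character string; the sample data and the names cleaned and na_counts are illustrative and not part of the answer above.

```r
# Hypothetical sample data with "NULL" stored as a string
q1_hosp <- data.frame(dxe1 = c("1", "NULL", NA), dxe2 = c("1", "NULL", "1"))
q2_hosp <- data.frame(dxe1 = c("NULL", "1", "1"), dxe2 = c("1", "1", "NULL"))

dl <- sprintf("q%d_hosp", 1:2)  # dataset names
df.list <- mget(dl)             # named list of data frames, looked up by name

RMNULL <- function(df) {
  is.na(df) <- df == "NULL"     # mark every "NULL" cell as missing
  df
}

cleaned <- lapply(df.list, RMNULL)  # assign the result; df.list itself is unchanged

# Count missing values per dataset as a quick check
na_counts <- sapply(cleaned, function(d) sum(is.na(d)))
```

Using mget instead of lapply(dl, get) keeps the dataset names on the list, so the cleaned data frames can be accessed as cleaned$q1_hosp or written back with list2env(cleaned, .GlobalEnv).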