
How to remove NA columns from multiple data frames with different structures using a for loop in R?

I am having some trouble cleaning data that I imported from Excel with readxl. I am able to read each Excel file and store the individual sheets contained in that file in separate data frames.

The problem is that each of these data frames has many columns filled entirely with NAs. This happens because my code first stores the sheets in a list, from which I then create the separate data frames, so every data frame ends up with the union of all columns contained in that Excel file.

I wish to automate removing all NA columns from all of these data frames, which have different numbers of rows and columns, using a for loop. However, when I try this:

    for(i in 1:length(AllFileSheetnames)){
      assign(AllFileSheetnames[i], function(x) x[, colSums(is.na(x)) < nrow(x)])
      print(AllFileSheetnames[i])
    }

it puts all the values in a list again, with a union of all the columns.

I have imported 5 sheets from an Excel file into R. All the sheets are stored in data frames D1, D2, D3, D4, and D5. The details of the original sheets are as follows:

D1: 99 rows * 150 columns;
D2: 99 rows * 166 columns;
D3: 99 rows * 77 columns;
D4: 99 rows * 8 columns;
D5: 99 rows * 7 columns

When I import this file using readxl, it creates a list that contains 495 rows and about 247 columns. I can successfully split the list into separate data frames for D1, D2, and so on, but each data frame has 247 columns. I now wish to automate removing the NA columns from each data frame so that each one ends up with the dimensions listed above.

You might want to try this:

    ## create two example data.frames (= reproducible example):
    DF2 <- data.frame(x=1:3, y=c(1:2, NA), z=NA)
    (DF1 <- data.frame(x=1:3, y=NA, z=c(1:2, NA)))
    #   x  y  z
    # 1 1 NA  1
    # 2 2 NA  2
    # 3 3 NA NA

    ## get objects named DFx and keep only cols
    ## where not *all* values are missing values (NA):
    res <- lapply(mget(paste0("DF", 1:2)),
           function(DF)
             DF[!sapply(DF, function(x) all(is.na(x)))]
    )

    ## explode the res-list into separate variables again:
    invisible(list2env(res, globalenv())) # overwrites original DFs

    ## Inspect result: col y vanished in DF1
    DF1
    #   x  z
    # 1 1  1
    # 2 2  2
    # 3 3 NA
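
If you would rather keep the for loop from the question, here is a minimal sketch of one way it could be made to work. As written, the loop passes the anonymous function itself to `assign()`, so each name ends up bound to a function rather than to a filtered data frame. The version below fetches each data frame with `get()` first and assigns the filtered result back. The toy frames `D1` and `D2` are made-up stand-ins for the imported sheets, and `AllFileSheetnames` is assumed to be a character vector of the data frame names.

```r
# Toy stand-ins for the imported sheets (hypothetical data, not from the question)
D1 <- data.frame(a = 1:3, b = NA, c = c(1, NA, 3))
D2 <- data.frame(a = NA, b = c("x", "y", NA), c = NA)

# Assumed to hold the names of the data frames, as in the question
AllFileSheetnames <- c("D1", "D2")

for (nm in AllFileSheetnames) {
  x <- get(nm)                          # fetch the data frame by its name
  keep <- colSums(is.na(x)) < nrow(x)   # TRUE for columns with at least one non-NA value
  assign(nm, x[, keep, drop = FALSE])   # drop = FALSE keeps a data frame even if only one column is left
}

dim(D1)
# [1] 3 2
dim(D2)
# [1] 3 1
```

Without `drop = FALSE`, a data frame that is left with a single column (like `D2` here) would be simplified to a plain vector by `[`.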


 