Removing NA columns from multiple files in R

Question

I have a large dataset and split it to make the data more manageable, which left me with about 250 splits. Each split has a different number of empty columns. I want to remove the empty columns and write the updated files. I can do this manually, but with roughly 250 splits that isn't practical for all of them.

Below is a reproducible example:

df <- data.frame(Size= c(800, 850, 1100, 1200, 1000), 
                 Value= c(900, NA, 1300, 1100, NA),
                 Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2,3,3,1,2),
                 Rent= c('y', 'y', 'n', 'y', 'n'))

This is what I have so far.

Splitting:

index <- apply(is.na(df)*1, 1,paste, collapse = "")
s <- split(df, index)
split(df, index)
for (i in 1:length(s)) 
{write.csv(s[i], file = paste0("Splits/", i, "splits.csv"), row.names=FALSE, na = "")}
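
To make the splitting step concrete, here is a small sketch that only inspects the keys built above for the example data. It reuses the `df` and `index` definitions from the question; nothing is written to disk.

```r
df <- data.frame(Size = c(800, 850, 1100, 1200, 1000),
                 Value = c(900, NA, 1300, 1100, NA),
                 Location = c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2, 3, 3, 1, 2),
                 Rent = c('y', 'y', 'n', 'y', 'n'))

# One string of 0/1 flags per row; a 1 marks an NA in that column position.
index <- apply(is.na(df) * 1, 1, paste, collapse = "")
print(index)

# Rows with the same NA pattern end up in the same split.
s <- split(df, index)
print(names(s))         # one name per distinct NA pattern
print(sapply(s, nrow))  # number of rows in each split
```

Because every row in a split shares the same NA pattern, any column that is NA in one row of a split is NA in all of them, which is why each split ends up with entirely empty columns.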

Removing empty columns:

split <- read.csv("Splits/3splits.csv")
updated_split <- split[,colSums(is.na(split))<nrow(split)]
write.csv(updated_split, file = "updated_3split.csv", row.names=FALSE)

split <- read.csv("Splits/2splits.csv")
updated_split <- split[,colSums(is.na(split))<nrow(split)]
write.csv(updated_split, file = "updated_2split.csv", row.names=FALSE)

split <- read.csv("Splits/1splits.csv")
updated_split <- split[,colSums(is.na(split))<nrow(split)]
write.csv(updated_split, file = "updated_1split.csv", row.names=FALSE)

Is there a way to automate the process above? By automate I mean removing the empty columns in those three files without repeating the same three lines over and over again (doing that for 250 files isn't really an option).

Edit 1:

Like this?

for (i in 1:length(s))
{
lapply(s, function(x) x[,colSums(is.na(x))<nrow(x)])
write.csv(s[i], file = paste0("Splits/", i, "splits.csv"), row.names=FALSE, na = "")
}

Answer 1

Maybe this:

df <- data.frame(Size= c(800, 850, 1100, 1200, 1000), 
                 Value= c(900, NA, 1300, 1100, NA),
                 Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2,3,3,1,2),
                 Rent= c('y', 'y', 'n', 'y', 'n'))

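# Build a key per row that records which columns are NA (e.g. "001000").
index <- apply(is.na(df) * 1, 1, paste, collapse = "")
s <- split(df, index)

dir.create("Splits", showWarnings = FALSE)  # write.csv fails if the folder is missing
for (i in seq_along(s))
{
   sdf <- s[[i]]  # [[ ]] extracts the data frame itself; s[i] is a one-element list
   write.csv(sdf, file = paste0("Splits/", i, "splits.csv"), row.names = FALSE, na = "")
   # Keep only the columns that are not entirely NA.
   updated_split <- sdf[, colSums(is.na(sdf)) < nrow(sdf), drop = FALSE]
   write.csv(updated_split, file = paste0("updated", i, "split.csv"), row.names = FALSE)
}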
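
If the split files already exist on disk, as in the question, the same cleanup can be applied file by file. The following is only a sketch under a few assumptions: the files live in `Splits/` and end in `splits.csv` as written above, while the helper name `drop_empty_cols` and the output folder `Updated/` are made up for illustration.

```r
# Hypothetical helper: keep only columns that have at least one non-NA value.
drop_empty_cols <- function(d) {
  d[, colSums(is.na(d)) < nrow(d), drop = FALSE]
}

# Assumed locations: input files in "Splits/", cleaned copies in "Updated/".
in_dir  <- "Splits"
out_dir <- "Updated"
dir.create(out_dir, showWarnings = FALSE)

# Every file in in_dir whose name ends in "splits.csv".
files <- list.files(in_dir, pattern = "splits\\.csv$", full.names = TRUE)

for (f in files) {
  # NAs were written as blanks, so read blanks back in as NA.
  d <- read.csv(f, na.strings = c("", "NA"))
  write.csv(drop_empty_cols(d),
            file = file.path(out_dir, paste0("updated_", basename(f))),
            row.names = FALSE, na = "")
}
```

Writing the cleaned copies to a separate folder keeps them from being picked up again if the loop is re-run, and `drop = FALSE` keeps the result a data frame even when only one column survives.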

Answer 1: 2, accepted, 2015-10-02 08:17:17
