Grouping instances by NA values in R

Question

I am reading a CSV file, and unfortunately my data frame has many missing values. A small snippet is shown below:

Data frame

df <- data.frame(Size= c(800, 850, 1100, 1200, 1000), 
                 Value= c(900, NA, 1300, 1100, NA),
                 Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2,3,3,1,2),
                 Rent= c('y', 'y', 'n', 'y', 'n'))

I want to predict some of the results using Weka, but I can't do that when several attributes are missing. I know I should be using the function is.na, but I am not sure how, because so far I have only used it for summing and counting.

Edit: For example, in this file 4 of the 5 instances have missing values. Instances 2 and 5 share the same missing attributes (columns B and D, here Value and Num1), while instances 1 and 4 also share the same missing attribute (column C, here Location). What I'd like to get is a data frame made up of each such group of instances, so that I can export them to files and run the analysis on each file individually. An example of the output could be:

> A

> B

Edit 2:

I want to save the splits, and so far I have tried this:

write.csv(split(temp, index), file = "C:/Users/Nikita/Desktop/splits.csv", row.names=FALSE)

But it writes all the splits in one line. Is there a way to separate them by a line?

Edit 3:

My steps are:

data <- read.csv("location")
index <- apply(is.na(data)*1, 1,paste, collapse = "")
s <- split(data, index)
lapply(s, function(x) {names(x) <- names(data);x})
big.data <- do.call(rbind, s)
write.csv(big.data, file = "location", row.names=FALSE)

Am I missing something?

Answer 1

df[!is.na(df$Value), ]
  Size Value Location Num1 Num2 Rent
1  800   900     <NA>    2    2    y
3 1100  1300   uptown    3    3    n
4 1200  1100     <NA>    2    1    y

And

df[is.na(df$Value), ]
  Size Value Location Num1 Num2 Rent
2  850    NA  midcity   NA    3    y
5 1000    NA Lakeview   NA    2    n
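The same idea extends beyond a single column. As a minimal sketch (not part of the original answer), base R's complete.cases() flags rows that have no NA in any column, and rowSums(is.na(df)) counts the missing values per row. The names ok, complete_rows, incomplete_rows and na_per_row are illustrative only:

```r
df <- data.frame(Size = c(800, 850, 1100, 1200, 1000),
                 Value = c(900, NA, 1300, 1100, NA),
                 Location = c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2, 3, 3, 1, 2),
                 Rent = c('y', 'y', 'n', 'y', 'n'))

# TRUE for rows without any NA, FALSE otherwise
ok <- complete.cases(df)

complete_rows   <- df[ok, ]    # rows with every attribute present
incomplete_rows <- df[!ok, ]   # rows with at least one NA

# Number of missing values in each row
na_per_row <- rowSums(is.na(df))
```

This only separates complete rows from incomplete ones; it does not distinguish between different patterns of missing columns.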

In the future, please create a reproducible example so that users do not have to build a data frame by hand from your question. Pictures are not as helpful.

Data

df <- data.frame(Size= c(800, 850, 1100, 1200, 1000), 
                 Value= c(900, NA, 1300, 1100, NA),
                 Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2,3,3,1,2),
                 Rent= c('y', 'y', 'n', 'y', 'n'))

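To write out all the splits, use lapply, since split creates a list. Each group needs its own file name; with a single fixed file, every call would overwrite the previous one:

s <- split(temp, index)  # one data frame per NA pattern
lapply(names(s), function(n) write.csv(s[[n]], file = paste0("C:/Users/Nikita/Desktop/", n, "splits.csv"), row.names=FALSE))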

With a for loop:

s <- split(temp, index)
for (i in seq_along(s)) {
  # s[[i]] extracts the data frame itself; s[i] would be a one-element list
  write.csv(s[[i]], file = paste0("C:/Users/Nikita/Desktop/", i, "splits.csv"), row.names=FALSE)
}
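For anyone who wants to try the export without the original CSV, here is a self-contained sketch under a few assumptions: it uses the example df instead of temp, writes into tempdir() as a stand-in for the real output folder, and names each file after its NA pattern. The split_ prefix and the variables out_dir, paths and check are hypothetical choices, not something from the question:

```r
df <- data.frame(Size = c(800, 850, 1100, 1200, 1000),
                 Value = c(900, NA, 1300, 1100, NA),
                 Location = c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2, 3, 3, 1, 2),
                 Rent = c('y', 'y', 'n', 'y', 'n'))

# One 0/1 string per row describing which columns are NA
index <- apply(is.na(df) * 1, 1, paste, collapse = "")
s <- split(df, index)

# Assumption: tempdir() stands in for the real output folder
out_dir <- tempdir()
paths <- file.path(out_dir, paste0("split_", names(s), ".csv"))

# Write each group to its own file
for (i in seq_along(s)) {
  write.csv(s[[i]], file = paths[i], row.names = FALSE)
}

# Read one file back to confirm it holds a single group
check <- read.csv(paths[names(s) == "010100"])
```

Naming the files after the pattern rather than a running number makes it easier to tell afterwards which attributes are missing in which file.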

Answer 2

Recreating your example data:

df <- data.frame(Size= c(800, 850, 1100, 1200, 1000), 
                 Value= c(900, NA, 1300, 1100, NA),
                 Location= c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2,3,3,1,2),
                 Rent= c('y', 'y', 'n', 'y', 'n'))

Now, splitting your data according to the pattern of NA values, as you want:

# This generates an index with 1 for a column with NA and 0 otherwise
index <- apply(is.na(df)*1, 1,paste, collapse = "")

# This splits the data.frame according to the index
split(df, index)
$`000000`
  Size Value Location Num1 Num2 Rent
3 1100  1300   uptown    3    3    n

$`001000`
  Size Value Location Num1 Num2 Rent
1  800   900     <NA>    2    2    y
4 1200  1100     <NA>    2    1    y

$`010100`
  Size Value Location Num1 Num2 Rent
2  850    NA  midcity   NA    3    y
5 1000    NA Lakeview   NA    2    n

Notice that the first element, "000000", contains all the complete cases. Then "001000" contains all observations where column 3 (Location) is missing, and so on.
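The 0/1 strings can be hard to read with many columns. One possible variation, offered as a sketch rather than as part of the answer, labels each group with the names of its missing columns instead. The "+" separator, the "complete" label, and the variables na_mat, missing_cols and groups are arbitrary choices made for this illustration:

```r
df <- data.frame(Size = c(800, 850, 1100, 1200, 1000),
                 Value = c(900, NA, 1300, 1100, NA),
                 Location = c(NA, 'midcity', 'uptown', NA, 'Lakeview'),
                 Num1 = c(2, NA, 3, 2, NA),
                 Num2 = c(2, 3, 3, 1, 2),
                 Rent = c('y', 'y', 'n', 'y', 'n'))

# Logical matrix: TRUE where a cell is NA
na_mat <- is.na(df)

# For each row, join the names of the columns that are NA
missing_cols <- apply(na_mat, 1, function(r) paste(names(df)[r], collapse = "+"))

# Rows with nothing missing get an explicit label (illustrative choice)
missing_cols[missing_cols == ""] <- "complete"

# Same grouping as the 0/1 index, but with readable group names
groups <- split(df, missing_cols)
names(groups)
```

The grouping itself is the same as with the 0/1 index; only the labels differ, so groups[["Value+Num1"]] holds rows 2 and 5 and groups[["Location"]] holds rows 1 and 4.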

Answer 1
1 Accepted 2015-06-15 14:39:43

Answer 2
1 2015-06-15 14:58:24
