Selecting data frames from a list of data frames

Question

I am trying to select data frames from within a long list of data frames, based on whether certain columns are empty.

Here is a reproducible example, along with the code I have written to try to solve this problem. I am using one random variable as an acceptable substitute for another (here, b for c), so I don't mind if d1 ends up in both group 1 and group 2.

d1 <- data.frame(a=rnorm(5), b=1:5, c=rnorm(5))
d2 <- data.frame(a=1:5, b=rnorm(5), c = c(NA, NA, NA, NA, NA))
d3 <- data.frame(a=1:5, b=c(NA, NA, NA, NA, NA), c=c(1:5))

my_test_data <- list(d1, d2, d3)
group_1 <- list()
group_2 <- list()

for (i in 1:length(my_test_data)) {
if(!is.nan(my_test_data[[i]]$b)) {
group_1[i] <- my_test_data[i]
}
else if (!is.nan(my_test_data[[i]]$c)) {
group_2[i] <- my_test_data[i]
}
else NULL
}

I get warning messages saying:

Warning messages:
1: In if (!is.nan(my_test_data[[i]]$b)) { :
  the condition has length > 1 and only the first element will be used

and group_1 and group_2 end up identical to my_test_data.

All help greatly appreciated.

Answer 1

There are a couple of issues in your sample code.

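You get the warning because your if condition returns a vector with one value per row, while if() needs a single TRUE or FALSE, so only the first element is used. (Since R 4.2.0 this is an error rather than a warning.) For example: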

is.nan(my_test_data[[3]]$b)
[1] FALSE FALSE FALSE FALSE FALSE
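A small sketch of how to get a single TRUE or FALSE, using made-up vectors: is.na() returns one value per element, while any() and all() reduce that to the one value if() expects. Which one you want depends on whether "empty" means "has some missing values" or "is missing entirely"; the question does not say, so both are shown.

```r
partial <- c(1, NA, 3)   # illustrative: some values missing
empty   <- rep(NA, 5)    # illustrative: every value missing, like d3$b

length(is.na(partial))   # 3: one result per element, too long for if()
any(is.na(partial))      # TRUE: at least one value is missing
all(is.na(partial))      # FALSE: not every value is missing
all(is.na(empty))        # TRUE: the column is entirely empty

# A length-one condition, so if() is happy
if (all(is.na(empty))) message("empty has no data")
```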

The second problem is that your sample data doesn't contain any NaN values. It contains NA values, and is.nan() returns FALSE for NA, so it won't find anything anyway.
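A quick check of how the two functions treat NA and NaN, on an illustrative vector:

```r
x <- c(1, NA, NaN)

is.na(x)    # FALSE TRUE TRUE: is.na() is TRUE for both NA and NaN
is.nan(x)   # FALSE FALSE TRUE: is.nan() is TRUE only for NaN
```

So is.na() is the right test for columns filled with NA, as in d2$c and d3$b.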

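The third problem is the layout of your if / else if / else. Inside the braces of your for loop R still parses it, but at the top level a newline after the closing } ends the if statement, and the else on the next line is a syntax error. The safe convention is to keep else if on the same line as the closing brace: } else if () {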
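One way to see the difference is to hand the same if/else text to parse() in three layouts. This is only an illustration; the helper parses() is hypothetical and simply reports whether parse() raised an error.

```r
# Hypothetical helper: TRUE if the code parses, FALSE if parse() errors
parses <- function(code) {
  result <- tryCatch(parse(text = code), error = function(e) e)
  !inherits(result, "error")
}

top_level     <- "if (TRUE) {\n  1\n}\nelse {\n  2\n}"
inside_braces <- paste0("{\n", top_level, "\n}")
same_line     <- "if (TRUE) {\n  1\n} else {\n  2\n}"

parses(top_level)      # FALSE: the newline ends the if, so else is unexpected
parses(inside_braces)  # TRUE: inside { } the parser looks ahead for else
parses(same_line)      # TRUE: this layout works anywhere
```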

A fourth problem is that NULL by itself doesn't do anything: inside a for loop the value of the else branch is simply discarded. You might as well leave out else NULL, or change it to do something useful.
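For instance, if the if/else is moved into a function, the else branch becomes the function's return value and so does real work. A minimal sketch, where classify() and the "neither" label are hypothetical; anyNA() is the base R equivalent of the TRUE %in% is.na() check used in this answer.

```r
# Hypothetical classifier: returns a group label for one data frame
classify <- function(df) {
  if (anyNA(df$b)) {
    "group_1"
  } else if (anyNA(df$c)) {
    "group_2"
  } else {
    "neither"   # the else branch now produces a value instead of a bare NULL
  }
}

my_test_data <- list(
  data.frame(a = rnorm(5), b = 1:5, c = rnorm(5)),
  data.frame(a = 1:5, b = rnorm(5), c = rep(NA, 5)),
  data.frame(a = 1:5, b = rep(NA, 5), c = 1:5)
)

labels <- vapply(my_test_data, classify, character(1))
labels   # "neither" "group_2" "group_1"
```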

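Instead of !is.nan(), which returns one value per row, we can check whether TRUE %in% is.na(), which collapses the column to a single TRUE or FALSE. Note that this test is TRUE when the column contains missing values, so group_1 now collects the data frames whose b has NAs, the reverse of your original test; put ! in front of it if you want the frames where b has data.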

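# Assumes group_1 and group_2 were created as empty lists, as in the question
for (i in seq_along(my_test_data)) {
    # TRUE %in% is.na(x) is a single TRUE/FALSE: does the column contain any NA?
    if (TRUE %in% is.na(my_test_data[[i]]$b)) {
      group_1[i] <- my_test_data[i]
    } else if (TRUE %in% is.na(my_test_data[[i]]$c)) {
      group_2[i] <- my_test_data[i]
    } # else {
    #   NULL
    # }
}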
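The condition used in the loop, TRUE %in% is.na(x), gives the same answer as base R's anyNA(x); if "empty" should mean that every value is missing, all(is.na(x)) is the stricter test. A short comparison on three illustrative columns:

```r
cols <- list(full = 1:5, partial = c(1, NA, 3), empty = rep(NA, 5))

in_check <- vapply(cols, function(x) TRUE %in% is.na(x), logical(1))
any_na   <- vapply(cols, anyNA, logical(1))
all_na   <- vapply(cols, function(x) all(is.na(x)), logical(1))

in_check                     # full FALSE, partial TRUE, empty TRUE
identical(in_check, any_na)  # TRUE: the two checks agree
all_na                       # full FALSE, partial FALSE, empty TRUE
```

With the question's data the two tests give the same result, because d2$c and d3$b are entirely NA; they only differ for partially filled columns.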

Your lists will still look a bit odd, because assigning to group_1[i] when i is past the end of the list fills the skipped positions with NULL. Whether that is a problem depends on what you do with the lists next.

str(group_1)

List of 3
$ : NULL
$ : NULL
$ :'data.frame':    5 obs. of  3 variables:
    ..$ a: int [1:5] 1 2 3 4 5
    ..$ b: logi [1:5] NA NA NA NA NA
    ..$ c: int [1:5] 1 2 3 4 5
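A minimal sketch using base R's Filter(): it drops NULL placeholders from a list like group_1, and it can also select the data frames directly, without a loop or index gaps. The helper has_data() is hypothetical, and it treats a column as empty only when every value is NA, which is an assumed reading of "empty" in the question.

```r
d1 <- data.frame(a = rnorm(5), b = 1:5, c = rnorm(5))
d2 <- data.frame(a = 1:5, b = rnorm(5), c = rep(NA, 5))
d3 <- data.frame(a = 1:5, b = rep(NA, 5), c = 1:5)
my_test_data <- list(d1, d2, d3)

# A list padded with NULLs, shaped like group_1 from the loop in this answer
padded <- list(NULL, NULL, d3)
# Keep only the non-NULL entries
compact <- Filter(Negate(is.null), padded)

# Hypothetical helper: a column "has data" unless every value is NA
has_data <- function(col) !all(is.na(col))

# Select directly; d1 lands in both groups, which the question allows
group_1 <- Filter(function(df) has_data(df$b), my_test_data)  # d1, d2
group_2 <- Filter(function(df) has_data(df$c), my_test_data)  # d1, d3
```

Filter() keeps the elements for which the function returns TRUE, so the result has no gaps and no NULL entries.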

Answer 1: score 2, accepted, 2017-11-29 20:40:30

从数据帧列表中选择数据帧

问题描述

1 个解决方案

解决方案1 2 已采纳 2017-11-29 20:40:30

解决方案1
2 已采纳 2017-11-29 20:40:30