R: Identify non-NA values in a column, create a data frame, and select the values in another column

Question

I have a data frame (df) with 45 columns and 20,000 rows:

I want to filter each variable column by selecting only the rows with non-NA values, and create a separate data frame containing the corresponding ID and Name for the selected rows. I then want to save each data frame under the corresponding variable name. For example, the output data frames would look like this and would be saved as Var1 and Var2, respectively:

[Figure: expected output data frame Var1]

[Figure: expected output data frame Var2]

I am currently trying to use the following function in R and am thinking of implementing a for loop:

df2 = lapply(df, function(x) {x[!is.na(x)]})

This hasn't worked well: it does not return the values from the corresponding ID and Name columns, and it doesn't create a data frame either.
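The values come back without ID and Name because lapply() passes each column to the function as a bare vector, so an expression like x[!is.na(x)] can only return entries of that one column. A minimal sketch of the difference, assuming a hypothetical three-row stand-in for df: using the same logical test as a row index on the whole data frame keeps the other columns.

```r
# Hypothetical three-row stand-in for the question's data frame
df <- data.frame(ID = c("A", "B", "C"),
                 Name = c("D", "E", "F"),
                 Var1 = c(1, NA, 2),
                 stringsAsFactors = FALSE)

# Indexing the column vector returns only its non-NA values
vals <- df$Var1[!is.na(df$Var1)]
print(vals)

# The same logical vector used as a row index keeps ID and Name
rows <- df[!is.na(df$Var1), c("ID", "Name")]
print(rows)
```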

Any suggestions will be greatly appreciated!

Answer 1

Here is how it can be done using dplyr and purrr:

Note: next time, instead of posting an image of your data, please create sample data in R and paste the output of dput() on that sample instead.
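A minimal sketch of producing such a sample, assuming a small hypothetical df like the one used in the answers; the printed structure(...) text is what would be pasted into the question.

```r
# Hypothetical sample standing in for the real data frame
df <- data.frame(ID = c("A", "B", "C"),
                 Name = c("D", "E", "F"),
                 Var1 = c(1, NA, 2),
                 stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; head() keeps the output
# short for a large data frame such as the 20,000-row one in the question
dput(head(df, 3))
```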

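library(purrr)
library(dplyr)

# Sample data standing in for the real 45-column data frame
data <- tibble(ID = c("A", "B", "C"),
  Name = c("D", "E", "F"),
  Var1 = c(1, NA, 2),
  Var2 = c(2, 2, NA),
  Var4 = c(NA, NA, 4))

# Names of the variable columns (those starting with "Var")
columns <- names(data)[grepl("^Var", names(data))]

# For one column, keep the rows where it is not NA and return their ID and Name
extract_na_item <- function(column_name, df) {
  df %>%
    filter(!is.na(!!sym(column_name))) %>%
    select(ID, Name)
}

# One data frame per variable column, named after that column
list_var_not_na <- map(columns, extract_na_item, df = data)
names(list_var_not_na) <- columns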

Here is the result:

list_var_not_na
#> $Var1
#> # A tibble: 2 x 2
#>   ID    Name 
#>   <chr> <chr>
#> 1 A     D    
#> 2 C     F    
#> 
#> $Var2
#> # A tibble: 2 x 2
#>   ID    Name 
#>   <chr> <chr>
#> 1 A     D    
#> 2 B     E    
#> 
#> $Var4
#> # A tibble: 1 x 2
#>   ID    Name 
#>   <chr> <chr>
#> 1 C     F
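A possible variant of the same approach, sketched here rather than taken from the answer: grep(..., value = TRUE) returns the column names directly, purrr's set_names() names the resulting list in the same step, and dplyr's .data pronoun looks up a column from a string without sym() and !!. It assumes the same sample data and should produce the same named list.

```r
library(dplyr)
library(purrr)

data <- tibble(ID = c("A", "B", "C"),
               Name = c("D", "E", "F"),
               Var1 = c(1, NA, 2),
               Var2 = c(2, 2, NA),
               Var4 = c(NA, NA, 4))

# value = TRUE returns the matching names rather than their positions
columns <- grep("^Var", names(data), value = TRUE)

# set_names(columns) names each list element after its column, and the
# .data pronoun looks up a column from a string
list_var_not_na <- map(set_names(columns), function(col) {
  data %>%
    filter(!is.na(.data[[col]])) %>%
    select(ID, Name)
})
```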

If you really want each data frame assigned to its own variable in the global environment, as described in the question, you can do the following (though I recommend just using the list to access the data):

list2env(list_var_not_na, envir = globalenv())

Created on 2021-05-03 by the reprex package (v2.0.0)
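If "save" in the question means writing each data frame to disk rather than creating variables, here is a minimal sketch. It assumes CSV output and a hypothetical out_dir set to a temporary directory, and it loops over the named list to write one file per variable.

```r
# Hypothetical list standing in for list_var_not_na from the answer
list_var_not_na <- list(
  Var1 = data.frame(ID = c("A", "C"), Name = c("D", "F"),
                    stringsAsFactors = FALSE),
  Var2 = data.frame(ID = c("A", "B"), Name = c("D", "E"),
                    stringsAsFactors = FALSE)
)

# Assumption: write to a temporary directory; point out_dir elsewhere as needed
out_dir <- tempdir()

# One CSV per list element, named after it (Var1.csv, Var2.csv, ...)
for (nm in names(list_var_not_na)) {
  write.csv(list_var_not_na[[nm]],
            file = file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}
```

saveRDS() would be an alternative if the exact column types need to survive the round trip.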

Answer 2

You can use lapply like so:

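# Positions of the variable columns
cols <- grep('Var', names(df))
# For each variable column, keep its non-NA rows and drop all variable columns
df2 <- lapply(df[cols], function(x) df[!is.na(x), -cols])
df2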

#$Var1
#  ID Name
#1  A    D
#3  C    F

#$Var2
#  ID Name
#1  A    D
#2  B    E

#$Var4
#  ID Name
#3  C    F

Data:

df <- structure(list(ID = c("A", "B", "C"), Name = c("D", "E", "F"), 
    Var1 = c(1, NA, 2), Var2 = c(2, 2, NA), Var4 = c(NA, NA, 
    4)), class = "data.frame", row.names = c(NA, -3L))
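Here -cols drops the variable columns and keeps every remaining column. In the sample that is exactly ID and Name, but in the real 45-column data any other non-Var column would be kept as well, and the unanchored pattern 'Var' would also match a hypothetical column such as MaxVar. The following variant names the kept columns explicitly. It is a sketch that assumes they are called ID and Name as in the sample.

```r
df <- data.frame(ID = c("A", "B", "C"), Name = c("D", "E", "F"),
                 Var1 = c(1, NA, 2), Var2 = c(2, 2, NA),
                 Var4 = c(NA, NA, 4), stringsAsFactors = FALSE)

# "^Var" matches only names that start with "Var"
var_cols <- grep("^Var", names(df), value = TRUE)

# For each variable column, keep the non-NA rows and only ID and Name
df2 <- lapply(df[var_cols], function(x) df[!is.na(x), c("ID", "Name")])
```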


Answer 1
Score 2, accepted, 2021-05-03 03:22:31

Answer 2
Score 2, 2021-05-03 04:09:28
