R: Identify non-NA values from one column and create a data frame with values from other columns based on the rows selected
I have a data frame (df) with multiple columns (45) and rows (20,000):
I want to filter each variable column, keeping only the rows with non-NA values, and create a separate data frame with the corresponding ID and Name for the rows selected. I then want to save each data frame under the corresponding variable name. For example, the output data frames would look like this and would be saved as Var1 and Var2 respectively.
Var 1 <
Var 2 <
I am currently trying to use this function in R and am thinking of implementing a for loop.
df2 = lapply(df, function(x) {x[!is.na(x)]})
This hasn't worked well, as it does not list the values from the corresponding ID and Name columns. It also doesn't create a data frame.
Any suggestions will be greatly appreciated!
Here is how it can be done using dplyr & purrr:
Note that next time, rather than posting an image of your data, please create sample data in R and paste the dput output of that sample data.
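As a small illustration of the dput suggestion, here is a minimal sketch. The name sample_df and its contents are hypothetical stand-ins for the real data; the round trip at the end is only there to show that the printed text really does rebuild the object.

```r
# Hypothetical toy data frame standing in for the real 20,000-row df
sample_df <- data.frame(
  ID   = c("A", "B", "C"),
  Name = c("D", "E", "F"),
  Var1 = c(1, NA, 2),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that output into the question
dput(sample_df)

# Round trip: capture the dput text and evaluate it to rebuild the data frame
dput_text <- paste(capture.output(dput(sample_df)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
all.equal(rebuilt, sample_df)
```

For a large data frame, something like dput(head(df, 10)) keeps the pasted output short.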
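library(purrr)
library(dplyr)

# Sample data: two identifier columns plus the variable columns
data <- tibble(ID = c("A", "B", "C"),
               Name = c("D", "E", "F"),
               Var1 = c(1, NA, 2),
               Var2 = c(2, 2, NA),
               Var4 = c(NA, NA, 4))

# Names of all columns that start with "Var"
columns <- names(data)[grepl("^Var", names(data))]

# Keep the rows where the given column is not NA and return only ID and Name
extract_na_item <- function(column_name, df) {
  df %>%
    filter(!is.na(!!sym(column_name))) %>%
    select(ID, Name)
}

# One data frame per variable column, named after that column
list_var_not_na <- map(columns, extract_na_item, df = data)
names(list_var_not_na) <- columns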
Here is the result:
list_var_not_na
#> $Var1
#> # A tibble: 2 x 2
#> ID Name
#> <chr> <chr>
#> 1 A D
#> 2 C F
#>
#> $Var2
#> # A tibble: 2 x 2
#> ID Name
#> <chr> <chr>
#> 1 A D
#> 2 B E
#>
#> $Var4
#> # A tibble: 1 x 2
#> ID Name
#> <chr> <chr>
#> 1 C F
And if you really want the variables assigned in the global environment, as you mentioned in the OP, you can do as below (though I recommend just using the list to access the data instead):
list2env(list_var_not_na, envir = globalenv())
Created on 2021-05-03 by the reprex package (v2.0.0)
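If "save" in the question means writing each data frame to disk rather than creating global variables, the named list can be looped over directly. The following is only a sketch under assumptions: it rebuilds a list shaped like list_var_not_na in base R from the toy data, and the CSV format, the file naming, and the temporary output directory are all illustrative choices, not something the question specifies.

```r
# Toy data from the answer, rebuilt in base R so the sketch is self-contained
data <- data.frame(
  ID   = c("A", "B", "C"),
  Name = c("D", "E", "F"),
  Var1 = c(1, NA, 2),
  Var2 = c(2, 2, NA),
  Var4 = c(NA, NA, 4),
  stringsAsFactors = FALSE
)

# A list shaped like list_var_not_na: one ID/Name data frame per Var column
columns <- grep("^Var", names(data), value = TRUE)
list_var_not_na <- lapply(columns, function(col) {
  data[!is.na(data[[col]]), c("ID", "Name")]
})
names(list_var_not_na) <- columns

# Assumption: one CSV per variable, written to a temporary directory
out_dir <- tempdir()
for (var_name in names(list_var_not_na)) {
  out_file <- file.path(out_dir, paste0(var_name, ".csv"))
  write.csv(list_var_not_na[[var_name]], out_file, row.names = FALSE)
}

# Individual results can be read straight from the list by name
list_var_not_na[["Var1"]]
```

Keeping everything in one named list means the same loop works whether there are 3 variable columns or 43, with no extra objects in the global environment.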
You can use lapply like so:
cols <- grep('Var', names(df))
df2 <- lapply(df[cols], function(x) df[!is.na(x), -cols])
df2
#$Var1
# ID Name
#1 A D
#3 C F
#$Var2
# ID Name
#1 A D
#2 B E
#$Var4
# ID Name
#3 C F
data
df <- structure(list(ID = c("A", "B", "C"), Name = c("D", "E", "F"),
Var1 = c(1, NA, 2), Var2 = c(2, 2, NA), Var4 = c(NA, NA,
4)), class = "data.frame", row.names = c(NA, -3L))
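To see what the lapply one-liner does on each iteration, it may help to unpack it for a single column. This is a sketch using the same sample data; keep_rows and var1_rows are names introduced here purely for illustration.

```r
df <- data.frame(
  ID   = c("A", "B", "C"),
  Name = c("D", "E", "F"),
  Var1 = c(1, NA, 2),
  Var2 = c(2, 2, NA),
  Var4 = c(NA, NA, 4),
  stringsAsFactors = FALSE
)

# Integer positions of the columns whose names contain "Var": 3 4 5
cols <- grep("Var", names(df))

# Logical mask of the rows where Var1 is present: TRUE FALSE TRUE
keep_rows <- !is.na(df$Var1)

# Keep those rows and drop the Var columns (negative indices remove columns),
# which leaves ID and Name for rows 1 and 3
var1_rows <- df[keep_rows, -cols]
var1_rows
```

lapply(df[cols], ...) repeats this for every variable column and names each result after its column. If only one non-Var column were left after dropping, df[keep_rows, -cols, drop = FALSE] would keep the result a data frame instead of a vector.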