
How can I select columns of a data frame in R based on their number of valid (non-NA) values?

I'm using R, and I have a data frame with multiple columns. I want to automatically count the valid (non-NA) values in each column, select the columns in which at least 50% of the rows hold valid values, and save those columns in a new data frame.

Can anybody help me do this? Thank you very much.

Is there a way to apply the code to a data frame with an unknown number of columns?

Using the purrr package, you can compute the proportion of missing values in each column:

pct_missing <- purrr::map_dbl(df,~mean(is.na(.x)))
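As a quick illustration on a made-up data frame (the name `toy` and its values are assumptions, not taken from the question), the call returns one proportion per column, however many columns there are:

```r
library(purrr)

# Hypothetical data: x has 1 NA out of 4 values, y has 3 NA out of 4
toy <- data.frame(x = c(1, NA, 3, 4), y = c(NA, NA, NA, "a"))

# mean() of a logical vector is the share of TRUE values,
# so this is the fraction of NA entries in each column
pct_missing <- map_dbl(toy, ~ mean(is.na(.x)))

pct_missing  # x = 0.25, y = 0.75
```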

After that, you can get the names of the columns in which less than 50% of the values are missing:

selected_column <- colnames(df)[pct_missing < 0.5]

To create the new data frame, you can use:

library(dplyr)
# all_of() replaces the superseded one_of() and errors if a name is missing
df_new <- df %>% select(all_of(selected_column))
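If dplyr 1.0.0 or later is available, the three steps can probably be collapsed into a single call with the `where()` selection helper. This is a sketch of an alternative, not the answerer's code, and the `df` below is a made-up stand-in for the asker's data:

```r
library(dplyr)

# Hypothetical stand-in for the asker's data frame
df <- data.frame(
  a = c(1, 2, NA, 4),        # 25% missing -> kept
  b = c(NA, NA, NA, 4),      # 75% missing -> dropped
  c = c("x", "y", "z", NA)   # 25% missing -> kept
)

# Keep every column in which fewer than half of the values are NA
df_new <- df %>% select(where(~ mean(is.na(.x)) < 0.5))

names(df_new)  # "a" "c"
```

Because the predicate is evaluated once per column, no column names or counts need to be known in advance.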

You can also write a function in base R that automatically returns the columns matching the criterion:

Function:

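ColSel <- function(df) {
  # TRUE for columns in which fewer than half of the values are NA
  vals <- colMeans(is.na(df)) < .5
  # drop = FALSE keeps a data frame even if only one column qualifies
  df[, vals, drop = FALSE]
}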

Some toy data:

## example
df1 <- data.frame(
    a = c(runif(19),NA),
    b = c(rep(NA,11),runif(9)),
    d = rep(NA,20),
    e = runif(20)
    )

Test:

df2 <- ColSel(df1)
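To check the result on the toy data: column `a` has 1 missing value out of 20 and `e` has none, while `b` has 11 out of 20 and `d` is entirely NA, so only `a` and `e` should survive. The sketch below rebuilds the data so that it runs on its own, and wraps the same idea in a hypothetical helper, `select_valid_cols()`, whose `min_valid` argument is an assumption added here to make the threshold adjustable. Note that it uses `>=`, so a column that is exactly 50% valid is kept, unlike the strict less-than-half-missing test in `ColSel`.

```r
# Hypothetical helper: keep columns whose share of non-NA values
# is at least min_valid (default 50%)
select_valid_cols <- function(df, min_valid = 0.5) {
  keep <- colMeans(!is.na(df)) >= min_valid
  df[, keep, drop = FALSE]  # drop = FALSE always returns a data frame
}

set.seed(1)  # only to make the random toy values reproducible
df1 <- data.frame(
  a = c(runif(19), NA),          # 95% valid
  b = c(rep(NA, 11), runif(9)),  # 45% valid
  d = rep(NA, 20),               #  0% valid
  e = runif(20)                  # 100% valid
)

names(select_valid_cols(df1))                   # "a" "e"
names(select_valid_cols(df1, min_valid = 0.4))  # "a" "b" "e"
```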

