What is the most elegant way to check for patterns of missing data in R?

I have a set of numeric vectors in R, each of length 16. I would like to select those vectors that have all values present in at least one of four blocks of positions: 1:4, 5:8, 9:12, 13:16.

For example, the vector c(NA, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, NA, NA, 1, NA, 1, NA) would pass the test, since positions 5:8 are all non-NA.

What is the most elegant way (i.e. the least code that is still easy to read) to test this?

With a list of indices, you can iterate over those ranges and look for ones without any NA:

vec <- c(NA, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, NA, NA, 1, NA, 1, NA)
sapply(list(1:4, 5:8, 9:12, 13:16),
       function(ind) !anyNA(vec[ind]))
# [1] FALSE  TRUE FALSE FALSE

If you want to return the values within those indices:

inds <- list(1:4, 5:8, 9:12, 13:16)
good <- sapply(inds, function(ind) !anyNA(vec[ind]))
# should check that `any(good)` is true
inds[[ which(good)[1] ]]
# [1] 5 6 7 8
vec[ inds[[ which(good)[1] ]] ]
# [1] 1 1 1 1
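
Since the question is about selecting from a set of vectors, the check above can be wrapped in a small helper and applied with Filter. This is only a sketch: the names has_complete_block and vecs, and the sample vectors, are made up for illustration.

```r
# Hypothetical helper: TRUE if at least one index block of `x` has no NA.
has_complete_block <- function(x, inds = list(1:4, 5:8, 9:12, 13:16)) {
  any(vapply(inds, function(ind) !anyNA(x[ind]), logical(1)))
}

# Made-up example data: three vectors of length 16.
vecs <- list(
  a = c(NA, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, NA, NA, 1, NA, 1),  # block 5:8 complete
  b = rep(c(1, NA), times = 8),                                  # no complete block
  c = c(rep(NA, 12), 4, 3, 2, 1)                                 # block 13:16 complete
)

# Keep only the vectors that pass the test.
selected <- Filter(has_complete_block, vecs)
names(selected)
# [1] "a" "c"
```

vapply is used instead of sapply so the result is guaranteed to be a logical vector, even for an empty list of indices.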

Here is an option that uses rleid from data.table to get run-length-encoding ids for the vector and uses them as the grouping variable, checking whether any run consists entirely of non-NA elements. Note that, as written, a run of any length qualifies (even a single non-NA value), so this does not enforce the four fixed blocks:

library(data.table)
any(as.logical(ave(seq_along(v1) * v1, rleid(v1),
         FUN = function(x) all(!is.na(x))) ))
#[1] TRUE

Or, using rle to look for a run of at least four consecutive non-NA values:

any(with(rle(!is.na(v1)), lengths[values] >=4))
#[1] TRUE
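
One thing to keep in mind is that a run of four non-NA values is not the same as a complete block, because a run can straddle a block boundary. A minimal sketch of the difference, using a made-up vector x and illustrative names run_check and block_check:

```r
# Made-up vector: positions 3:6 are non-NA, which straddles blocks 1:4 and 5:8.
x <- c(NA, NA, 1, 1, 1, 1, rep(NA, 10))

# Run-based check: is there any run of at least 4 consecutive non-NA values?
run_check <- any(with(rle(!is.na(x)), lengths[values] >= 4))

# Block-based check: is any of the four fixed blocks free of NA?
block_id <- (seq_along(x) - 1) %/% 4
block_check <- any(!tapply(is.na(x), block_id, any))

run_check
# [1] TRUE
block_check
# [1] FALSE
```

If the blocks in the question are meant strictly, the block-based form is the one that matches them.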

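Or another option is table. This relies on the non-NA values all being 1, as in the sample data, so that multiplying by the block index (0 to 3) leaves just the block index at every non-NA position: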

4 %in% table(v1 * ((seq_along(v1) - 1) %/% 4))
#[1] TRUE

Data:

v1 <- c(NA, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, NA, NA, 1, NA, 1, NA)

The following code returns a single value (TRUE or FALSE). It returns TRUE if the vector passes the test.

vec <- c(NA, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, NA, NA, 1, NA, 1, NA)

!all(tapply(vec, rep(seq_along(vec), each = 4, length.out = length(vec)), anyNA))
# [1] TRUE
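
A related way to look at the same grouping is to fold the first 16 values into a 4-row matrix, so that each column is one block, and count the NAs per column. This is a sketch of an alternative, not part of the answer above; m and complete are illustrative names.

```r
vec <- c(NA, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, NA, NA, 1, NA, 1, NA)

# matrix() fills column by column, so column j holds positions (4j - 3):(4j).
# Only the first 16 values are used; the sample vector has a 17th element.
m <- matrix(vec[1:16], nrow = 4)

# A block is complete when its column contains no NA.
complete <- colSums(is.na(m)) == 0
complete
# [1] FALSE  TRUE FALSE FALSE
any(complete)
# [1] TRUE
```

The per-block result also tells you which block passed, via which(complete).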


