Sieve out non-NA entries from a data frame while retaining rows with only NA

I am looking for a more concise way (in terms of code length) of converting a data.frame from:

#   V1 V2 V3 V4 V5 V6 V7 V8 V9
# 1  1  2  3 NA NA NA NA NA NA
# 2 NA NA NA  3  2  1 NA NA NA
# 3 NA NA NA NA NA NA NA NA NA
# 4 NA NA NA NA NA NA NA NA NA
# 5 NA NA NA NA NA NA  1  2  3

to

#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    3    2    1
#[3,]   NA   NA   NA
#[4,]   NA   NA   NA
#[5,]    1    2    3

That is, I want to remove the excess NAs while still representing rows that contain only NAs as rows of NAs.

I wrote the following function, which does the job, but I am sure there is a shorter way of achieving the same result.

#Dummy data.frame
data <- matrix(c(1:3, rep(NA, 6), 
          rep(NA, 3), 3:1, rep(NA, 3), 
          rep(NA, 9),
          rep(NA, 9),
          rep(NA, 6), 1:3),
          byrow=TRUE, ncol=9)
data <- as.data.frame(data)

sieve <- function(data) {

        #get a list of all entries that are not NA
        cond <- apply(data, 1, function(x) x[!is.na(x)])
        #set integer(0) equal to NA
        cond[sapply(cond, function(x) length(x)==0)] <- NA

        #check how many items there are in non-empty rows
        #(rows are either empty or contain the same number of items)
        n <- max(sapply(cond, length))

        #replace single NA with n NAs, where n = number of items
        #first get an index of entries with single NAs
        index <- (1:length(cond)) [sapply(cond, function(x) length(x)==1)]
        #then replace each entry with n NAs
        for (i in index) cond[[i]]  <- rep(NA, n)

        #turn list into a matrix (one row per original row)
        cond <- matrix(unlist(cond), nrow=length(cond), byrow=TRUE)
        cond
}

sieve(data)

My question resembles this question about extracting the conditions to which participants are assigned (for which I received great answers). I tried extending those answers to the current dummy data, but without success so far, hence my rather lengthy custom function.


Edit: Additional info on why I am asking this question: The first data frame represents the raw output from an experiment in which I assigned participants to one of three conditions (using 3 here for simplicity). In each condition, participants read a different scenario, but then answered the same set of questions about the scenario they had read. Qualtrics recorded answers from participants in the first condition in columns V1 through V3, answers from participants in the second condition in columns V4 through V6, and answers from participants in the third condition in columns V7 through V9. (If this block of questions had contained 4 questions, it would have been columns V1 through V4 for answers from participants in the first condition, V5 through V8 for answers from participants in the second condition ...).
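To make the column layout concrete, here is a minimal sketch of how a condition number maps to its block of columns when each condition has k questions. The helper name block_cols is hypothetical and only serves as illustration; it assumes the default V1, V2, ... column names shown above.

```r
# block_cols() is a hypothetical helper, not part of the original post.
# It returns the column names that hold the answers for one condition,
# assuming k questions per condition and default names V1, V2, ...
block_cols <- function(condition, k) {
  first <- (condition - 1) * k + 1   # first column of this condition's block
  last  <- condition * k             # last column of this condition's block
  paste0("V", first:last)
}

block_cols(1, 3)   # condition 1 with 3 questions
block_cols(2, 4)   # condition 2 with 4 questions
```

With 3 questions per condition this gives V1 to V3 for the first condition, and with 4 questions per condition it gives V5 to V8 for the second.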


You can try this if the number of non-NA values is always the same in every row that isn't entirely NA:

First, create a data frame with the appropriate (transposed) dimensions, filled with NAs.

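# assumption: d is the question's data frame (called `data` there)
d <- data

# one row per non-NA value, one column per original row, all NA to start
d2 <- data.frame(
        matrix(nrow = max(apply(d, 1, function(ii) sum(!is.na(ii)))),
               ncol=nrow(d)))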

Then use apply to fill that data frame, and transpose it to get the desired outcome:

d2[] <- apply(d, 1, function(ii) ii[!is.na(ii)])
t(d2)
#   [,1] [,2] [,3]
#X1    1    2    3
#X2    3    2    1
#X3   NA   NA   NA
#X4   NA   NA   NA
#X5    1    2    3
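As a variation on the same idea, the two steps can probably be folded into a single helper that pads each row's non-NA values with NA up to a common width. This is only a sketch under the same assumption as above (non-empty rows hold at most n values, and at least one row is non-empty); the name compact_rows is made up for illustration and does not come from the original answer.

```r
# Rebuild the question's example data (5 rows, 9 columns).
data <- as.data.frame(matrix(c(1:3, rep(NA, 6),
                               rep(NA, 3), 3:1, rep(NA, 3),
                               rep(NA, 9),
                               rep(NA, 9),
                               rep(NA, 6), 1:3),
                             byrow = TRUE, ncol = 9))

# compact_rows() is a hypothetical helper name, not from the original post.
compact_rows <- function(df) {
  m <- as.matrix(df)
  n <- max(rowSums(!is.na(m)))        # largest number of non-NA values in a row
  rows <- lapply(seq_len(nrow(m)), function(i) {
    v <- m[i, !is.na(m[i, ])]         # keep the non-NA entries, in column order
    length(v) <- n                    # pad short (or empty) rows with NA
    v
  })
  # stack the padded rows into an nrow(df) x n matrix
  matrix(unlist(rows), ncol = n, byrow = TRUE)
}

compact_rows(data)
```

Building the result with matrix(..., byrow = TRUE) rather than transposing the output of apply keeps the shape right even when n is 1, where apply would return a plain vector.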
