Sieve out non-NA entries from a data frame while retaining rows that contain only NA
I am looking for a more concise way (in terms of length of code) of converting a data.frame from:
# V1 V2 V3 V4 V5 V6 V7 V8 V9
# 1 1 2 3 NA NA NA NA NA NA
# 2 NA NA NA 3 2 1 NA NA NA
# 3 NA NA NA NA NA NA NA NA NA
# 4 NA NA NA NA NA NA NA NA NA
# 5 NA NA NA NA NA NA 1 2 3
to:
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 3 2 1
#[3,] NA NA NA
#[4,] NA NA NA
#[5,] 1 2 3
That is, I want to remove the excess NAs but still correctly represent rows that contain only NAs.
I wrote the following function, which does the job, but I am sure there is a less lengthy way of achieving the same result.
#Dummy data.frame
data <- matrix(c(1:3, rep(NA, 6),
rep(NA, 3), 3:1, rep(NA, 3),
rep(NA, 9),
rep(NA, 9),
rep(NA, 6), 1:3),
byrow=TRUE, ncol=9)
data <- as.data.frame(data)
sieve <- function(data) {
#get a list of all entries that are not NA
cond <- apply(data, 1, function(x) x[!is.na(x)])
#find the empty rows (integer(0) after removing the NAs)
empty <- sapply(cond, length) == 0
#check how many items there are in non-empty rows
#(rows are either empty or contain the same number of items)
n <- max(sapply(cond, length))
#replace each empty entry with n NAs
#(indexing the empty rows directly also works when n == 1)
cond[empty] <- list(rep(NA, n))
#turn list into a matrix (one row per original row)
cond <- matrix(unlist(cond), nrow=length(cond), byrow=TRUE)
cond
}
sieve(data)
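For comparison, here is a shorter vectorized sketch of the same idea. Like the function above, it assumes that every non-empty row holds the same number of answers. The name sieve2 is hypothetical; it uses only base R (rowSums, t, matrix):

```r
# the question's dummy data
data <- as.data.frame(matrix(c(1:3, rep(NA, 6),
                               rep(NA, 3), 3:1, rep(NA, 3),
                               rep(NA, 9),
                               rep(NA, 9),
                               rep(NA, 6), 1:3),
                             byrow = TRUE, ncol = 9))

# hypothetical helper: assumes all non-empty rows have the same number of answers
sieve2 <- function(data) {
  m <- as.matrix(data)
  # number of non-NA answers in each row
  k <- rowSums(!is.na(m))
  n <- max(k)
  # start from an all-NA result with one row per original row
  out <- matrix(NA, nrow = nrow(m), ncol = n)
  # t(m) is stored column by column, so its non-NA values come out row by row of m
  tm <- t(m)
  out[k > 0, ] <- matrix(tm[!is.na(tm)], ncol = n, byrow = TRUE)
  out
}

sieve2(data)
```

Filling only the rows with k > 0 leaves the all-NA rows untouched, so no separate "replace empty rows" step is needed.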
My question resembles this question about extracting the conditions to which participants were assigned (for which I received great answers). I tried extending those answers to the current dummy data, but without success so far. Hence my rather lengthy custom function.
Edit: Additional info on why I am asking this question: The first data frame represents the raw output from an experiment in which I assigned participants to one of three conditions (using 3 here for simplicity). In each condition, participants read a different scenario, but then answered the same set of questions about the scenario they had read. Qualtrics recorded answers from participants in the first condition in columns V1 through V3, answers from participants in the second condition in columns V4 through V6, and answers from participants in the third condition in columns V7 through V9. (If this block of questions had contained 4 questions, answers from participants in the first condition would have been in columns V1 through V4, answers from participants in the second condition in columns V5 through V8, and so on.)
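Because of this layout, one alternative sketch is to cut the columns into consecutive blocks of k questions (one block per condition) and keep, in each cell, the first non-NA value across the blocks. This assumes each participant answered exactly one block (or none) and that the number of columns is a multiple of k; sieve_blocks and k are hypothetical names, and only base R is used:

```r
# the question's dummy data
data <- as.data.frame(matrix(c(1:3, rep(NA, 6),
                               rep(NA, 3), 3:1, rep(NA, 3),
                               rep(NA, 9),
                               rep(NA, 9),
                               rep(NA, 6), 1:3),
                             byrow = TRUE, ncol = 9))

# hypothetical helper: k = number of questions per condition block
sieve_blocks <- function(data, k) {
  m <- as.matrix(data)
  stopifnot(ncol(m) %% k == 0)
  # column indices of each condition's block: 1:k, (k+1):(2k), ...
  blocks <- split(seq_len(ncol(m)), ceiling(seq_len(ncol(m)) / k))
  # one nrow-by-k sub-matrix per condition
  parts <- lapply(blocks, function(cols) m[, cols, drop = FALSE])
  # coalesce: wherever the earlier block is NA, take the later block's value
  Reduce(function(a, b) ifelse(is.na(a), b, a), parts)
}

sieve_blocks(data, k = 3)
```

Rows that are NA in every block stay NA, which gives the all-NA rows without any special handling.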
You can try this if the number of non-NA values is always the same in the rows that aren't entirely NA:
First, create a data frame with the appropriate (transposed) dimensions, filled with NAs:
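# d is the question's data frame
d <- data
# one row per answer slot, one column per original row
d2 <- data.frame(
matrix(nrow = max(apply(d, 1, function(ii) sum(!is.na(ii)))),
ncol = nrow(d)))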
Then fill that data frame using apply, and transpose it to get your desired outcome:
d2[] <- apply(d, 1, function(ii) ii[!is.na(ii)])
t(d2)
# [,1] [,2] [,3]
#X1 1 2 3
#X2 3 2 1
#X3 NA NA NA
#X4 NA NA NA
#X5 1 2 3
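The all-NA rows survive because of how data-frame assignment treats zero-length columns: apply() returns integer(0) for a row that is entirely NA, and, as far as we can tell from base R's `[<-.data.frame`, `d2[] <- value` pads such a zero-length column with NAs instead of failing. A minimal check of that behaviour, using a hypothetical toy data frame df:

```r
# toy data frame with two columns of length 3
df <- data.frame(a = c(0, 0, 0), b = c(0, 0, 0))
# the second replacement is empty, like apply()'s result for an all-NA row
df[] <- list(1:3, integer(0))
# column b should now consist of NAs
df$b
```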