Remove NAs from data frame without deleting entire rows/columns

I'm analyzing some pilot data for an experiment where each participant rates 60 pairs of auditory stimuli, drawn from a pool of 190 pairs, on a 4-point scale. I get a lot of missing values because each participant rates a different set of pairs.

I really don't care about which participant said what, I just need all the ratings for the same pair to be in the same row so I can perform a Light's Kappa test for inter-rater agreement on each pair in n with kappam.light (irr package).

Here is the head of my data for 15 participants, where n is the number of the pair and m is the participant:

> head(my.data)
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
1   NA    1   NA    1   NA   NA   NA   NA    2     2    NA    NA    NA     3    NA
2   NA    3   NA   NA    3   NA   NA   NA    3     3    NA    NA     4    NA     3
3   NA   NA    1   NA   NA    4   NA    1   NA    NA     1     3    NA    NA     3
4   NA   NA    2   NA    1   NA   NA    1   NA    NA    NA    NA    NA    NA    NA
5    1   NA   NA    1   NA   NA   NA    1   NA    NA     4     1    NA    NA    NA
6    2   NA   NA   NA    1   NA   NA   NA    1     3    NA    NA    NA     2    NA

The output I want (if possible) is the following:

   [,1] [,2] [,3] [,4] [,5] [,6]
1    1    1    2    2    3
2    3    3    3    3    4    3
3    1    4    1    1    3    3
4    2    1    1   
5    1    1    1    4    1  
6    2    1    1    3    2   

I'm not sure if R will allow varying row lengths in a data frame/matrix, but it would be great to get rid of as many missing values as possible so kappam.light won't just disregard the whole row.

You can easily get rid of NA values in a list. A matrix or a data.frame, on the other hand, needs every row to have the same length, so the shorter rows have to be padded with NA. Here's one way to do this:

# list of rows with the NAs removed
lst <- apply(my.data, 1, function(x) x[!is.na(x)])
# maximum length
ll <- max(sapply(lst, length))
# pad each row with NAs up to the maximum length and combine
t(sapply(lst, function(x) c(x, rep(NA, ll - length(x)))))
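One caveat with the apply() step: if every row happens to have the same number of non-missing values, apply() returns a matrix rather than a list, and the padding step no longer does what is intended. Below is a minimal sketch that sidesteps this by looping over row indices with lapply(). The `ratings` matrix and the name `compact` are made up for illustration and are not part of the original data.

```r
# Hypothetical 3 x 4 ratings matrix; NA marks pairs a participant did not rate.
ratings <- matrix(c(NA,  1, NA,  2,
                     3, NA, NA, NA,
                    NA,  4,  1,  3),
                  nrow = 3, byrow = TRUE)

# lapply() over row indices always returns a list, even when all rows
# contain the same number of non-missing values.
lst <- lapply(seq_len(nrow(ratings)), function(i) {
  x <- ratings[i, ]
  x[!is.na(x)]
})

# Pad each row with NA up to the longest row, then stack the rows.
ll <- max(lengths(lst))
compact <- do.call(rbind, lapply(lst, function(x) {
  c(x, rep(NA_real_, ll - length(x)))
}))
compact
```

Stacking with do.call(rbind, ...) keeps one row per pair even when the longest row has a single rating, a case where t(sapply(...)) would return the result in the wrong orientation.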

Sort the NAs to the end of each row, then drop the columns that contain only NAs. If you don't mind leaving the all-NA columns in m2, the second line of code can be omitted:

m2 <- t(apply(m, 1, function(x) x[order(is.na(x))])) # move NAs to the end of each row
m2[, !!colSums(!is.na(m2))] 

Alternatively, the last line could be written as: m2[, apply(m2, 2, function(x) any(!is.na(x)))]
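To see why this works, it may help to step through a single row. The sketch below uses a made-up vector `x` and a made-up vector of counts; it only illustrates the two idioms used above. Ordering on is.na(x) puts FALSE before TRUE and leaves ties in their original order, so the non-missing ratings keep their relative order. The double negation `!!` turns counts into a logical vector that is TRUE wherever the count is non-zero.

```r
# Hypothetical single row of ratings.
x <- c(NA, 3, NA, 1, 2)

# FALSE (not missing) sorts before TRUE (missing); ties keep their order.
idx <- order(is.na(x))
idx
sorted_row <- x[idx]
sorted_row

# Hypothetical per-column counts of non-missing values.
counts <- c(2, 0, 5)
keep <- !!counts   # same result as counts != 0
keep
```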

The result is:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    2    3   NA
[2,]    3    3    3    3    4    3
[3,]    1    4    1    1    3    3
[4,]    2    1    1   NA   NA   NA
[5,]    1    1    1    4    1   NA
[6,]    2    1    1    3    2   NA

Note: We used this as the input, m:

m <-
structure(c(NA, NA, NA, NA, 1L, 2L, 1L, 3L, NA, NA, NA, NA, NA, 
NA, 1L, 2L, NA, NA, 1L, NA, NA, NA, 1L, NA, NA, 3L, NA, 1L, NA, 
1L, NA, NA, 4L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 
1L, 1L, NA, 2L, 3L, NA, NA, NA, 1L, 2L, 3L, NA, NA, NA, 3L, NA, 
NA, 1L, NA, 4L, NA, NA, NA, 3L, NA, 1L, NA, NA, 4L, NA, NA, NA, 
NA, 3L, NA, NA, NA, NA, 2L, NA, 3L, 3L, NA, NA, NA), .Dim = c(6L, 
15L), .Dimnames = list(NULL, NULL))

Next time please provide your data in this form using dput.
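For anyone unfamiliar with dput, here is a small sketch of how it is typically used. The matrix `small` is a made-up example, and writing to a temporary file is only done here to show that the text form rebuilds the same object. In a question you would normally just paste the console output of dput(head(my.data)).

```r
# Hypothetical small integer matrix with missing values.
small <- matrix(c(NA, 1L, 2L, NA), nrow = 2)

# Print a text representation that others can paste into their session.
dput(small)

# Round trip through a file: dget() rebuilds the object from that text.
f <- tempfile(fileext = ".R")
dput(small, file = f)
rebuilt <- dget(f)
identical(small, rebuilt)
```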

Would something like this work?

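# initialize empty data frame
datt <- data.frame()

library(plyr)

# assumes my.data is a data frame; rbind.fill pads missing columns with NA
for (i in seq_len(nrow(my.data))) {
    myd <- my.data[i, , drop = FALSE]
    # keep only the non-missing ratings; drop = FALSE keeps a data frame
    myd <- myd[, !is.na(myd), drop = FALSE]
    names(myd) <- seq_along(myd)
    datt <- rbind.fill(datt, myd)
}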

datt
  1 2 3  4  5  6
1 1 1 2  2  3 NA
2 3 3 3  3  4  3
3 1 4 1  1  3  3
4 2 1 1 NA NA NA
5 1 1 1  4  1 NA
6 2 1 1  3  2 NA
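If the data might arrive either as a matrix or as a data frame, the same idea can be wrapped in a small base R helper that needs no extra packages. This is only a sketch: the function name `compact_rows` and the example data frame `df` are hypothetical, and it assumes all columns are numeric ratings.

```r
# Hypothetical helper: shift the non-missing values of each row to the left
# and drop the columns that end up entirely NA.
compact_rows <- function(x) {
  m <- as.matrix(x)          # accepts a matrix or an all-numeric data frame
  colnames(m) <- NULL        # column labels lose their meaning after shifting
  for (i in seq_len(nrow(m))) {
    r <- m[i, ]
    m[i, ] <- r[order(is.na(r))]   # non-missing values first, NAs last
  }
  m[, colSums(!is.na(m)) > 0, drop = FALSE]
}

# Hypothetical input: two pairs rated by three participants.
df <- data.frame(a = c(NA, 2), b = c(1, NA), c = c(3, NA))
out <- compact_rows(df)
out
```

Filling the matrix row by row avoids the transpose step, so the result keeps one row per pair even when the input has a single column.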
