
How to delete variables in panel data if all observations for a given year are NA?

I have a dataframe like this,

scores <-structure(list(student = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("adam", "mike", "rose"), class = "factor"), 
    year = c(2001L, 2002L, 2003L, 2001L, 2002L, 2003L, 2001L, 
    2002L, 2003L), math = c(5L, 3L, 5L, 3L, 2L, 4L, 4L, 2L, NA
    ), english = c(2L, NA, 5L, 4L, NA, 3L, 4L, NA, 4L), history = c(NA, 
    4L, 5L, NA, 3L, 4L, NA, 5L, 3L), geography = c(4L, 5L, 5L, 
    5L, 4L, 4L, 3L, 5L, 3L)), class = "data.frame", row.names = c(NA, 
-9L))

I want to delete every variable for which no student has a score in a given year. For example, no student has a score for English in 2002, so I want to delete the variable "english" if my relevant year is 2002. Similarly, no student has a score for History in 2001, so if my relevant year is 2001, the variable "history" should be deleted. If my relevant year is 2003, no variable is deleted, because every variable has at least one score in that year; even "math", which is missing for Rose, has scores for Mike and Adam.

To do this, I built the following function, which does the job:

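# Keep the first (z - 1) identifier columns, plus every later column that
# has at least one non-NA value in the rows where the year equals `ano`.
byearNA <- function(x, z = 3, ano = 2001) {
    matri <- x[seq_len(z - 1)]
    for (i in z:ncol(x)) {
        # Column 2 is assumed to hold the year
        if (!all(is.na(x[x[[2]] == ano, i]))) {
            matri <- cbind(matri, x[i])
        }
    }
    return(matri)
}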

However, I really believe this can be done with native functions in R (functions that already exist). I have tried for a long time but couldn't find a way, which is why I wrote my own function.

How can I achieve this task with native functions in R?

Thank you very much in advance.

I'm not 100% sure what you are looking for but have you tried this?

scores2 <- na.omit(scores)

This will return the 2 rows that are complete cases (no NA values).
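Note that na.omit works row-wise, so it removes observations rather than variables. If the aim is to drop the columns that are entirely NA for one year, a minimal base R sketch might look like the following. It rebuilds the question's data with data.frame() so that it runs on its own; `ano` is borrowed from the question, while `keep` and `scores_ano` are just illustrative names.

```r
# Same data as in the question, rebuilt so the example is self-contained
scores <- data.frame(
  student   = factor(rep(c("adam", "mike", "rose"), each = 3)),
  year      = rep(2001:2003, times = 3),
  math      = c(5L, 3L, 5L, 3L, 2L, 4L, 4L, 2L, NA),
  english   = c(2L, NA, 5L, 4L, NA, 3L, 4L, NA, 4L),
  history   = c(NA, 4L, 5L, NA, 3L, 4L, NA, 5L, 3L),
  geography = c(4L, 5L, 5L, 5L, 4L, 4L, 3L, 5L, 3L)
)

ano <- 2002  # the relevant year

# TRUE for every column with at least one non-NA value in that year
keep <- colSums(!is.na(scores[scores$year == ano, , drop = FALSE])) > 0

# All rows are kept; only the columns that are all NA in `ano` are dropped
scores_ano <- scores[, keep, drop = FALSE]
names(scores_ano)
```

With ano set to 2002 this should leave out "english", and with 2001 it should leave out "history". The identifier columns are always kept here because they contain no NA values.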

Adding some lines after thelatemail's comments: storing the data in long format is a good idea. You will want to work with a long data frame if you don't want to see NA values in your table. Here is a dplyr/tidyr method:

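library(dplyr)
library(tidyr)

# Reshape to long format: one row per student, year and class
scores_gathered <- gather(scores, "class", "count", 3:6)

# Sum the scores per year and class. A plain sum(count) would be NA as soon
# as one score is missing, so NA is returned only when all scores are missing.
scores_gathered <- scores_gathered %>%
  group_by(year, class) %>%
  summarize(sum = if (all(is.na(count))) NA_integer_ else sum(count, na.rm = TRUE))

# Keep only the year/class pairs for which at least one student has a score
complete_list <- scores_gathered %>%
  drop_na(sum) %>%
  select(year, class) %>%
  mutate(has_students = "yes")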
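
To get from a long table back to the question's goal, one possible sketch is to look up which classes have at least one score in the relevant year and then select those columns from the wide data. This is only an illustration: it uses tidyr's pivot_longer() in place of gather(), rebuilds the data with data.frame() so that it runs on its own, and the names `available`, `classes_2001` and `scores_2001` are made up for the example.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the example is self-contained
scores <- data.frame(
  student   = factor(rep(c("adam", "mike", "rose"), each = 3)),
  year      = rep(2001:2003, times = 3),
  math      = c(5L, 3L, 5L, 3L, 2L, 4L, 4L, 2L, NA),
  english   = c(2L, NA, 5L, 4L, NA, 3L, 4L, NA, 4L),
  history   = c(NA, 4L, 5L, NA, 3L, 4L, NA, 5L, 3L),
  geography = c(4L, 5L, 5L, 5L, 4L, 4L, 3L, 5L, 3L)
)

# One row per year/class pair that has at least one non-NA score
available <- scores %>%
  pivot_longer(math:geography, names_to = "class", values_to = "count") %>%
  filter(!is.na(count)) %>%
  distinct(year, class)

# Classes with at least one score in 2001, then the matching wide columns
classes_2001 <- available$class[available$year == 2001]
scores_2001 <- scores[, c("student", "year", classes_2001)]
names(scores_2001)
```

For 2001 this should drop "history" and keep the other three classes. Filtering on non-missing scores before distinct() avoids the question of how a sum treats NA values altogether.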
