How to delete variables in panel data if all observations for a given year are NA?
I have a data frame like this:
scores <- structure(list(
  student = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
                      .Label = c("adam", "mike", "rose"), class = "factor"),
  year = c(2001L, 2002L, 2003L, 2001L, 2002L, 2003L, 2001L, 2002L, 2003L),
  math = c(5L, 3L, 5L, 3L, 2L, 4L, 4L, 2L, NA),
  english = c(2L, NA, 5L, 4L, NA, 3L, 4L, NA, 4L),
  history = c(NA, 4L, 5L, NA, 3L, 4L, NA, 5L, 3L),
  geography = c(4L, 5L, 5L, 5L, 4L, 4L, 3L, 5L, 3L)),
  class = "data.frame", row.names = c(NA, -9L))
I want to delete any variable for which no student has a score in a given year. For example, no student has an English score in 2002, so if my relevant year is 2002, I want to delete the variable "english". Similarly, no student has a history score in 2001, so if my relevant year is 2001, the variable "history" should be deleted.
If my relevant year is 2003, no variable is deleted, because every variable has at least one score that year; even in "math", at least one student (more precisely Mike and Adam) has a score.
To do this, I wrote the following function, which does the job:
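byearNA <- function(x, z = 3, ano = 2001) {
  # Keep the identifier columns 1..(z - 1) (here: student and year)
  matri <- x[1:(z - 1)]
  # Append each variable from column z onwards that has at least one
  # non-missing value in year `ano` (the year is read from column 2)
  for (i in z:ncol(x)) {
    if (!all(is.na(x[x[[2]] == ano, i]))) {
      matri <- cbind(matri, x[i])
    }
  }
  return(matri)
}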
However, I believe this can be done with built-in R functions (functions that already exist). I have tried for a long time but couldn't find a way, which is why I wrote my own function.
How can I achieve this with built-in R functions?
Thank you very much in advance.
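One possible base-R approach, sketched here under the same assumptions as byearNA (year in column 2, score variables from column z onwards): subset the rows for the chosen year, count the non-missing values per score column with colSums(!is.na(...)), and keep the columns whose count is positive. The function name byearNA_vec is hypothetical, and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Same data as in the question, rebuilt so this block runs on its own
scores <- data.frame(
  student = factor(rep(c("adam", "mike", "rose"), each = 3)),
  year = rep(2001:2003, times = 3),
  math = c(5L, 3L, 5L, 3L, 2L, 4L, 4L, 2L, NA),
  english = c(2L, NA, 5L, 4L, NA, 3L, 4L, NA, 4L),
  history = c(NA, 4L, 5L, NA, 3L, 4L, NA, 5L, 3L),
  geography = c(4L, 5L, 5L, 5L, 4L, 4L, 3L, 5L, 3L)
)

# Hypothetical vectorized version of byearNA(): same arguments and assumptions
# (identifier columns 1..z-1, year in column 2, score variables in z..ncol(x))
byearNA_vec <- function(x, z = 3, ano = 2001) {
  rows <- which(x[[2]] == ano)  # rows belonging to the chosen year
  score_cols <- z:ncol(x)
  # TRUE for each score column with at least one non-missing value that year
  has_score <- colSums(!is.na(x[rows, score_cols, drop = FALSE])) > 0
  x[, c(seq_len(z - 1), score_cols[has_score]), drop = FALSE]
}

names(byearNA_vec(scores, ano = 2002))  # "english" is dropped
names(byearNA_vec(scores, ano = 2001))  # "history" is dropped
names(byearNA_vec(scores, ano = 2003))  # nothing is dropped
```

Because the test is vectorized over columns, the result does not have to be grown with cbind() inside a loop.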
I'm not 100% sure what you are looking for, but have you tried this?
scores2 <- na.omit(scores)
This returns the 2 rows that are complete cases (no NA values).
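As a quick check (a sketch, not part of the original answer), complete.cases() selects the same rows as na.omit(). It also makes clear that this approach drops incomplete rows rather than variables: all 6 columns remain, and only Adam's and Mike's 2003 rows survive. The data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Same data as in the question, rebuilt so this block runs on its own
scores <- data.frame(
  student = factor(rep(c("adam", "mike", "rose"), each = 3)),
  year = rep(2001:2003, times = 3),
  math = c(5L, 3L, 5L, 3L, 2L, 4L, 4L, 2L, NA),
  english = c(2L, NA, 5L, 4L, NA, 3L, 4L, NA, 4L),
  history = c(NA, 4L, 5L, NA, 3L, 4L, NA, 5L, 3L),
  geography = c(4L, 5L, 5L, 5L, 4L, 4L, 3L, 5L, 3L)
)

# complete.cases() is TRUE for rows without any NA, i.e. the rows na.omit() keeps
complete <- scores[complete.cases(scores), ]
nrow(complete)                  # 2 rows remain
ncol(complete)                  # all 6 variables are kept
as.character(complete$student)  # "adam" "mike" (both from 2003)
```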
Adding some lines after thelatemail's comments: storing the data in long format is a good idea. If you don't want to see NA values in your table, you'll want to work with a long data frame. Here is a dplyr method:
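library(dplyr)
library(tidyr)

# Reshape the four score columns (columns 3 to 6) into long format
scores_gathered <- gather(scores, "class", "count", 3:6)

# Count the non-missing scores for each year and subject
# (sum(count) would be NA as soon as any single score is missing)
scores_gathered <- scores_gathered %>%
  group_by(year, class) %>%
  summarize(n_scores = sum(!is.na(count)), .groups = "drop")

# Keep the year/subject combinations where at least one student has a score
complete_list <- scores_gathered %>%
  filter(n_scores > 0) %>%
  select(year, class) %>%
  mutate(has_students = "yes")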
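The same long-format idea can be sketched in base R for readers who prefer not to load dplyr/tidyr. This is an alternative to the answer's code, not a translation of it: stack() reshapes the four score columns into values/ind pairs, and tapply() counts the non-missing scores per year and subject, so a count of 0 marks a variable to delete for that year. The names long, n_scores and keep_2002 are illustrative, and the data frame is rebuilt so the block runs on its own.

```r
# Same data as in the question, rebuilt so this block runs on its own
scores <- data.frame(
  student = factor(rep(c("adam", "mike", "rose"), each = 3)),
  year = rep(2001:2003, times = 3),
  math = c(5L, 3L, 5L, 3L, 2L, 4L, 4L, 2L, NA),
  english = c(2L, NA, 5L, 4L, NA, 3L, 4L, NA, 4L),
  history = c(NA, 4L, 5L, NA, 3L, 4L, NA, 5L, 3L),
  geography = c(4L, 5L, 5L, 5L, 4L, 4L, 3L, 5L, 3L)
)

# Long format: one row per (student, year, subject); stack() concatenates the
# columns one after another, so the year vector is repeated once per subject
long <- stack(scores[3:6])  # columns: values, ind (subject name)
long$year <- rep(scores$year, times = 4)

# Number of non-missing scores per year (rows) and subject (columns)
n_scores <- tapply(!is.na(long$values), list(long$year, long$ind), sum)
n_scores

# Subjects with at least one score in 2002, then the matching wide columns
keep_2002 <- colnames(n_scores)[n_scores["2002", ] > 0]
scores[scores$year == 2002, c("student", "year", keep_2002)]
```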
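Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.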