How to delete variables in panel data if all observations for a given year are NA?

I have a data frame like this:

scores <- structure(list(student = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("adam", "mike", "rose"), class = "factor"), 
    year = c(2001L, 2002L, 2003L, 2001L, 2002L, 2003L, 2001L, 
    2002L, 2003L), math = c(5L, 3L, 5L, 3L, 2L, 4L, 4L, 2L, NA
    ), english = c(2L, NA, 5L, 4L, NA, 3L, 4L, NA, 4L), history = c(NA, 
    4L, 5L, NA, 3L, 4L, NA, 5L, 3L), geography = c(4L, 5L, 5L, 
    5L, 4L, 4L, 3L, 5L, 3L)), class = "data.frame", row.names = c(NA, 
-9L))

I want to delete every variable for which no student has a score in a given year. For example, no student has a score for English in 2002, so I want to delete the variable "english" if my relevant year is 2002. Similarly, no student has a score for History in 2001, so if my relevant year is 2001, the variable "history" should be deleted. If my relevant year is 2003, no variable is deleted, because every variable has at least one score in that year; even "math", where Rose's score is missing, still has scores for Adam and Mike.
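The pattern described above can be checked by counting the non-missing scores per subject and year. This is a minimal base R sketch; the name `non_na_counts` is only illustrative, and `scores` is rebuilt here (with `student` as a plain character column) so the snippet runs on its own.

```r
# Same values as the data frame above, rebuilt so the snippet is self-contained
scores <- data.frame(
  student   = rep(c("adam", "mike", "rose"), each = 3),
  year      = rep(2001:2003, times = 3),
  math      = c(5, 3, 5, 3, 2, 4, 4, 2, NA),
  english   = c(2, NA, 5, 4, NA, 3, 4, NA, 4),
  history   = c(NA, 4, 5, NA, 3, 4, NA, 5, 3),
  geography = c(4, 5, 5, 5, 4, 4, 3, 5, 3)
)

# Count the non-missing scores of each subject within each year
non_na_counts <- aggregate(
  as.data.frame(!is.na(scores[3:6])),
  by = list(year = scores$year),
  FUN = sum
)
print(non_na_counts)
```

A count of 0 marks a variable that would be dropped for that year: here `history` in 2001 and `english` in 2002, and nothing in 2003.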

To do this, I wrote the following function, which does the job:

# Keep the first z - 1 identifier columns, plus every later column that has
# at least one non-NA value in the rows where column 2 (the year) equals ano.
byearNA <- function(x, z = 3, ano = 2001) {
    matri <- x[seq_len(z - 1)]
    for (i in z:ncol(x)) {
        if (!all(is.na(x[x[[2]] == ano, i]))) {
            matri <- cbind(matri, x[i])
        }
    }
    return(matri)
}

However, I believe this can be done with native functions in R (functions that already exist). I tried for a long time but could not find a way, which is why I wrote my own function.

How can I achieve this task with native functions in R?

Thank you very much in advance.

I'm not 100% sure what you are looking for, but have you tried this?

scores2 <- na.omit(scores)

This returns the 2 rows that are complete cases (rows with no NA values).
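`na.omit()` works row by row, so it removes observations rather than variables. For the column-wise task in the question, one possible base R sketch is to look only at the rows of the relevant year and keep the columns that have at least one non-missing value there. The names `relevant_year`, `year_rows`, `keep` and `scores_2002` are hypothetical, and `scores` is rebuilt so the snippet runs on its own.

```r
# Same values as the question's data frame, rebuilt for a self-contained example
scores <- data.frame(
  student   = rep(c("adam", "mike", "rose"), each = 3),
  year      = rep(2001:2003, times = 3),
  math      = c(5, 3, 5, 3, 2, 4, 4, 2, NA),
  english   = c(2, NA, 5, 4, NA, 3, 4, NA, 4),
  history   = c(NA, 4, 5, NA, 3, 4, NA, 5, 3),
  geography = c(4, 5, 5, 5, 4, 4, 3, 5, 3)
)

relevant_year <- 2002  # hypothetical name for the year of interest

# Rows for the chosen year only
year_rows <- scores[scores$year == relevant_year, ]

# TRUE for columns with at least one non-missing value in those rows
keep <- colSums(!is.na(year_rows)) > 0

# Drop the all-NA columns, keeping every row of the original data frame
scores_2002 <- scores[, keep, drop = FALSE]
print(names(scores_2002))
```

Like the question's `byearNA()`, this keeps all rows and only removes columns; `student` and `year` are never missing, so they always survive the filter.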

Edit, after thelatemail's comments: storing the data in long format is a good idea. You will want to work with a long data frame if you don't want to see NA values in your table. Here is a method using dplyr and tidyr:

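library(dplyr)
library(tidyr)

# Reshape to long format: one row per student, year and class
scores_gathered <- gather(scores, "class", "count", 3:6)

# Count the non-missing scores in each year/class combination
# (a plain sum(count) would be NA as soon as a single score is missing)
scores_gathered <- scores_gathered %>%
  group_by(year, class) %>%
  summarize(n_scores = sum(!is.na(count)), .groups = "drop")

# Keep the combinations in which at least one student has a score
complete_list <- scores_gathered %>%
  filter(n_scores > 0) %>%
  select(year, class) %>%
  mutate(has_students = "yes")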
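If the goal is only to drop the all-NA columns for a single year, the wide data frame can also be filtered directly. This is a sketch, assuming dplyr 1.0.0 or later is installed (for `where()` inside `select()`). The name `scores_2001` is hypothetical, and `scores` is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# Same values as the question's data frame, rebuilt for a self-contained example
scores <- data.frame(
  student   = rep(c("adam", "mike", "rose"), each = 3),
  year      = rep(2001:2003, times = 3),
  math      = c(5, 3, 5, 3, 2, 4, 4, 2, NA),
  english   = c(2, NA, 5, 4, NA, 3, 4, NA, 4),
  history   = c(NA, 4, 5, NA, 3, 4, NA, 5, 3),
  geography = c(4, 5, 5, 5, 4, 4, 3, 5, 3)
)

# Keep the rows for one year, then drop the columns that are entirely NA there
scores_2001 <- scores %>%
  filter(year == 2001) %>%
  select(where(~ !all(is.na(.x))))
print(scores_2001)
```

Unlike the question's function, this returns only the rows of the chosen year; for 2001 the `history` column is dropped.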

