Calculating mean with specified condition for number of missing observations
I need to calculate the mean value for each country from 1991-2000, but only if that country is missing 2 or fewer years (NA) of data in that time range.
So, here's a sample of the data I have...
# A tibble: 275 x 52
country aid1960 aid1961 aid1962 aid1963 aid1964 aid1965 aid1966 aid1967 aid1968
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Abkhazia NA NA NA NA NA NA NA NA NA
2 Afghanist~ 1.78 3.52 1.68 3.57 4.41 5.04 4.61 3.64 2.56
3 Akrotiri ~ NA NA NA NA NA NA NA NA NA
4 Albania NA NA NA NA NA NA NA NA NA
5 Algeria 32.9 39.8 35.5 24.7 19.5 12.3 10.4 8.56 9.24
6 American ~ NA NA NA NA NA NA NA NA NA
7 Andorra NA NA NA NA NA NA NA NA NA
8 Angola -0.0101 4.66 NA 0.00572 NA 0.204 0.518 3.24 0.00175
9 Anguilla NA NA NA NA NA NA NA NA NA
10 Antigua a~ NA NA NA NA NA NA NA NA NA
# ... with 265 more rows, and 42 more variables: aid1969 <dbl>, aid1970 <dbl>,
# aid1971 <dbl>, aid1972 <dbl>, aid1973 <dbl>, aid1974 <dbl>, aid1975 <dbl>,
# aid1976 <dbl>, aid1977 <dbl>, aid1978 <dbl>, aid1979 <dbl>, aid1980 <dbl>,
# aid1981 <dbl>, aid1982 <dbl>, aid1983 <dbl>, aid1984 <dbl>, aid1985 <dbl>,
# aid1986 <dbl>, aid1987 <dbl>, aid1988 <dbl>, aid1989 <dbl>, aid1990 <dbl>,
# aid1991 <dbl>, aid1992 <dbl>, aid1993 <dbl>, aid1994 <dbl>, aid1995 <dbl>,
# aid1996 <dbl>, aid1997 <dbl>, aid1998 <dbl>, aid1999 <dbl>, aid2000 <dbl>, ...
I was able to determine which countries were missing data in that time range using this code...
rowSums(is.na(countrydata[, 33:42]))
This gave me an output of mostly 0 and 10 values (0 meaning no missing data, 1 meaning 1 year of missing data, and so on).
So, I need to keep the countries whose rowSums value for this range is 0, 1, or 2. How would I integrate this condition into my code for the 'rowMeans' command so it only gives an output for countries missing 2 or fewer years (NA) of data in that time range?
With rowSums, create a logical index, use it to subset the data by specifying it as the row index, and then get the rowMeans:
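# TRUE for rows with at most 2 missing years in columns 33:42 (aid1991 to aid2000)
i1 <- rowSums(is.na(countrydata[, 33:42])) <= 2
# Mean over the available years, only for the rows that pass
Means <- rowMeans(countrydata[i1, 33:42], na.rm = TRUE)
country <- countrydata$country[i1]
data.frame(country, Means)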
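To check what the logical index does, here is a small sketch on a made-up data frame. The `toy` data, its values, and the shortened 1991-1995 window are hypothetical and only for illustration. It also selects the year columns by name instead of by position, which assumes the columns follow the `aidYYYY` naming shown in the question.

```r
# Hypothetical toy data: 4 countries, 5 years (values are made up)
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  aid1991 = c(1, NA, NA, NA),
  aid1992 = c(2, 4, NA, NA),
  aid1993 = c(3, NA, NA, NA),
  aid1994 = c(4, 8, 1, NA),
  aid1995 = c(5, 6, 2, NA),
  stringsAsFactors = FALSE
)

# Build the column names for the window instead of hard-coding positions
yrs <- paste0("aid", 1991:1995)

# Number of missing years per country: A = 0, B = 2, C = 3, D = 5
n_missing <- rowSums(is.na(toy[, yrs]))

# TRUE for countries with at most 2 missing years
keep <- n_missing <= 2

# Means are computed only for the rows that pass the condition
result <- data.frame(
  country = toy$country[keep],
  Means   = rowMeans(toy[keep, yrs], na.rm = TRUE)
)
result
```

Only countries A and B pass the condition here. Selecting by name may be safer than `33:42` if the column order could ever change.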
Or using tidyverse:
library(dplyr)
countrydata %>%
select(country, 33:42) %>%
filter(rowSums(across(where(is.numeric), is.na)) <= 2) %>%
transmute(country, Means = rowMeans(.[-1], na.rm = TRUE))
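Both approaches drop the countries that fail the condition. If it is preferable to keep every country and simply leave the mean as NA where more than 2 years are missing, one possible variant in base R is sketched here. Again, the `toy` data and the shortened 1991-1994 window are hypothetical.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(
  country = c("A", "B", "C"),
  aid1991 = c(1, NA, NA),
  aid1992 = c(2, 4, NA),
  aid1993 = c(3, NA, NA),
  aid1994 = c(4, 8, 1),
  stringsAsFactors = FALSE
)
yrs <- paste0("aid", 1991:1994)

# Count missing years and compute the mean of the available years per row
n_missing <- rowSums(is.na(toy[, yrs]))
means <- rowMeans(toy[, yrs], na.rm = TRUE)

# Blank out the mean where more than 2 years are missing
means[n_missing > 2] <- NA

# One row per country, including those that fail the condition
all_rows <- data.frame(country = toy$country, n_missing, Means = means)
all_rows
```

Keeping `n_missing` next to the mean makes it easy to see why a given country has no value.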