
Summarise based on number of observations per year in a time-series

I've got a long dataframe like this:

 year   value  town
 2001   0.15   ny
 2002   0.19   ny
 2002   0.14   ca
 2001   NA     ny 
 2002   0.15   ny
 2002   0.12   ca 
 2001   NA     ny 
 2002   0.13   ny 
 2002   0.1    ca

I want to calculate a mean value per year and per town, like this:

 df %>% group_by(year, town) %>% summarise(mean_year = mean(value, na.rm = TRUE))

However, I only want to compute the mean for town-year groups that have more than 2 non-NA values. In the example above, I don't want a mean for ny in 2001, because that group has only 1 non-NA value.
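A quick way to check how many non-NA values each town-year group has is a base R sketch like the one below. It rebuilds the example data so it runs on its own; `n_valid` is just an illustrative name, not something from the question.

```r
# Example data from the question
df <- data.frame(
  year  = c(2001L, 2002L, 2002L, 2001L, 2002L, 2002L, 2001L, 2002L, 2002L),
  value = c(0.15, 0.19, 0.14, NA, 0.15, 0.12, NA, 0.13, 0.1),
  town  = c("ny", "ny", "ca", "ny", "ny", "ca", "ny", "ny", "ca")
)

# Count non-NA values per town (rows) and year (columns);
# town-year combinations with no rows at all come back as NA
n_valid <- with(df, tapply(!is.na(value), list(town, year), sum))
n_valid
```

For this data, ny has 1 non-NA value in 2001, while ny and ca each have 3 in 2002; ca has no 2001 rows, so that cell is NA.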

So the output should look like this:

town year mean_year
ny   2001 NA
ny   2002 0.157
ca   2002 0.12

Try this:

library(dplyr)
df %>% group_by(year, town) %>%
  summarise(mean_year = ifelse(sum(!is.na(value)) >= 2, mean(value, na.rm = TRUE), NA))  # NA unless the group has at least 2 non-NA values

# A tibble: 3 x 3
# Groups:   year [2]
   year town  mean_year
  <int> <chr>     <dbl>
1  2001 ny       NA    
2  2002 ca        0.12 
3  2002 ny        0.157
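A variant of the same idea, sketched under the assumption that dplyr >= 1.0 is installed (needed for the `.groups` argument). It keeps the non-NA count as its own column and uses a scalar `if`/`else` with `NA_real_`, so every group returns a double. `min_valid` is a hypothetical parameter name: 2 matches the `>= 2` above, and 3 matches the question's "more than 2" wording.

```r
library(dplyr)

df <- data.frame(
  year  = c(2001L, 2002L, 2002L, 2001L, 2002L, 2002L, 2001L, 2002L, 2002L),
  value = c(0.15, 0.19, 0.14, NA, 0.15, 0.12, NA, 0.13, 0.1),
  town  = c("ny", "ny", "ca", "ny", "ny", "ca", "ny", "ny", "ca")
)

# Hypothetical threshold: a group's mean is kept only when it has at
# least `min_valid` non-NA values (use 3 for "more than 2")
min_valid <- 2

res <- df %>%
  group_by(year, town) %>%
  summarise(
    n_valid   = sum(!is.na(value)),  # non-NA count for this group
    mean_year = if (n_valid >= min_valid) mean(value, na.rm = TRUE) else NA_real_,
    .groups   = "drop"               # return an ungrouped tibble
  )
res
```

With this data both thresholds give the same result, because ny 2001 has 1 non-NA value and the two 2002 groups have 3 each.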

Data (dput output):

> dput(df)
structure(list(year = c(2001L, 2002L, 2002L, 2001L, 2002L, 2002L, 
2001L, 2002L, 2002L), value = c(0.15, 0.19, 0.14, NA, 0.15, 0.12, 
NA, 0.13, 0.1), town = c("ny", "ny", "ca", "ny", "ny", "ca", 
"ny", "ny", "ca")), class = "data.frame", row.names = c(NA, -9L
))
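If dplyr is not available, the same rule can be applied in base R. This is a sketch, not part of the original answer: it reuses the dput data above, and `mean_if_enough()` and `min_valid` are hypothetical names. The result has one row per existing year-town combination, with the mean stored in a column named `value`.

```r
# Data from the dput output above
df <- structure(list(year = c(2001L, 2002L, 2002L, 2001L, 2002L, 2002L,
2001L, 2002L, 2002L), value = c(0.15, 0.19, 0.14, NA, 0.15, 0.12,
NA, 0.13, 0.1), town = c("ny", "ny", "ca", "ny", "ny", "ca",
"ny", "ny", "ca")), class = "data.frame", row.names = c(NA, -9L))

# Mean of v, or NA when v has fewer than `min_valid` non-NA values
# (hypothetical helper, not from the question or answer)
mean_if_enough <- function(v, min_valid = 2) {
  if (sum(!is.na(v)) >= min_valid) mean(v, na.rm = TRUE) else NA_real_
}

# na.action = na.pass keeps rows with NA values, so they reach
# mean_if_enough() instead of being dropped before aggregation
res_base <- aggregate(value ~ year + town, data = df,
                      FUN = mean_if_enough, na.action = na.pass)
res_base
```

The `na.action = na.pass` argument matters here: with the formula method's default (`na.omit`), the NA rows would be removed first and the count check would never see them.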

