

In R, how do I select/subset sites with values greater than a value, but then keep all sites that contain values less than the selected value?

I have the following data (please let me know if the link doesn't work; it's my first time uploading to GitHub):

https://github.com/scottr2012/test_r_data/blob/master/2017_Annual_Averages_ALL.csv

I have some data with ANC values. I need to select the SITES where ANC > 150 in any year, but keep all years for each of those SITES, even years where ANC is below 150. Currently, the code below removes some of the rows (years) with ANC below 150. I need every SITE where at least one year has ANC above 150. The code currently only produces a list of unique sites (those with ANC > 150 at some point) and does not bring over the rest of the data.

vtsss <- mydata[ which(mydata$PROGRAM=='VTSSS' & mydata$ANC >= 150), ] # Pick a subset, in this case, VTSSS

unique_vtsss <- unique(vtsss$SITE)

vtsss2 <- mydata[ which(mydata[unique_vtsss]), ]  # this line raises the error below

I get the following error:

Error in `[.data.frame`(mydata, unique_vtsss) : 
  undefined columns selected
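A likely explanation for the error, sketched on a hypothetical two-row data frame (not the poster's CSV): with a single index, `df[x]` selects *columns*, so each site name in `unique_vtsss` is looked up as a column name. Selecting *rows* needs the two-index form `df[i, ]`.

```r
# Hypothetical two-row data frame, only to show how [ indexes a data.frame
df <- data.frame(SITE = c("A", "B"), ANC = c(160, 90))

# ONE index: df["A"] treats "A" as a column name, which does not exist
msg <- tryCatch(df["A"], error = function(e) conditionMessage(e))
print(msg)

# TWO indices: df[i, ] selects rows, here with a logical row index
print(df[df$SITE == "A", ])
```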

Here is how I subset the data, but it still removes the years where ANC is less than 150:

vtsss <- subset(mydata, PROGRAM == 'VTSSS' & ANC >= 150,
                select = c(PROGRAM, SITE, YEAR, ANC))
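`subset()` evaluates its condition row by row, so any year with ANC below 150 fails the test and is dropped. One possible fix, sketched below on hypothetical toy data, is to compute the qualifying sites inside the condition (column names are visible there) and then keep every VTSSS row whose SITE is among them. The `PROGRAM == "VTSSS"` restriction on the kept rows is an assumption about what the poster wants.

```r
# Hypothetical toy data: site A reaches 150 once, site B never does
mydata <- data.frame(
  PROGRAM = "VTSSS",
  SITE    = c("A", "A", "A", "B", "B"),
  YEAR    = c(2017, 2018, 2019, 2017, 2018),
  ANC     = c(120, 160, 140, 100, 110)
)

# SITE[which(...)] = sites with at least one VTSSS row at ANC >= 150
# (which() ignores NA comparisons); SITE %in% ... keeps all their years
vtsss <- subset(
  mydata,
  PROGRAM == "VTSSS" & SITE %in% SITE[which(PROGRAM == "VTSSS" & ANC >= 150)],
  select = c(PROGRAM, SITE, YEAR, ANC)
)
print(vtsss)
```

Site A keeps all three years, including the two below 150, while site B is dropped entirely.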

I think it should work if you replace your last line of code with

vtsss2 <- mydata[ mydata$SITE %in% unique_vtsss, ] 

?
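A minimal end-to-end sketch of the question's code with that last line replaced, on hypothetical toy data (not the poster's CSV). Note that `mydata$SITE %in% unique_vtsss` keeps rows of any PROGRAM for the selected sites; adding `& mydata$PROGRAM == "VTSSS"` to the row index would restrict it to VTSSS rows.

```r
# Hypothetical toy data: site A reaches 150 once under VTSSS; site C only
# reaches it under another program, so it should not be selected
mydata <- data.frame(
  PROGRAM = c("VTSSS", "VTSSS", "VTSSS", "VTSSS", "other"),
  SITE    = c("A", "A", "B", "B", "C"),
  YEAR    = c(2018, 2019, 2018, 2019, 2018),
  ANC     = c(160, 90, 50, 60, 200)
)

# Step 1 (as in the question): VTSSS rows with ANC >= 150, then their sites
vtsss <- mydata[which(mydata$PROGRAM == "VTSSS" & mydata$ANC >= 150), ]
unique_vtsss <- unique(vtsss$SITE)

# Step 2 (fixed): logical ROW index, TRUE for every row of a selected site
vtsss2 <- mydata[mydata$SITE %in% unique_vtsss, ]
print(vtsss2)
```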

I created a small example data set that resembles your CSV, and I think the following code does what you are asking:

PROGRAM <- c('VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS','VTSSS','VTSSS','other') 
SITE <- c("A", "A", "A", "B", "B", "B", "C", "C", "C") 
YEAR <- c(2018, 2019, 2020, 2018, 2019, 2020, 2018, 2019, 2020) 
ANC <- c(1, 1, 1, 160, 160, 160, 1, 160, 160)
mydata <- data.frame(PROGRAM, SITE, YEAR, ANC)

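vtsss <- mydata[ which(mydata$PROGRAM == 'VTSSS'), ]             # keep only VTSSS rows
vtsss2 <- vtsss[ which(vtsss$ANC >= 150), ]                       # rows where ANC reaches 150
vtsss2 <- subset(vtsss2, !duplicated(vtsss2$SITE))                # one row per qualifying SITE
vtsss3 <- vtsss[ which(vtsss$SITE %in% vtsss2$SITE), ]            # all VTSSS years of those sites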
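The same result can also be computed in base R without the intermediate data frames, as a sketch using `ave()`: it applies `any()` within each SITE and spreads the answer back to every row of that site. The toy data is re-created from the answer above so the block runs on its own; the `!is.na(ANC)` guard is an added assumption for data with missing values.

```r
# Toy data from the answer above
PROGRAM <- c('VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'other')
SITE <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
YEAR <- c(2018, 2019, 2020, 2018, 2019, 2020, 2018, 2019, 2020)
ANC <- c(1, 1, 1, 160, 160, 160, 1, 160, 160)
mydata <- data.frame(PROGRAM, SITE, YEAR, ANC)

# Row-level flag: a VTSSS row with a non-missing ANC of at least 150
hit <- mydata$PROGRAM == "VTSSS" & !is.na(mydata$ANC) & mydata$ANC >= 150

# Per-row "this site qualifies" flag: any(hit) computed within each SITE
site_ok <- ave(hit, mydata$SITE, FUN = any)

# Keep the VTSSS rows of qualifying sites, matching vtsss3 above
vtsss3_alt <- mydata[site_ok & mydata$PROGRAM == "VTSSS", ]
print(vtsss3_alt)
```

Like `vtsss3`, this keeps all three years of site B and the two VTSSS years of site C, including C's 2018 row with ANC = 1.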

Maybe we need a group_by with filter:

library(dplyr)
mydata %>%
   group_by(SITE) %>%
   filter(any(ANC >= 150 & !is.na(ANC) &  PROGRAM %in% "VTSSS"))
