R: Filtering data in a loop using items in a vector

I am attempting to speed up a process so that I do not have to edit the year manually each time. I am certainly no expert in R, so I am not sure if this is too easy to ask. It seems to work for the first item in the vector, since that result is right.

I would like to print the number of rows in a filtered dataset, for each year.

library(dplyr)

getData <- function(){
  # Return the data frame directly instead of relying on the invisible value of an assignment
  read.csv("data.csv", stringsAsFactors=FALSE)
}

data <- getData()
years <- c("2010", "2011", "2012", "2013", "2014", "2015", "2016")
nbh <- "SomeVar"

for(year in years){
  data <- filter(data, grepl(year, Created.Date) & grepl(nbh, SomeColumn))
  print(nrow(data))
}

However, it just outputs this, where only the first value is correct:

[1] 2
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0

Is this because the first iteration filters the data down to only two records, which causes the subsequent counts to be 0?

You can probably modify this to fit your needs.

library(dplyr)

xy <- data.frame(letters = sample(letters, 100, replace = TRUE),
                 years = sample(seq(from = 2010, to = 2015, by = 1), size = 100, replace = TRUE),
                 values = rnorm(100))

xy %>%
  group_by(years) %>%
  filter(letters %in% c("a", "b", "c")) %>%
  count()

# A tibble: 6 × 2
  years     n
  <dbl> <int>
1  2010     5
2  2011     2
3  2012     3
4  2013     1
5  2014     1
6  2015     3
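To connect the grouped count above to the data described in the question, here is a minimal sketch. The toy data frame stands in for data.csv, and the "mm/dd/yyyy hh:mm" layout of Created.Date is an assumption (the question does not show the real format), so the substr() positions would need adjusting for other layouts.

```r
library(dplyr)

# Hypothetical stand-in for data.csv; the date format is an assumption.
data <- data.frame(
  Created.Date = c("01/05/2010 09:00", "03/14/2010 12:30", "07/21/2011 08:15",
                   "11/02/2012 17:45", "12/30/2012 23:10"),
  SomeColumn   = c("SomeVar", "SomeVar", "Other", "SomeVar", "SomeVar"),
  stringsAsFactors = FALSE
)
nbh <- "SomeVar"

counts <- data %>%
  # Keep only rows whose SomeColumn contains the value of interest.
  filter(grepl(nbh, SomeColumn, fixed = TRUE)) %>%
  # Characters 7 to 10 hold the year under the assumed mm/dd/yyyy layout.
  mutate(year = substr(Created.Date, 7, 10)) %>%
  # One row per year, with the number of matching records in column n.
  count(year)

print(counts)
```

Note that years with no matching rows simply do not appear in the result, rather than showing up with a count of 0.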

You're overwriting your dataset inside your for loop, so every iteration after the first filters the already-filtered result. Try

for(year in years){
  data_temp <- filter(data, grepl(year, Created.Date) & grepl(nbh, SomeColumn))
  print(nrow(data_temp))
}
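The effect of the overwrite can be reproduced on a small example. The sketch below uses base R only and a made-up data frame (the column values and date format are assumptions, not the asker's real data). It runs the shrinking pattern next to a version that always filters the original data, here written with sapply() instead of a for loop.

```r
# Hypothetical stand-in for data.csv.
data <- data.frame(
  Created.Date = c("01/05/2010 09:00", "03/14/2010 12:30", "07/21/2011 08:15",
                   "11/02/2012 17:45", "12/30/2012 23:10"),
  SomeColumn   = c("SomeVar", "SomeVar", "Other", "SomeVar", "SomeVar"),
  stringsAsFactors = FALSE
)
years <- c("2010", "2011", "2012")
nbh <- "SomeVar"

# Pattern from the question: the data frame is replaced on every pass,
# so later years are searched in what is left of the first year's rows.
shrinking <- data
buggy <- integer(0)
for (year in years) {
  keep <- grepl(year, shrinking$Created.Date) & grepl(nbh, shrinking$SomeColumn)
  shrinking <- shrinking[keep, ]
  buggy <- c(buggy, nrow(shrinking))
}

# Alternative: count matches against the untouched original each time.
fixed <- sapply(years, function(year) {
  sum(grepl(year, data$Created.Date) & grepl(nbh, data$SomeColumn))
})

print(buggy)
print(fixed)
```

With this toy data the first vector is 2, 0, 0 while the second is 2, 0, 2, which mirrors the output pattern in the question.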
