Loop through rows and count the number of rows that match multiple criteria in R

I have a dataset that looks like this:

        city period_day       date 
1  barcelona    morning 2017-01-15         
2  sao_paulo  afternoon 2016-12-07         
3  sao_paulo    morning 2016-11-16         
4  barcelona    morning 2016-11-06         
5  barcelona  afternoon 2016-12-31         
6  sao_paulo  afternoon 2016-11-30         
7  barcelona    morning 2016-10-15         
8  barcelona  afternoon 2016-11-30         
9  sao_paulo  afternoon 2016-12-24         
10 sao_paulo  afternoon 2017-02-02         

For every row, I want to count how many other rows with the same city and period_day have an earlier date than that row. In this case, I want this result:

        city period_day       date row_count
1  barcelona    morning 2017-01-15         2
2  sao_paulo  afternoon 2016-12-07         1
3  sao_paulo    morning 2016-11-16         0
4  barcelona    morning 2016-11-06         1
5  barcelona  afternoon 2016-12-31         1
6  sao_paulo  afternoon 2016-11-30         0
7  barcelona    morning 2016-10-15         0
8  barcelona  afternoon 2016-11-30         0
9  sao_paulo  afternoon 2016-12-24         2
10 sao_paulo  afternoon 2017-02-02         3

When row_count equals 0, the row has the oldest date in its city and period_day group.

I came up with a solution, but it takes too long on larger data. Here is the code:

library(dplyr)

get_count_function <- function(df) {
  idx <- seq_len(nrow(df))

  count <- sapply(idx, function(x) {
    name_city <-
      df %>% select(city) %>% filter(row_number() == x) %>% pull()
    name_period <-
      df %>% select(period_day) %>% filter(row_number() == x) %>% pull()

    date_row <- df %>%
      select(date) %>%
      filter(row_number() == x) %>%
      pull()

    date_any_row <- df %>%
      filter(dplyr::row_number() != x,
             city == name_city,
             period_day == name_period) %>%
      select(date) %>%
      pull()

    how_many <- sum(date_row > date_any_row)

    return(how_many)

  })

  return(count)

}

How could I make this function more efficient?

Try this one:

library(tidyverse)

dat %>%
  group_by(city, period_day) %>%
  mutate(row_count = rank(date, ties.method = "min") - 1) %>%
  ungroup()

rank() returns, for each value, its position among the sorted values of the selected group (date). Subtracting 1 from that rank gives the number of values that precede the current value within its group. E.g. the minimum of a group has rank 1, so nothing precedes it (1 - 1 = 0); a value with rank 2 has exactly one older date before it, and so on. With ties.method = "min", tied dates share the lowest rank, so only strictly older dates are counted. Note that order() is not a substitute here: it returns the permutation that sorts the values, which happens to match the ranks for this sample data but not in general.
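To see why the count depends on each value's position in sorted order rather than on the sorting permutation, here is a minimal base R sketch. The three dates are made up for illustration (they are not a group from the question), and the brute-force count is only there as a cross-check.

```r
# Hypothetical dates, chosen so that order() and rank() disagree.
d <- as.Date(c("2016-12-07", "2016-10-15", "2016-11-30"))

# order() gives the indices that would sort d: element 2 first, then 3, then 1.
perm <- order(d)

# rank() gives the position of each element within the sorted values.
# ties.method = "min" makes tied dates share the lowest rank, so that
# rank - 1 counts only strictly older dates.
older <- rank(d, ties.method = "min") - 1

# Brute-force cross-check: for each date, count the other dates before it.
brute <- sapply(seq_along(d), function(i) sum(d[-i] < d[i]))

print(perm - 1)  # differs from the brute-force count for these dates
print(older)     # agrees with the brute-force count
print(brute)
```

For the sample data in the question the two functions happen to return the same numbers within every group, which is why the difference is easy to miss.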

Data:

dat <- read.table(
  text = "        city period_day       date
  barcelona    morning 2017-01-15
  sao_paulo  afternoon 2016-12-07
  sao_paulo    morning 2016-11-16
  barcelona    morning 2016-11-06
  barcelona  afternoon 2016-12-31
  sao_paulo  afternoon 2016-11-30
  barcelona    morning 2016-10-15
  barcelona  afternoon 2016-11-30
  sao_paulo  afternoon 2016-12-24
  sao_paulo  afternoon 2017-02-02",
  header = T,
  colClasses = c("character", "character", "Date")
)

This should work if you are willing to use the data.table package:

library(data.table)

dat <- read.table(header=T, row.names=1, text="
        city period_day       date 
1  barcelona    morning 2017-01-15         
2  sao_paulo  afternoon 2016-12-07         
3  sao_paulo    morning 2016-11-16         
4  barcelona    morning 2016-11-06         
5  barcelona  afternoon 2016-12-31         
6  sao_paulo  afternoon 2016-11-30         
7  barcelona    morning 2016-10-15         
8  barcelona  afternoon 2016-11-30         
9  sao_paulo  afternoon 2016-12-24         
10 sao_paulo  afternoon 2017-02-02   
")

dat <- as.data.table(dat)

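# rank() gives each date's position within its group; subtracting 1 counts the strictly older dates
dat[, row_count := rank(as.Date(date), ties.method = "min") - 1L, by = .(city, period_day)]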

# Check
dat

##          city period_day       date row_count
##  1: barcelona    morning 2017-01-15         2
##  2: sao_paulo  afternoon 2016-12-07         1
##  3: sao_paulo    morning 2016-11-16         0
##  4: barcelona    morning 2016-11-06         1
##  5: barcelona  afternoon 2016-12-31         1
##  6: sao_paulo  afternoon 2016-11-30         0
##  7: barcelona    morning 2016-10-15         0
##  8: barcelona  afternoon 2016-11-30         0
##  9: sao_paulo  afternoon 2016-12-24         2
## 10: sao_paulo  afternoon 2017-02-02         3
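For comparison, the same grouped count can be sketched in base R without any extra packages. This is not from either answer: it assumes ave() is acceptable for the grouping, and count_older is a hypothetical helper name introduced only for this example.

```r
# Sample data from the question, rebuilt as a plain data.frame.
dat <- data.frame(
  city = c("barcelona", "sao_paulo", "sao_paulo", "barcelona", "barcelona",
           "sao_paulo", "barcelona", "barcelona", "sao_paulo", "sao_paulo"),
  period_day = c("morning", "afternoon", "morning", "morning", "afternoon",
                 "afternoon", "morning", "afternoon", "afternoon", "afternoon"),
  date = as.Date(c("2017-01-15", "2016-12-07", "2016-11-16", "2016-11-06",
                   "2016-12-31", "2016-11-30", "2016-10-15", "2016-11-30",
                   "2016-12-24", "2017-02-02"))
)

# Hypothetical helper: number of strictly older values for each element.
count_older <- function(d) rank(d, ties.method = "min") - 1

# ave() applies the helper within each city / period_day group and returns
# the results in the original row order. The dates are passed as numbers
# (days since 1970-01-01), which preserves their ordering.
dat$row_count <- ave(as.numeric(dat$date), dat$city, dat$period_day,
                     FUN = count_older)

print(dat)
```

The resulting row_count column can be compared directly with the expected result given in the question.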
