Incorrect translation of group-filter-select with dtplyr

Question

A group-filter-select is easy to perform with dplyr. In the example below, we have some data on companies for different quarters of this year. I now want to keep the first-quarter rows of companies that have no data for the fourth quarter (in this case, only the second company), dropping the quarter label.

library(dplyr)

df <- data.frame(companyId = c(rep(1, 4),
                               rep(2, 3),
                               rep(3, 4)),
                 Quarter = c(1:4, 1:3, 1:4),
                 Year = 2019)

q <- 4                 

df %>%
  group_by(
    companyId
  ) %>%
  filter(
    Quarter == 1 &
      !(q %in% Quarter)
  ) %>%
  select(companyId,
         Year)

> # A tibble: 1 x 2
> # Groups:   companyId [1]
>   companyId  Year
>       <dbl> <dbl>
> 1         2  2019
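
The filter only works because `q %in% Quarter` is evaluated once per company rather than once for the whole table. A minimal base-R sketch of that difference, using the same data (the helper names `has_q_overall`, `has_q_by_company`, and `per_row` are made up for illustration):

```r
df <- data.frame(companyId = c(rep(1, 4), rep(2, 3), rep(3, 4)),
                 Quarter = c(1:4, 1:3, 1:4),
                 Year = 2019)
q <- 4

# Evaluated over the whole column, quarter 4 is always present
has_q_overall <- q %in% df$Quarter

# Evaluated per company, only company 2 lacks quarter 4
has_q_by_company <- tapply(df$Quarter, df$companyId, function(x) q %in% x)

# Spread the per-company flag back onto the rows and apply the filter
per_row <- unname(as.vector(has_q_by_company[as.character(df$companyId)]))
res <- df[df$Quarter == 1 & !per_row, c("companyId", "Year")]
res
```

With the table-wide value, the negated condition would be FALSE for every row, so no row could pass the filter.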

However, doing the same with dtplyr returns an empty table:

library(data.table)
library(dtplyr)

dt <- lazy_dt(data.table(companyId = c(rep(1, 4),
                                       rep(2, 3),
                                       rep(3, 4)),
                         Quarter = c(1:4, 1:3, 1:4),
                         Year = 2019))

q <- 4

dt %>%
  group_by(
    companyId
  ) %>%
  filter(
    Quarter == 1 &
      !(q %in% Quarter)
  ) %>%
  select(companyId,
         Year)

> Source: local data table [?? x 2]
> Call:   `_DT1`[Quarter == 1 & !(q %in% Quarter), .(companyId, 
>     Year)]
> 
> # ... with 2 variables: companyId <dbl>, Year <dbl>
> 
> # Use as.data.table()/as.data.frame()/as_tibble() to access results

What's odd is the displayed translation:

`_DT1`[Quarter == 1 & !(q %in% Quarter),
       .(companyId, Year)]

which is incorrect. As described in dtplyr's own docs, the correct call would need to use a filtered .SD:

`_DT1`[, .SD[Quarter == 1 & !(q %in% Quarter)],
       by = .(companyId),
       .SDcols = c("Year")]

(The by-columns are included automatically, so .SDcols should omit them to avoid duplication.)
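
The two translations can be compared directly in data.table. This is only a sketch: it runs on a plain data.table called `DT` (a name chosen here) instead of dtplyr's internal `_DT1`, and it picks the columns in a second step rather than through .SDcols.

```r
library(data.table)

DT <- data.table(companyId = c(rep(1, 4), rep(2, 3), rep(3, 4)),
                 Quarter = c(1:4, 1:3, 1:4),
                 Year = 2019)
q <- 4

# Ungrouped call, as in the displayed translation: `q %in% Quarter`
# sees the whole Quarter column, so the condition is FALSE for every row
ungrouped <- DT[Quarter == 1 & !(q %in% Quarter), .(companyId, Year)]

# Grouped call: .SD is the subset for one company at a time, so
# `q %in% Quarter` is checked per company before the columns are picked
grouped <- DT[, .SD[Quarter == 1 & !(q %in% Quarter)],
              by = .(companyId)][, .(companyId, Year)]

nrow(ungrouped)
grouped
```

The ungrouped call has no rows, while the grouped call keeps the single row for company 2.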

Interestingly, if we omit the select, the translation (and therefore the output) is correct:

dt %>%
  group_by(
    companyId
  ) %>%
  filter(
    Quarter == 1 &
      !(q %in% Quarter)
  )

> Source: local data table [?? x 3]
> Call:   `_DT2`[, .SD[Quarter == 1 & !(q %in% Quarter)], 
>     keyby = .(companyId)]
> 
>   companyId Quarter  Year
>       <dbl>   <int> <dbl>
> 1         2       1  2019

Therefore, as a workaround, I can call as.data.table() before the select. This works, but throws an annoying warning:

dt %>%
  group_by(
    companyId
  ) %>%
  filter(
    Quarter == 1 &
      !(q %in% Quarter)
  ) %>%
  as.data.table() %>%
  select(companyId,
         Year)

>    companyId Year
> 1:         2 2019
> Warning message:
> You are using a dplyr method on a raw data.table, which will call the data frame implementation,
> and is likely to be inefficient.
> * 
> * To suppress this message, either generate a data.table translation with `lazy_dt()` or convert
> * to a data frame or tibble with `as.data.frame()`/`as_tibble()`.
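
Following the hint in the warning itself, one possible variant of the workaround is to collect with as_tibble() instead of as.data.table(), so that the select runs on a tibble and not on a raw data.table. This is a sketch under that assumption, not a fix for the translation. The ungroup() call is an addition here, so that select does not carry the grouping column along implicitly.

```r
library(data.table)
library(dplyr)
library(dtplyr)

dt <- lazy_dt(data.table(companyId = c(rep(1, 4), rep(2, 3), rep(3, 4)),
                         Quarter = c(1:4, 1:3, 1:4),
                         Year = 2019))
q <- 4

result <- dt %>%
  group_by(companyId) %>%
  filter(Quarter == 1 & !(q %in% Quarter)) %>%
  as_tibble() %>%   # evaluate the (correct) grouped filter translation here
  ungroup() %>%     # drop grouping before selecting columns
  select(companyId, Year)

result
```

The trade-off is that everything after as_tibble() runs through dplyr's data frame methods and no longer through data.table.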

I have a hard time believing this is expected behavior, but would like to check here before posting it on the dtplyr GitHub tracker.

Answer 1

This is currently a bug in dtplyr. I have posted it to the package's GitHub.

0 | Accepted | 2019-12-26 12:40:06