Selecting and grouping multiple columns in dtplyr vs dplyr

I would like to group_by across several variables with dtplyr inside an lapply loop, and I find that after calling lazy_dt() I can't use the same syntax as with dplyr.

library(dplyr)
mycolumns <- c("Wind", "Month", "Ozone", "Solar.R")
columnpairs <- as.data.frame(combn(mycolumns, 2))

#         V1    V2      V3    V4      V5      V6
#    1  Wind  Wind    Wind Month   Month   Ozone
#    2 Month Ozone Solar.R Ozone Solar.R Solar.R

result_dplyr <- lapply(columnpairs, function(x) {
  airquality %>%
    select(all_of(x)) %>%
    group_by(across(all_of(x))) %>%
    filter(n() > 1)
})

$V1
# A tibble: 105 x 2
# Groups:   Wind, Month [40]
    Wind Month
   <dbl> <int>
 1   7.4     5
 2   8       5
 3  11.5     5
 4  14.9     5
 5   8.6     5
 6   8.6     5
 7   9.7     5
 8  11.5     5
 9  12       5
10  11.5     5
# ... with 95 more rows
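
For readers unfamiliar with the pattern, here is a minimal sketch on a toy data frame (the names toy, cols, and dups are made up for illustration and are not part of the question). Grouping by every column named in a character vector and then filtering on n() > 1 keeps only the rows whose combination of values occurs more than once.

```r
library(dplyr)

# Hypothetical toy data: (1, "x") and (3, "z") each occur twice,
# while (2, "y") and (3, "w") occur only once.
toy <- data.frame(
  a = c(1, 1, 2, 3, 3, 3),
  b = c("x", "x", "y", "z", "z", "w"),
  stringsAsFactors = FALSE
)

# Column names held in a character vector, as in the lapply loop above.
cols <- c("a", "b")

dups <- toy %>%
  group_by(across(all_of(cols))) %>%  # one group per distinct (a, b) pair
  filter(n() > 1) %>%                 # keep groups with more than one row
  ungroup()

# dups holds the two (1, "x") rows and the two (3, "z") rows.
```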

Using the same syntax, I run into a problem with dtplyr after calling lazy_dt.

library(dtplyr)
airq <- lazy_dt(airquality)

lapply(columnpairs, function(x) {
  airq %>% select(all_of(x)) %>% 
    group_by(across(all_of(x))) %>% filter(n() > 1)
})

Error in `all_of()`:
! object 'x' not found

Any ideas?

Edit: issue created at https://github.com/tidyverse/dtplyr/issues/383

It seems that the dtplyr method for group_by (group_by.dtplyr_step) is what causes the problem.

> methods('group_by')
[1] group_by.data.frame*  group_by.data.table*  group_by.dtplyr_step*

I'm not sure whether this is a bug.

> traceback()
...
6: group_by.dtplyr_step(., across(all_of(.x)))  ###
5: group_by(., across(all_of(.x)))
4: filter(., n() > 1)
3: airq %>% select(all_of(.x)) %>% group_by(across(all_of(.x))) %>% 
       filter(n() > 1)
2: .f(.x[[i]], ...)
1: map(columnpairs, ~airq %>% select(all_of(.x)) %>% group_by(across(all_of(.x))) %>% 
       filter(n() > 1))

Here are two approaches that work:

  1. Use the superseded group_by_at
  2. Convert the column names to symbols with syms and splice them in with !!!

Using group_by_at
library(dtplyr)
library(purrr)
library(dplyr)
map(columnpairs, ~ airq %>%
        select(all_of(.x)) %>%
        group_by_at(all_of(.x)) %>%
        filter(n() > 1))
$V1
Source: local data table [105 x 2]
Groups: Wind, Month
Call:
  _DT2 <- `_DT1`[, .(Wind, Month)]
  `_DT2`[`_DT2`[, .I[.N > 1], by = .(Wind, Month)]$V1]

   Wind Month
  <dbl> <int>
1   7.4     5
2   7.4     5
3   8       5
4   8       5
5  11.5     5
6  11.5     5
# … with 99 more rows
...

Converting to symbols and splicing with !!!
map(columnpairs, ~ airq %>% 
      select(all_of(.x)) %>%
      group_by(!!! rlang::syms(.x)) %>% 
      filter(n() > 1))
$V1
Source: local data table [105 x 2]
Groups: Wind, Month
Call:
  _DT20 <- `_DT1`[, .(Wind, Month)]
  `_DT20`[`_DT20`[, .I[.N > 1], by = .(Wind, Month)]$V1]

   Wind Month
  <dbl> <int>
1   7.4     5
2   7.4     5
3   8       5
4   8       5
5  11.5     5
6  11.5     5
# … with 99 more rows

# Use as.data.table()/as.data.frame()/as_tibble() to access results

$V2
...
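
Putting the second approach back into the lapply form from the question, a minimal sketch might look like the following. It assumes the same airquality setup as above; result_dt is a hypothetical name, and ungroup() plus as_tibble() are used only to materialize each lazy result as a plain tibble, as the printed hint suggests.

```r
library(dplyr)
library(dtplyr)

mycolumns <- c("Wind", "Month", "Ozone", "Solar.R")
columnpairs <- as.data.frame(combn(mycolumns, 2), stringsAsFactors = FALSE)
airq <- lazy_dt(airquality)

# result_dt is a hypothetical name for the list of materialized results.
result_dt <- lapply(columnpairs, function(x) {
  airq %>%
    select(all_of(x)) %>%
    group_by(!!!rlang::syms(x)) %>%  # splice the column names in as symbols
    filter(n() > 1) %>%              # keep value pairs that occur more than once
    ungroup() %>%
    as_tibble()                      # run the data.table call and collect
})
```

As the printed outputs above show, the rows come back in a different order than in the dplyr version, so compare the two on content rather than on position.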
