
DuckDB R : Calculate mean and median for multiple columns

I have a DuckDB database and want to calculate the mean and median of multiple columns at once.

For example:

# This works (local data frame):
mtcars %>%
  summarise(across(everything(), list(mean, median)))

# This doesn't (remote DuckDB table):
tbl(con, "mtcars") %>%
  summarise(across(everything(), list(mean, median)))

My guess is that this is caused by how dbplyr does translation. It captures and translates the contents of each dplyr command. So when you call summarise(across(everything(), list(mean, median))) then across(everything(), list(mean, median)) gets passed to the translator (which fails to translate it as intended).

If across(everything(), list(mean, median)) were turned into one line of code for each variable (e.g. var1 = mean(var1), ..., var100 = median(var100)), then these individual lines could be correctly translated by dbplyr.

Perhaps later versions of dbplyr can expand across(...) into multiple lines of code prior to translating, as @user63230's comment implies.

We should be able to do this manually following the method in this answer or this answer. Something like the following:

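library(dplyr)
library(rlang)

# remote_table is a lazy table, e.g. remote_table <- tbl(con, "mtcars")
c_names <- colnames(remote_table)

# Build one expression per column, e.g. mean(mpg), and name it after the column
patterns <- parse_exprs(paste0("mean(", c_names, ")"))
names(patterns) <- c_names

remote_table %>%
  summarise(!!!patterns)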

The idea is to build text strings of the commands we want executed, turn these into expressions using parse_exprs, and lastly unquote-splice them into the dplyr call with !!! (which appears to be evaluated before dbplyr translation, so dbplyr only ever sees one plain expression per output column).
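The same idea can be extended to cover both mean and median, as the question asks. The following is only a sketch: build_summaries is a hypothetical helper name (not part of dplyr, dbplyr or rlang), the col_fn naming scheme and na.rm = TRUE are my own choices, and it uses rlang's call2() and sym() to construct the calls directly rather than parsing strings. It is shown against the local mtcars data frame so it runs without a database connection.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: one named summary expression per (column, function) pair.
build_summaries <- function(cols, fns = c("mean", "median")) {
  # Every combination of column and function name, as plain character columns
  grid <- expand.grid(col = cols, fn = fns, stringsAsFactors = FALSE)
  exprs <- lapply(seq_len(nrow(grid)), function(i) {
    # e.g. call2("mean", sym("mpg"), na.rm = TRUE) builds mean(mpg, na.rm = TRUE)
    call2(grid$fn[i], sym(grid$col[i]), na.rm = TRUE)
  })
  # Output columns are named like mpg_mean, mpg_median (assumed naming scheme)
  set_names(exprs, paste(grid$col, grid$fn, sep = "_"))
}

summaries <- build_summaries(colnames(mtcars))

# Local check on the in-memory data frame
local_result <- mtcars %>%
  summarise(!!!summaries)
```

Assuming con is an open DuckDB connection that holds an mtcars table, the same list should splice into the remote query in the same way: tbl(con, "mtcars") %>% summarise(!!!summaries). Whether median translates depends on the backend and the dbplyr version in use.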
