More speed for BigQuery data in Shiny

I don't want to load a complete public BigQuery dataset (in my case geo_us_boundaries). Instead, I'd like to select the data by state using the query glue::glue_sql("SELECT * FROM states WHERE state = {x}", x = input$selectedvariable1, .con = bq_con). This is what I tried:


library(shiny)
library(dplyr)
library(ggplot2)
library(bigrquery)
library(DBI)
library(sf)
library(glue)

# Connect to a public BigQuery dataset, e.g. "geo_us_boundaries";
# queries are billed to your own project
bq_con <- dbConnect(
  bigrquery::bigquery(),
  project = "bigquery-public-data",
  dataset = "geo_us_boundaries",
  billing = "my-project"
)
DBI::dbListTables(bq_con) # List all the tables in the BigQuery dataset


# Take the table
dataset <- dplyr::tbl(bq_con, "states") # lazy reference to the table

# Enumerate the states
dataset_vars <- dataset %>%
  dplyr::distinct(geo_id, state, state_name) %>%
  collect()
str(dataset_vars)


# Create the shiny dash
ui <- fluidPage(
  titlePanel(title="States Dashboard"),  
  sidebarLayout(
    sidebarPanel(
      # `selected` defaults to the first choice
      selectInput(inputId = "selectedvariable0",
                  label = "Geo ID", 
                  choices = unique(dataset_vars$geo_id)), 
      selectInput(inputId = "selectedvariable1",
                  label = "State", 
                  choices = unique(dataset_vars$state)), 
      selectInput(inputId = "selectedvariable2",
                  label = "State name", 
                  choices = unique(dataset_vars$state_name))
    ),
    mainPanel(
      fluidRow(
        splitLayout(plotOutput("myplot")))
      
    )
  )
)
server <- function(input, output) {

  # Build the parameterized SQL for the selected state
  sqlInput <- reactive({
    glue::glue_sql("SELECT * FROM states WHERE state = {x}",
                   x = input$selectedvariable1, .con = bq_con)
  })

  # Run the query once per selection; a reactive caches its result
  # until the selected state changes
  stands_sel <- reactive({
    dbGetQuery(bq_con, as.character(sqlInput()))
  })

  output$myplot <- renderPlot({
    # Convert the WKT geometry column to sf and draw it
    stands_sel_sf <- st_as_sf(stands_sel(), wkt = "state_geom", crs = 4326)
    ggplot() + geom_sf(data = stands_sel_sf)
  })
}
shinyApp(ui, server)

However, selecting the data by state this way is very slow. In my real-world problem the geometries are large, and it takes several minutes both to download the data from BigQuery and to finish the plot. Is there an approach to speed this up?

Any help would be appreciated.

Thanks in advance!

Are you sure the time is spent downloading the data, and not, e.g., rendering it? The query takes a couple of seconds, and even the largest state by geometry text size (Texas) is only 1.3 MB. That should not take minutes to download.
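One way to check is to time the two stages separately. The sketch below uses a hypothetical helper, time_stages (not part of any package), and its commented usage assumes bq_con and the libraries from the question. A ggplot object is only rendered when it is printed, so the drawing step calls print() explicitly; otherwise the timing would miss the actual rendering.

```r
# Hypothetical helper: times a data-fetching step and a drawing step
# separately and returns the elapsed seconds for each.
time_stages <- function(fetch, draw) {
  fetch_s <- system.time(result <- fetch())[["elapsed"]]
  draw_s  <- system.time(draw(result))[["elapsed"]]
  c(fetch = fetch_s, draw = draw_s)
}

# Example usage with the question's connection (requires BigQuery access):
# time_stages(
#   fetch = function() DBI::dbGetQuery(bq_con, "SELECT * FROM states WHERE state = 'TX'"),
#   draw  = function(df) {
#     # ggplot objects are drawn only when printed
#     print(ggplot() + geom_sf(data = st_as_sf(df, wkt = "state_geom", crs = 4326)))
#   }
# )
```

If draw dominates, shrinking the geometry will help more than any change to the download.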

Anyway, a typical solution is to simplify the state geometry:

SELECT * EXCEPT(state_geom), ST_Simplify(state_geom, 1000) AS state_geom 
FROM states WHERE state = {x}

With a tolerance of 1000 meters (the second argument of ST_Simplify), this brings even the heaviest state (now Alaska) under 40 KB, so it should be much quicker to download and to draw.
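In the question's Shiny app, the same simplified query can be built with glue::glue_sql, so the state value is still quoted safely. This is a sketch: simplified_state_query and its tolerance_m argument are hypothetical names, and DBI::ANSI() is used only to preview the generated SQL without a BigQuery connection.

```r
library(DBI)
library(glue)

# Hypothetical helper: per-state query with the geometry simplified in BigQuery.
# The second argument of ST_Simplify is a tolerance in meters.
simplified_state_query <- function(con, state, tolerance_m = 1000) {
  glue_sql(
    "SELECT * EXCEPT(state_geom), ST_Simplify(state_geom, {tolerance_m}) AS state_geom
     FROM states WHERE state = {state}",
    .con = con
  )
}

# Preview the SQL with an ANSI connection (no BigQuery needed)
simplified_state_query(DBI::ANSI(), "TX")

# Inside the app's server function, with the question's connection:
# sqlInput <- reactive(simplified_state_query(bq_con, input$selectedvariable1))
```

A larger tolerance gives smaller geometries at the cost of coarser state outlines.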

Note that this query runs ST_Simplify every time, which is somewhat expensive. You'd probably want to do it only once: create a table with the simplified state geometries and read from it instead. This also makes the query itself much cheaper and faster, since it reads less data and does not have to compute the simplified geometry.

CREATE TABLE tmp.states AS
SELECT * EXCEPT(state_geom), ST_Simplify(state_geom, 1000) AS state_geom
FROM  `bigquery-public-data`.geo_us_boundaries.states;
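The one-off step can also be run from R. A minimal sketch, assuming a dataset named tmp already exists in the billing project (my-project, as in the question): bigrquery::bq_project_query() writes the SELECT's result to destination_table, and the app then reads each state from the smaller table. Both function names are hypothetical, and the table name `my-project.tmp.states` is an assumption that must match wherever you actually create the table.

```r
library(DBI)
library(glue)

# Hypothetical one-off step (run once, outside the Shiny app): simplify every
# state and store the result in my-project.tmp.states, overwriting it if present.
create_simplified_states <- function(billing = "my-project", tolerance_m = 1000) {
  sql <- glue(
    "SELECT * EXCEPT(state_geom), ST_Simplify(state_geom, {tolerance_m}) AS state_geom
     FROM `bigquery-public-data.geo_us_boundaries.states`"
  )
  bigrquery::bq_project_query(
    billing, as.character(sql),
    destination_table = bigrquery::bq_table(billing, "tmp", "states"),
    write_disposition = "WRITE_TRUNCATE"
  )
}

# Per-state query for the app: reads the pre-simplified table, so BigQuery
# scans less data and no ST_Simplify runs per request. The table name is
# inserted as raw SQL to keep BigQuery's backtick quoting intact.
precomputed_state_query <- function(con, state, table = "`my-project.tmp.states`") {
  glue_sql("SELECT * FROM {DBI::SQL(table)} WHERE state = {state}", .con = con)
}

# Preview the SQL without a BigQuery connection
precomputed_state_query(DBI::ANSI(), "TX")
```

In the app, sqlInput would then call precomputed_state_query(bq_con, input$selectedvariable1).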
