简体   繁体   中英

How can I cache my SQL result so I don't have to call SQL repeatedly to get data for Dash plots?

I am trying to build a dashboard that will generate several plots based on a single SQL data query. I want the query to be modifiable via the dashboard (eg to query a different order amount or similar), and then change all plots at once. The query maybe expensive so I don't want it to run N times for N different plots.

I have tried to do this using the flask cache decorator @cache.memoize() , similar to the example given in the docs: https://dash.plotly.com/performance

Here is a stripped back version of what I'm doing. I can tell that the query_data function is not doing what I intend because: 1. the resulting graphs show different data points on the x-axis. If it was using the same cached dataset the data points in x should be the same 2. The print statements in the query_data function come out twice everytime I change an input cell.

Can anyone explain why this isn't working or how I can achieve what I want.

import sys

import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
from dash.dependencies import Input, Output

from setup_redshift import setup_connection
from flask_caching import Cache
from datetime import datetime
import pandas as pd

conn = setup_connection()

app = dash.Dash(__name__)
cache = Cache(app.server, config={
    # 'CACHE_TYPE': 'filesystem',
    'CACHE_TYPE': 'memcached',
    'CACHE_DIR': 'cache-directory'
})

sql_query = '''select i.order_amount_in_usd, r.calibrated_score, r.score
from datalake.investigations i 
inner join datalagoon.prod_model_decision r
                    ON i.investigation_id = r.investigation_id

where i.team_id = {}
AND i.order_amount_in_usd < {}
AND r.calibrated_score >= 0
order by RANDOM()
limit 1000'''


@cache.memoize()
def query_data(team_id, max_usd):
    print("Calling data query now with team_id={} and max_usd={} at time {}".format(team_id, max_usd, datetime.now()))
    _sql = sql_query.format(team_id, max_usd)
    print(_sql)
    data = pd.read_sql(sql_query.format(team_id, max_usd), conn)
    print("data is {} rows ".format(len(data)))
    print("data max usd is {}".format(data['order_amount_in_usd'].max()))
    return data


@app.callback(Output(component_id='output-graph', component_property='figure'),
              [Input(component_id='data-select-team-id', component_property='value'),
               Input(component_id='data-select-max-usd', component_property='value')])
def plot_data(team_id, max_usd):
    print("calling query_data at from graph at {}".format(datetime.now()))
    in_data = query_data(team_id, max_usd)
    print("going to make graph1 now at {}".format(datetime.now()))

    fig = px.scatter(in_data,
                     x='order_amount_in_usd',
                     y='calibrated_score')
    return fig


@app.callback(Output(component_id='output-graph2', component_property='figure'),
              [Input(component_id='data-select-team-id', component_property='value'),
               Input(component_id='data-select-max-usd', component_property='value')])
def plot_second_data(team_id, max_usd):
    print("calling query_data at from graph2 at {}".format(datetime.now()))
    in_data = query_data(team_id, max_usd)
    print("going to make graph2 now at {}".format(datetime.now()))
    fig = px.scatter(in_data,
                     x='order_amount_in_usd',
                     y='score')
    return fig


app.layout = html.Div(  # style={'backgroundColor': colors['background']},

    children=[dcc.Input(id='data-select-team-id',
                        value=7625,
                        placeholder='Input Team ID',
                        type='number',
                        min=0,
                        max=1_000_000_000,
                        debounce=True
                        ),
              dcc.Input(id='data-select-max-usd',
                        value=5000,
                        type='number',
                        debounce=True),
              dcc.Graph(id='output-graph'),
              dcc.Graph(id='output-graph2')]
)

if __name__ == '__main__':
    app.run_server(debug=True)

In the past Ive stored the results using dcc.Store (see here )

You could structure your app like this:

  • Run the SQL query and store the results using dcc.Store (local or memory depending on your use case). This only runs once (per app load, interval timer or user button refresh etc)
  • Callbacks to generate different cuts of the data in dash tables or charts would load the store

If the results of the query are large (see 'Storage Limitations; in the above link) then you should save the results to a local flat file such as JSON or CSV and read that each time.

An alternative is to use PostgreSQL and materialized views to make the SQL query cheap (with a trade off on storage space)

These approaches makes the dash app appear very responsive to the user while allowing the analysis of large data

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM