Uploading multiple Excel workbooks and concatenating them - dcc.Upload, Plotly Dash

Question

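I'm building an interactive dashboard with Plotly Dash. It takes an Excel workbook as input, formats the data into a pandas dataframe, and displays it as a bar chart.

It works well with a single workbook. However, after I added a variable so that several workbooks can be loaded, concatenated into one long dataframe, and visualised, I ran into a persistence problem: the data is retained after the browser is refreshed, even though storage_type is set to memory as the documentation describes.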

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

dfmeans = []

app.layout = html.Div([ # this code section taken from Dash docs https://dash.plotly.com/dash-core-components/upload
    dcc.Store(id='stored-data', storage_type='memory'),
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True

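I suspect this is because I declared the list variable dfmeans = [] outside the main function, but that is the only place where I could get it to work. When I put it inside the parse_contents() function, the data is replaced every time a new workbook is added.

Has anyone successfully implemented the Dash upload component dcc.Upload with multiple workbooks/Excel files as input? I can find very little documentation on uploading multiple files. The full code is here: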

import base64
import datetime
import io
import re

import dash
from dash.dependencies import Input, Output, State
import dash_core_components as dcc
import dash_html_components as html
import dash_table
import plotly.express as px

import pandas as pd
from read_workbook import *

import pdb

suppress_callback_exceptions=True

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

dfmeans = []

app.layout = html.Div([ # this code section taken from Dash docs https://dash.plotly.com/dash-core-components/upload
    dcc.Store(id='stored-data', storage_type='memory'),
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='output-div'),
    html.Div(id='output-datatable'),
])

def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')
    
    decoded = base64.b64decode(content_string)
    try:
        workbook_xl = pd.ExcelFile(io.BytesIO(decoded))
        # print(workbook_xl)
        
        #aggregates all months data into a single data frame
        def get_all_months(workbook_xl):
            months = ['July', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'June']
            xl_file = pd.ExcelFile(workbook_xl)
            
            months_data = []
            for month in months:
                months_data.append(get_month_dataframe(xl_file, month))
                print(months_data)
            return pd.concat(months_data)
        
        #run get all months function and produce behavior dataframe 
        df = get_all_months(workbook_xl)

        #convert episode values to float and aggregate mean per shift 
        df['value'] = df['value'].astype(float)
        dfmean = df.groupby(['Date', 'variable'],sort=False,)['value'].mean().round(2).reset_index()
        dfmeans.append(dfmean)
        dfmean = pd.concat(dfmeans)

    
    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    return html.Div([
        html.H5(filename),
        # html.H6(datetime.datetime.fromtimestamp(date)),
        
        dash_table.DataTable(
            data=dfmean.to_dict('records'),
            columns=[{'name': i, 'id': i} for i in dfmean.columns],
            page_size=15
        ),
        dcc.Store(id='stored-data', data=dfmean.to_dict('records')),
        
        html.Hr(),  # horizontal line

        # For debugging, display the raw contents provided by the web browser
        html.Div('Raw Content'),
        html.Pre(contents[0:200] + '...', style={
            'whiteSpace': 'pre-wrap',
            'wordBreak': 'break-all'
        })
    ])

@app.callback(Output('output-datatable', 'children'),
              Input('upload-data', 'contents'),
              State('upload-data', 'filename'),
              State('upload-data', 'last_modified'))

def update_output(list_of_contents, list_of_names, list_of_dates):
    
    if list_of_contents is not None:
        children = [
            parse_contents(c, n, d) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        return children


@app.callback(Output('output-div', 'children'),
              Input('stored-data','data'))

def make_graphs(data):
    
    df_agg = pd.DataFrame(data)
    
    # df_agg['Date'] = pd.to_datetime(df_agg['Date'])
    
    if df_agg.empty:
        print("Dataframe empty")
    else:
        bar_fig = px.bar(df_agg, x=df_agg['Date'], y=df_agg['value'], color = 'variable',barmode='group')
        return dcc.Graph(figure=bar_fig)
    
if __name__ == '__main__':
    app.run_server(debug=True)

Answer 1

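Defining dfmeans outside the scope of the callbacks will indeed make your data persist until you kill the server, because it is treated as a global variable. According to the Dash documentation:

One of the core Dash principles explained in the Getting Started Guide on Callbacks is that Dash Callbacks must never modify variables outside of their scope. It is not safe to modify any global variables. This chapter explains why and provides some alternative patterns for sharing state between callbacks.

One alternative is to create a global store component that holds dfmeans and to pass its state to update_output, so that it is appended to each time a new file is uploaded: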
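To illustrate the quoted principle, here is a minimal plain-Python sketch with no Dash involved. The two functions only mimic callbacks, and their names (`upload_with_global`, `upload_with_state`) are made up for this example. The first appends to a module-level list, which lives as long as the server process and therefore survives a browser refresh. The second receives the previous state as an argument and returns a new value, which is how a store-backed callback behaves.

```python
# Module-level list: it lives as long as the server process and is
# shared by every browser session that talks to that process.
dfmeans_global = []


def upload_with_global(new_rows):
    """Mimic a callback that appends to a module-level list (unsafe)."""
    dfmeans_global.append(new_rows)
    return len(dfmeans_global)


def upload_with_state(stored, new_rows):
    """Mimic a callback that takes prior state in and returns new state."""
    stored = list(stored or [])  # copy; treat a missing value as empty
    stored.append(new_rows)
    return stored


# First page load: one upload.
count_first = upload_with_global([{"value": 1.0}])
# "Browser refresh": the process keeps running, so the list is not reset.
count_after_refresh = upload_with_global([{"value": 2.0}])

# With explicit state, a fresh page starts from an empty store again.
session_one = upload_with_state(None, [{"value": 1.0}])
session_two = upload_with_state(None, [{"value": 2.0}])

print(count_first, count_after_refresh)      # the global list keeps growing
print(len(session_one), len(session_two))    # each session holds one upload
```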

@app.callback(Output('output-datatable', 'children'),
              Output('global-stored-data', 'data'),
              Input('upload-data', 'contents'),
              State('upload-data', 'filename'),
              State('upload-data', 'last_modified'),
              State('global-stored-data', 'data'))
def update_output(list_of_contents, list_of_names, list_of_dates, global_stored_data):
    # Rebuild the list of dataframes from the store.
    # The store may hold None before the first upload, so fall back to an empty list.
    dfmeans = [pd.DataFrame(data) for data in (global_stored_data or [])]
    if list_of_contents is not None:
        # parse_contents must now accept dfmeans as a fourth argument and append to it
        children = [
            parse_contents(c, n, d, dfmeans) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        # Convert back to JSON-serialisable records before writing to the store
        global_stored_data = [df.to_dict('records') for df in dfmeans]
        return children, global_stored_data
    else:
        return dash.no_update

The global store should be created with storage_type='memory' so that its contents do not persist when you refresh the page.

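That being said, I noticed that children, the output of update_output, is a list of html.Div() elements, each returned by parse_contents. However, part of the content of each Div is dcc.Store(id='stored-data', data=dfmean.to_dict('records')), so multiple instances of dcc.Store with the same id stored-data are created at the same time. Doesn't that raise an error? Unless I misunderstood your layout, I think you have only one chart (in which the contents of multiple data files are overlaid), so you should modify that part of the code to use a single dcc.Store for the concatenated data, as shown above.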
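As a rough illustration of keeping the concatenated data in one place, the sketch below accumulates every workbook's records into a single flat list, which is the kind of JSON-serialisable value a lone dcc.Store could hold. The helper name `append_upload`, the `source` field, and the sample rows are all hypothetical and are not part of the original code.

```python
import json


def append_upload(stored_records, new_records, filename):
    """Return a new flat list: earlier records plus this file's records.

    Each new row is tagged with the (hypothetical) 'source' field so the
    originating workbook can still be told apart after concatenation.
    """
    combined = list(stored_records or [])  # an empty store may arrive as None
    combined.extend({**row, "source": filename} for row in new_records)
    return combined


store = None  # value a callback might receive before anything is uploaded
store = append_upload(
    store, [{"Date": "2022-07-01", "variable": "A", "value": 1.5}], "book1.xlsx")
store = append_upload(
    store, [{"Date": "2022-07-01", "variable": "A", "value": 2.5}], "book2.xlsx")

# A store's data has to survive a JSON round trip between browser and server.
round_tripped = json.loads(json.dumps(store))
```

A charting callback could then build its dataframe from this one list with `pd.DataFrame(data)`, as make_graphs in the question already does.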

Solution 1
0 2023-02-01 13:08:00
