Upload multiple Excel workbooks and concatenate - dcc.Upload, Plotly Dash
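I am developing an interactive dashboard with Plotly Dash that takes Excel workbooks as input, formats the data into pandas dataframes, and displays it as a bar chart.
It works well with a single workbook, but when I added a variable so that multiple workbooks can be loaded and concatenated into one long dataframe for visualization, I ran into a persistence problem: the data is retained after refreshing the browser, even though storage_type is set to memory as per the documentation.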
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
dfmeans = []
app.layout = html.Div([  # this code section taken from Dash docs https://dash.plotly.com/dash-core-components/upload
    dcc.Store(id='stored-data', storage_type='memory'),
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
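I suspect this is because I declared the list variable dfmeans = [] outside the main function, but that is the only place where I could get it to work. When I put it inside the parse_contents() function, the data is replaced each time a new workbook is added.
Has anyone successfully implemented the Dash upload component dcc.Upload with multiple workbooks/Excel files as input? I can find very little documentation on uploading multiple files. The full code is here: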
import base64
import datetime
import io
import re
import dash
from dash.dependencies import Input, Output, State
import dash_core_components as dcc
import dash_html_components as html
import dash_table
import plotly.express as px
import pandas as pd
from read_workbook import *
import pdb
suppress_callback_exceptions=True
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
dfmeans = []
app.layout = html.Div([  # this code section taken from Dash docs https://dash.plotly.com/dash-core-components/upload
    dcc.Store(id='stored-data', storage_type='memory'),
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='output-div'),
    html.Div(id='output-datatable'),
])
def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')
    decoded = base64.b64decode(content_string)
    try:
        workbook_xl = pd.ExcelFile(io.BytesIO(decoded))
        # print(workbook_xl)
        #aggregates all months data into a single data frame
        def get_all_months(workbook_xl):
            months = ['July', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'June']
            xl_file = pd.ExcelFile(workbook_xl)
            months_data = []
            for month in months:
                months_data.append(get_month_dataframe(xl_file, month))
                print(months_data)
            return pd.concat(months_data)
        #run get all months function and produce behavior dataframe
        df = get_all_months(workbook_xl)
        #convert episode values to float and aggregate mean per shift
        df['value'] = df['value'].astype(float)
        dfmean = df.groupby(['Date', 'variable'],sort=False,)['value'].mean().round(2).reset_index()
        dfmeans.append(dfmean)
        dfmean = pd.concat(dfmeans)
    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])
    return html.Div([
        html.H5(filename),
        # html.H6(datetime.datetime.fromtimestamp(date)),
        dash_table.DataTable(
            data=dfmean.to_dict('records'),
            columns=[{'name': i, 'id': i} for i in dfmean.columns],
            page_size=15
        ),
        dcc.Store(id='stored-data', data=dfmean.to_dict('records')),
        html.Hr(),  # horizontal line
        # For debugging, display the raw contents provided by the web browser
        html.Div('Raw Content'),
        html.Pre(contents[0:200] + '...', style={
            'whiteSpace': 'pre-wrap',
            'wordBreak': 'break-all'
        })
    ])
@app.callback(Output('output-datatable', 'children'),
              Input('upload-data', 'contents'),
              State('upload-data', 'filename'),
              State('upload-data', 'last_modified'))
def update_output(list_of_contents, list_of_names, list_of_dates):
    if list_of_contents is not None:
        children = [
            parse_contents(c, n, d) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        return children
@app.callback(Output('output-div', 'children'),
              Input('stored-data','data'))
def make_graphs(data):
    df_agg = pd.DataFrame(data)
    # df_agg['Date'] = pd.to_datetime(df_agg['Date'])
    if df_agg.empty:
        print("Dataframe empty")
    else:
        bar_fig = px.bar(df_agg, x=df_agg['Date'], y=df_agg['value'], color = 'variable',barmode='group')
        return dcc.Graph(figure=bar_fig)
if __name__ == '__main__':
    app.run_server(debug=True)
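Defining dfmeans outside the scope of the callbacks will definitely make your data persist until you kill the server, because it is treated as a global variable. According to the Dash documentation:
"One of the core Dash principles explained in the Getting Started Guide on Callbacks is that Dash Callbacks must never modify variables outside of their scope. It is not safe to modify any global variables. This chapter explains why and provides some alternative patterns for sharing state between callbacks."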
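To make the quoted principle concrete, here is a minimal sketch in plain Python with no Dash involved. The function names and the 'wb1'/'wb2' placeholders are hypothetical; the two functions only imitate a callback that appends to a module-level list and a callback that receives its previous state as an argument, the way a State input would supply it.

```python
# Module-level list: lives as long as the server process and is shared by
# every callback invocation (and, in a real app, every browser session).
uploads_global = []


def callback_with_global(new_rows):
    """Anti-pattern: mutates state that outlives the call."""
    uploads_global.extend(new_rows)
    return list(uploads_global)


def callback_with_state(new_rows, stored):
    """Receives the previous state as an argument and returns the new state."""
    stored = list(stored or [])  # None means nothing has been stored yet
    stored.extend(new_rows)
    return stored


# First upload.
callback_with_global(['wb1'])
first_state = callback_with_state(['wb1'], None)

# Simulated page refresh: a browser-side store would be reset to None,
# but the module-level list on the server is untouched.
after_refresh_global = callback_with_global(['wb2'])
after_refresh_state = callback_with_state(['wb2'], None)

print(after_refresh_global)  # still contains 'wb1'
print(after_refresh_state)   # only 'wb2'
```

The second function is the shape the rest of this answer aims for: all accumulated data travels through the callback's arguments and return value rather than through a variable on the server.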
One alternative is to create a global store component that holds dfmeans and pass its state to update_output, so that it is appended to each time a new file is uploaded:
@app.callback(Output('output-datatable', 'children'),
              Output('global-stored-data', 'data'),
              Input('upload-data', 'contents'),
              State('upload-data', 'filename'),
              State('upload-data', 'last_modified'),
              State('global-stored-data', 'data'))
def update_output(list_of_contents, list_of_names, list_of_dates, global_stored_data):
    # The store holds None until the first upload, so fall back to an empty list
    dfmeans = [pd.DataFrame(data) for data in (global_stored_data or [])]
    if list_of_contents is not None:
        # parse_contents must accept dfmeans as a fourth argument and append to it
        children = [
            parse_contents(c, n, d, dfmeans) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]
        global_stored_data = [df.to_dict('records') for df in dfmeans]
        return children, global_stored_data
    else:
        # One no_update per output
        return dash.no_update, dash.no_update
The global store should be created with storage_type='memory' so that its contents do not persist when you refresh the page.
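The callback above calls parse_contents with a fourth argument, dfmeans, but the modified function is not shown. The following is a rough sketch of the data-handling part only, under the assumption that each workbook has already been turned into a dataframe with 'Date', 'variable' and 'value' columns as in the question. The helper names (store_to_frames, frames_to_store, append_workbook_mean) are made up for illustration and are not part of Dash or pandas.

```python
import pandas as pd


def store_to_frames(stored):
    """Rebuild the list of dataframes from the Store's data (None when empty)."""
    return [pd.DataFrame(records) for records in (stored or [])]


def frames_to_store(frames):
    """Convert dataframes back to JSON-serializable lists of records."""
    return [frame.to_dict('records') for frame in frames]


def append_workbook_mean(df, dfmeans):
    """Aggregate one workbook, append it to dfmeans, return everything so far.

    dfmeans is a local list owned by the calling callback, not a global.
    """
    df = df.assign(value=df['value'].astype(float))
    dfmean = (df.groupby(['Date', 'variable'], sort=False)['value']
                .mean().round(2).reset_index())
    dfmeans.append(dfmean)
    return pd.concat(dfmeans, ignore_index=True)


# Simulate two separate uploads, each one a fresh callback invocation.
wb1 = pd.DataFrame({'Date': ['2021-07-01', '2021-07-01'],
                    'variable': ['a', 'a'],
                    'value': ['1', '2']})
wb2 = pd.DataFrame({'Date': ['2021-08-01'],
                    'variable': ['a'],
                    'value': ['4']})

stored = None  # what the Store holds right after a page load
for workbook in (wb1, wb2):
    dfmeans = store_to_frames(stored)
    combined = append_workbook_mean(workbook, dfmeans)
    stored = frames_to_store(dfmeans)

print(combined)
```

Inside the real parse_contents, the same steps would replace the lines that append to the module-level dfmeans, and the function would no longer need to return its own dcc.Store.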
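That being said, I noticed that children, the output of update_output, is a list of html.Div(), each one returned by parse_contents. However, part of the content of each Div is dcc.Store(id='stored-data', data=dfmean.to_dict('records')), so several dcc.Store instances with the same id stored-data exist at the same time. Doesn't this produce an error? Unless I have misunderstood your layout, I think you only have one graph (on which the contents of multiple data files are overlaid), so you should modify that part of the code to use only one dcc.Store for the concatenated data, as shown above.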