
What is wrong with Bokeh image() plotting? It succeeds but shows no graph

I start with a Spark dataframe that contains data like this:

+-------------------+--------------+------+-----+
|window_time        |delayWindowEnd|values|index|
+-------------------+--------------+------+-----+
|2022-01-24 18:00:00|999           |999   |2    |
|2022-01-24 19:00:00|999           |999   |1    |
|2022-01-24 20:00:00|999           |999   |3    |
|2022-01-24 21:00:00|999           |999   |4    |
|2022-01-24 22:00:00|999           |999   |5    |
|2022-01-24 18:00:00|998           |998   |4    |
|2022-01-24 19:00:00|998           |998   |5    |
|2022-01-24 20:00:00|998           |998   |3    |

and I'd like to plot that as a heatmap with the following code in Apache Zeppelin:

%spark.pyspark

import bkzep
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, ColorBar, LogColorMapper
from bokeh.layouts import gridplot
from pyspark.sql.functions import col, coalesce, lit, monotonically_increasing_id
from pyspark.sql import DataFrame
from pyspark.sql.functions import *

output_notebook(notebook_type='zeppelin')

then

%pyspark

from pyspark.sql.functions import *

def plot_summaries(sensor, dfName):
    df = sqlContext.table(dfName)
    pdf = df.toPandas()
    source = ColumnDataSource(pdf)

    color_mapper = LogColorMapper(palette="Viridis256", low=1, high=10)

    plot = figure(toolbar_location=None,x_axis_type='datetime')
    plot.image(x='window_time', y='delayWindowEnd', source=source, image='index',dw=1,dh=1,  color_mapper=color_mapper)

    color_bar = ColorBar(color_mapper=color_mapper, label_standoff=12)

    plot.add_layout(color_bar, 'right')
    show(gridplot([plot], ncols=1, plot_width=1000, plot_height=400))

sensors = [   
"all"
]

and then finally

%pyspark

from pyspark.sql.functions import *

keyCol = "month_day_hour"

sensors = [
    "all"]


for sensor in sensors:
    plot_summaries(sensor, "maxmin2")   

The last one runs successfully, but I see no graph.

I'd expect a graph to be plotted here, if not the heatmap the code above is meant to produce.

That's probably because I'm misusing the parameters.

Is it OK to pass a dataframe column as the image parameter (with two other columns as the x and y axes)? Are dh and dw initialized correctly? Is it OK for the X axis to be a timestamp?

In case the cause is browser-side rendering, the browser console shows the following JS error:

polyfills.d42c9551b0788083cd69.js:1 Uncaught Error: Error rendering Bokeh model: could not find #fb19be38-e25a-4ebf-a488-593cd2e9a4d6 HTML tag
    at o (bokeh-1.3.4.min.js:31:143801)
    at Object.n._resolve_root_elements (bokeh-1.3.4.min.js:31:144274)
    at Object.n.embed_items_notebook (bokeh-1.3.4.min.js:31:147281)
    at embed_document (<anonymous>:6:20)
    at <anonymous>:15:9
    at e.invokeTask (polyfills.d42c9551b0788083cd69.js:1:8063)
    at t.runTask (polyfills.d42c9551b0788083cd69.js:1:3241)
    at t.invokeTask (polyfills.d42c9551b0788083cd69.js:1:9170)
    at i.useG.invoke (polyfills.d42c9551b0788083cd69.js:1:9061)
    at n.args.<computed> (polyfills.d42c9551b0788083cd69.js:1:38948)

The response from the Zeppelin backend with the execution and plotting results, which reaches the browser over a websocket, looks well-formed and correct, though:

https://pastebin.com/pLWBA8Cv

The answer was given here: https://discourse.bokeh.org/t/cant-render-heatmap-data-for-apache-zeppelins-pyspark-dataframe/8844

What was wrong is explained in detail at the link above. In short, I wasn't passing Bokeh the 2D array that image() needs, so I had to build one with pandas' pivot_table and numpy. Here is the solution:

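import pandas as pd

# dfName is the table name passed to plot_summaries() in the question
dft = sqlContext.table(dfName)
pdf = dft.toPandas()

# Pivot the long table into a 2D grid: rows = 'values', columns = 'window_time', cells = 'index'
A = pdf.pivot_table(values='index', index='values', columns='window_time', fill_value=0)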
source = ColumnDataSource(data={'x':[pd.to_datetime('Jan 24 2022')] #left most
                           ,'y':[0] #bottom most 
                           ,'dw':[pdf['window_time'].max()-pdf['window_time'].min()] #TOTAL width of image
                           #,'dh':[df['delayWindowEnd'].max()] #TOTAL height of image
                           ,'dh':[1000] #TOTAL height of image
                           ,'im':[A.to_numpy()] #2D array using to_numpy() method on pivotted df
                           })

color_mapper = LogColorMapper(palette="Viridis256", low=1, high=20)

plot = figure(toolbar_location=None,x_axis_type='datetime')
plot.image(x='x', y='y', source=source, image='im',dw='dw',dh='dh',  color_mapper=color_mapper)

color_bar = ColorBar(color_mapper=color_mapper, label_standoff=12)

plot.add_layout(color_bar, 'right')
show(gridplot([plot], ncols=1, plot_width=1000, plot_height=400)) 
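
To see the pivot step in isolation, here is a minimal sketch that runs without Spark, Zeppelin or Bokeh. The six rows are made-up sample data in the same shape as the table in the question, and the names grid, image, x0, y0, dw and dh are only illustrative. It derives the image extents from the data instead of hard-coding them, which is an assumption about what is wanted, not what the solution above does.

```python
import pandas as pd

# Made-up long-format data shaped like the table in the question.
pdf = pd.DataFrame({
    "window_time": pd.to_datetime([
        "2022-01-24 18:00:00", "2022-01-24 19:00:00", "2022-01-24 20:00:00",
        "2022-01-24 18:00:00", "2022-01-24 19:00:00", "2022-01-24 20:00:00",
    ]),
    "values": [999, 999, 999, 998, 998, 998],
    "index": [2, 1, 3, 4, 5, 3],
})

# One row per distinct 'values' (y), one column per distinct 'window_time' (x),
# each cell holding 'index'. Missing combinations are filled with 0.
grid = pdf.pivot_table(values="index", index="values",
                       columns="window_time", fill_value=0)
image = grid.to_numpy()  # the 2D array that goes into the 'im' column

# Extents taken from the pivoted axes: lower-left corner, total width, total height.
x0 = grid.columns.min()
y0 = grid.index.min()
dw = grid.columns.max() - x0  # a pandas Timedelta
dh = grid.index.max() - y0

print(image.shape)  # (2, 3): two distinct 'values', three distinct timestamps
print(x0, y0, dw, dh)
```

pivot_table sorts both axes in ascending order, so row 0 of the array holds the smallest 'values' entry, which image() draws at the bottom of the plot.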

The result looks good to me:

[image]
