Bokeh not interpreting correct scale with SQL data

I've explored lots of solutions here but haven't found one that works. I'm using sqlite and pandas to read data from a SQL database, but Bokeh doesn't like the date. I've tried conversions to datetime, unixepoch, etc., and they all seem to yield the same result.

EDIT: Here's the full code:

from os.path import dirname, join

import pandas as pd
import pandas.io.sql as psql
import numpy as np
import sqlite3
import os

from math import pi

from bokeh.plotting import figure, output_file, show 
from bokeh.io import output_notebook, curdoc
from bokeh.models import ColumnDataSource, Div, DatetimeTickFormatter
from bokeh.models.widgets import Slider, Select, RadioButtonGroup
from bokeh.layouts import layout, widgetbox

import warnings
import datetime

warnings.filterwarnings('ignore')


## Set up the SQL Connection

conn = sqlite3.connect('/Users/<>/Documents/python_scripts/reptool/reptool_db')
c = conn.cursor()

## Run the SQL

proj = pd.read_sql(
                       """

                        SELECT  

                        CASE WHEN df is null THEN ds ELSE df END AS 'projdate',
                        CASE WHEN yhat is null THEN y ELSE yhat END AS 'projvol',
                        strftime('%Y',ds) as 'year'

                        FROM forecast 

                        LEFT JOIN actuals 
                        ON forecast.ds = actuals.df 

                       """, con=conn)


# HTML index page and inline CSS stylesheet

desc = Div(text=open("/Users/<>/Documents/python_scripts/reptool/description.html").read(), width=800)


## Rename Columns and create list sets

proj.rename(columns={'projdate': 'x', 'projvol': 'y'}, inplace=True)

x=list(proj['x'])
y=list(proj['y'])

# proj['projdate'] = [datetime.datetime.strptime(x, "%Y-%m-%d").date() for x in proj['projdate']]

# Create input controls


radio_button_group = RadioButtonGroup(
        labels=["Actuals", "Forecast","FY Projection"], active=0)

min_year = Slider(title="Period Start", start=2012, end=2018, value=2013, step=1)
max_year = Slider(title="Period End", start=2012, end=2018, value=2017, step=1)

## Declare systemic source

source = ColumnDataSource(data=dict(x=[], y=[], year=[]))


## Bokeh tools

TOOLS="pan,wheel_zoom,box_zoom,reset,xbox_select"

## Set up plot


p = figure(title="REP Forecast", plot_width=900, plot_height=300, tools=TOOLS, x_axis_label='date', x_axis_type='datetime', y_axis_label='volume', active_drag="xbox_select")

p.line(x=proj.index, y=y, line_width=2, line_alpha=0.6)


p.xaxis.major_label_orientation = pi/4

# p.xaxis.formatter = DatetimeTickFormatter(seconds=["%Y:%M"],
#                                             minutes=["%Y:%M"],
#                                             minsec=["%Y:%M"],
#                                             hours=["%Y:%M"])

# axis map


# definitions

def select_rep():
    selected = proj[
        (proj.year >= min_year.value) &
        (proj.year >= max_year.value)
    ]
    return selected

def update():
    proj = select_rep()
    source.data = dict(
        year=proj["year"]   
    )



controls = [min_year, max_year]
for control in controls:
    control.on_change('value', lambda attr, old, new: update())


sizing_mode = 'fixed'  # 'scale_width' also looks nice with this example

## Build the html page and inline CSS
inputs = widgetbox(*controls)
l = layout([
    [desc],
    [p],
    [inputs],
], )

# update()

curdoc().add_root(l)
curdoc().title = "REP"

The SQLite output in Terminal.app looks like this:

[Image: SQL]

The result is that the x-axis displays in milliseconds. Also, the y-axis shows up in exponential notation:

[Image: Bokeh Plot]

The issue seems somehow related to pandas' use of indexing, and thus I can't reference "x" here. I rename the columns and force them into lists which, by themselves, print correctly and should therefore plot into the line properly, but as you'll see below, they don't:

proj.rename(columns={'projdate': 'x', 'projvol': 'y'}, inplace=True)

x=list(proj['x'])
y=list(proj['y'])

To get the line to render in Bokeh, I have to pass it the index, because passing it anything else doesn't seem to get the glyph to render. So currently I have this:

p = figure(title="REP Forecast", plot_width=900, plot_height=300, tools=TOOLS, x_axis_label='date', x_axis_type='datetime', y_axis_label='volume', active_drag="xbox_select")

p.line(x=proj.index, y=y, line_width=2, line_alpha=0.6)

Tried converting to unixepoch in the SQL, same result. Tried converting to unixepoch in the data, same result. Tried using DatetimeTickFormatter, which just shows all 5-6 years as one year (I think it's just displaying the milliseconds as years rather than converting them from milliseconds to days).

I've looked here and on GitHub, up and down, and tried different things, but ultimately I can't find one working example where the source is a SQL query rather than a CSV.

None of these things have anything to do with SQL. Bokeh only cares about the data that you give it, not where it came from. You have specified that you want a datetime axis on the x-axis:

x_axis_type='datetime'

So, Bokeh will set up the plot with a ticker that picks "nice" values on a datetime scale, and with a tick formatter that displays tick locations as formatted dates. What is important, however, is that the data coordinates are in the appropriate units, which are floating point milliseconds since epoch.

You can provide x values directly in these units, but Bokeh will also automatically convert common datetime types (e.g. Python stdlib, NumPy, or pandas) to the right units. So the easiest thing for you to do is pass a column of datetime values as the x values to line.
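To make the units concrete, here is a minimal sketch using only the Python standard library. The helper name to_epoch_ms is made up for illustration and is not part of Bokeh. It converts a date string to float milliseconds since the epoch, which is the kind of value a datetime axis expects. It also suggests why plotting against the integer row index looks wrong: values such as 0, 1, 2 are read as milliseconds after 1970-01-01.

```python
from datetime import datetime, timezone


def to_epoch_ms(date_string, fmt="%Y-%m-%d"):
    """Convert a date string to float milliseconds since the Unix epoch (UTC).

    Hypothetical helper for illustration only; not a Bokeh function.
    """
    parsed = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp() * 1000.0


# One day after the epoch is 86,400,000 milliseconds.
one_day_ms = to_epoch_ms("1970-01-02")

# A date from the period covered by the question's data.
sample_ms = to_epoch_ms("2018-08-09")

print(one_day_ms, sample_ms)
```

In practice you would rarely need this by hand, since passing real datetime values lets Bokeh do the conversion for you.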

To be clear, this statement:

To render the line in Bokeh, it has to use the index

is incorrect. You can pass any dataframe column you like as the x-values, and I am suggesting you pass a column of datetimes.

I changed a line of the SQL to:

CASE WHEN df is null THEN strftime('%Y',ds) ELSE strftime('%Y',df) END AS 'projdate',

However, when I try expanding that specifier to %Y-%m-%d %H-%m-%s it just reads it as a string all over again.
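For what it's worth, this is expected: SQLite has no dedicated date type, and strftime() returns text, so the column arrives in Python as strings whatever the format specifier is. A minimal sketch with an in-memory database (the table contents here are made up for illustration):

```python
import sqlite3

# In-memory database with a single made-up row, standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forecast (ds TEXT)")
conn.execute("INSERT INTO forecast VALUES ('2018-08-09')")

# Ask SQLite for the formatted value and for the type it reports for it.
value, sql_type = conn.execute(
    "SELECT strftime('%Y-%m-%d', ds), typeof(strftime('%Y-%m-%d', ds)) "
    "FROM forecast"
).fetchone()

print(value, sql_type, type(value).__name__)
conn.close()
```

So the conversion to a real datetime has to happen on the Python side, after the query.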

Also, by re-importing the data I was able to pass the date through here without using the index:

p.line(x=x, y=y, line_width=2, line_alpha=0.6)

But then I get this weird output: link.

So it's clear that it can read the year, but I need to pass through the full date to display the time series forecast. And it's still displaying the dates and y-values in the incorrect scale, regardless.

Going to noodle on this some more, but if anyone has other suggestions, I'm thankful.

SOLVED the datetime problem. Added this after the SQL query:

proj['projdate'] = proj['projdate'].astype('datetime64[ns]')

Which in turn yields this:

[Image: Bokeh Plot]
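As a possible alternative to the astype call, pd.read_sql accepts a parse_dates argument that converts the named columns while loading. A minimal sketch, assuming an in-memory table with made-up rows rather than the real reptool_db:

```python
import sqlite3

import pandas as pd

# Made-up stand-in for the forecast table; dates are stored as text in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forecast (ds TEXT, yhat REAL)")
conn.executemany(
    "INSERT INTO forecast VALUES (?, ?)",
    [("2018-08-07", 10.0), ("2018-08-08", 12.5)],
)

# parse_dates asks pandas to convert the listed columns to datetimes on load.
proj = pd.read_sql(
    "SELECT ds AS projdate, yhat AS projvol FROM forecast",
    con=conn,
    parse_dates=["projdate"],
)
conn.close()

print(proj.dtypes)
```

Either way, the point is the same: the x column has to hold real datetimes, not strings, before it goes to p.line.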

Still got a problem with the x-axis but since that's a straight numerical value, x_axis_type should fix it.

So far the working code looks like this (again, still iterating to add other controls, but everything about the Bokeh plot itself works as intended):

# main.py

# created by:           <>
# version:              0.1.2
# created date:         07-Aug-2018
# modified date:        09-Aug-2018

from os.path import dirname, join

import pandas as pd
import pandas.io.sql as psql
import numpy as np
import sqlite3
import os

from math import pi

from bokeh.plotting import figure, output_file, show 
from bokeh.io import output_notebook, curdoc
from bokeh.models import ColumnDataSource, Div, DatetimeTickFormatter
from bokeh.models.widgets import Slider, Select, RadioButtonGroup
from bokeh.layouts import layout, widgetbox

import warnings
import datetime

warnings.filterwarnings('ignore')


## Set up the SQL Connection

conn = sqlite3.connect('/Users/<>/Documents/python_scripts/reptool/reptool_db')
c = conn.cursor()

## Run the SQL

proj = pd.read_sql(
                       """

                        SELECT  

                        CASE WHEN df is null THEN strftime('%Y-%m-%d',ds) ELSE strftime('%Y-%m-%d',df) END AS 'projdate',
                        CASE WHEN yhat is null THEN y ELSE yhat END AS 'projvol',
                        strftime('%Y',ds) as 'year'

                        FROM forecast 

                        LEFT JOIN actuals 
                        ON forecast.ds = actuals.df 

                       """, con=conn)

proj['projdate'] = proj['projdate'].astype('datetime64[ns]')


# HTML index page and inline CSS stylesheet

desc = Div(text=open("/Users/<>/Documents/python_scripts/reptool/description.html").read(), width=800)


## Rename Columns and create list sets

proj.rename(columns={'projdate': 'x', 'projvol': 'y'}, inplace=True)

x=list(proj['x'])
y=list(proj['y'])


# Create input controls


radio_button_group = RadioButtonGroup(
        labels=["Actuals", "Forecast","FY Projection"], active=0)

min_year = Slider(title="Period Start", start=2012, end=2018, value=2013, step=1)
max_year = Slider(title="Period End", start=2012, end=2018, value=2017, step=1)

## Declare systemic source

source = ColumnDataSource(data=dict(x=[], y=[], year=[]))


## Bokeh tools

TOOLS="pan,wheel_zoom,box_zoom,reset,xbox_select"

## Set up plot


p = figure(title="REP Forecast", plot_width=900, plot_height=300, tools=TOOLS, x_axis_label='date', x_axis_type='datetime', y_axis_label='volume', active_drag="xbox_select")

p.line(x=x, y=y, line_width=2, line_alpha=0.6)


p.xaxis.major_label_orientation = pi/4

# p.xaxis.formatter = DatetimeTickFormatter(seconds=["%Y:%M"],
#                                             minutes=["%Y:%M"],
#                                             minsec=["%Y:%M"],
#                                             hours=["%Y:%M"])

# axis map


# definitions

def select_rep():
    # strftime returns 'year' as a string, so cast to int before comparing
    years = proj.year.astype(int)
    selected = proj[
        (years >= min_year.value) &
        (years <= max_year.value)
    ]
    return selected

def update():
    proj = select_rep()
    source.data = dict(
        year=proj["year"]   
    )



controls = [min_year, max_year]
for control in controls:
    control.on_change('value', lambda attr, old, new: update())


sizing_mode = 'fixed'  # 'scale_width' also looks nice with this example

## Build the html page and inline CSS
inputs = widgetbox(*controls)
l = layout([
    [desc],
    [p],
    [inputs],
], )

# update()

curdoc().add_root(l)
curdoc().title = "REP"
