
python bokeh, how to make a correlation plot?

How can I make a correlation heatmap in Bokeh?

import pandas as pd
import bokeh.charts

df = pd.util.testing.makeTimeDataFrame(1000)
c = df.corr()

p = bokeh.charts.HeatMap(c) # not right

# try to make it a long form
# (and it's ugly in pandas to use 'index' in melt)

c['x'] = c.index
c = pd.melt(c, 'x', ['A','B','C','D'])

# this shows the right 4x4 matrix, but values are still wrong
p = bokeh.charts.HeatMap(c, x = 'x', y = 'variable', values = 'value') 

By the way, can I make a colorbar on the side, instead of legends in the plot? And also, how do I choose the color range/mapping, e.g. dark blue (-1) to white (0) to dark red (+1)?

So I think I can provide some baseline code to help do what you are asking, using a combination of the other answers here and some extra pre-processing.

Let's assume you have a dataframe df already loaded (in this case the UCI Adult Data) and the correlation coefficients calculated (p_corr).
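The answer does not show how p_corr was computed. A minimal sketch, assuming it is the Pearson correlation matrix from pandas `DataFrame.corr()`; the synthetic columns below are hypothetical stand-ins, not the UCI Adult Data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical column names and values)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=200),  # strongly correlated with "a"
    "c": rng.normal(size=200),                     # independent noise
})

# Pearson correlation between every pair of columns (square, symmetric matrix)
p_corr = df.corr()
print(p_corr.round(2))
```

The result is indexed by column name on both axes, which is what the plotting code below relies on when it flattens `p_corr.values` and uses `df.columns` as tick labels.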

import bisect
#
from math import pi
from numpy import arange
from itertools import chain
from collections import OrderedDict
#
from bokeh.palettes import RdBu as colors  # just make sure to import a palette that centers on white (-ish)
from bokeh.models import ColorBar, LinearColorMapper
from bokeh.plotting import figure, show  # needed for figure() and show() below

colors = list(reversed(colors[9]))  # we want an odd number to ensure 0 correlation is a distinct color
labels = df.columns
nlabels = len(labels)

def get_bounds(n):
    """Gets bounds for quads with n features"""
    # use the argument n rather than the global nlabels
    bottom = list(chain.from_iterable([[ii]*n for ii in range(n)]))
    top = list(chain.from_iterable([[ii+1]*n for ii in range(n)]))
    left = list(chain.from_iterable([list(range(n)) for ii in range(n)]))
    right = list(chain.from_iterable([list(range(1,n+1)) for ii in range(n)]))
    return top, bottom, left, right

def get_colors(corr_array, colors):
    """Aligns color values from palette with the correlation coefficient values"""
    ccorr = arange(-1, 1, 1/(len(colors)/2))
    color = []
    for value in corr_array:
        ind = bisect.bisect_left(ccorr, value)
        # clamp the index so a value of exactly -1 does not wrap around to colors[-1]
        color.append(colors[min(max(ind-1, 0), len(colors)-1)])
    return color

# Note: Bokeh 3.x renamed plot_width/plot_height to width/height
p = figure(plot_width=600, plot_height=600,
           x_range=(0,nlabels), y_range=(0,nlabels),
           title="Correlation Coefficient Heatmap (lighter is worse)",
           toolbar_location=None, tools='')

p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.major_label_orientation = pi/4
p.yaxis.major_label_orientation = pi/4

top, bottom, left, right = get_bounds(nlabels)  # creates squares for plot
color_list = get_colors(p_corr.values.flatten(), colors)

p.quad(top=top, bottom=bottom, left=left,
       right=right, line_color='white',
       color=color_list)

# Set ticks with labels
ticks = [tick+0.5 for tick in list(range(nlabels))]
tick_dict = OrderedDict([[tick, labels[ii]] for ii, tick in enumerate(ticks)])
# Create the correct number of ticks for each axis 
p.xaxis.ticker = ticks
p.yaxis.ticker = ticks
# Override the labels 
p.xaxis.major_label_overrides = tick_dict
p.yaxis.major_label_overrides = tick_dict

# Setup color bar
mapper = LinearColorMapper(palette=colors, low=-1, high=1)
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'right')

show(p)

This will result in the following plot if the categories are integer encoded (this is a horrible data example):

[Figure: Pearson correlation coefficient heatmap in Bokeh]
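The idea of binning a correlation value into a palette can also be sketched on its own, without Bokeh. The helper name `value_to_color` and the five named colors below are hypothetical and chosen only for illustration. It splits [-1, 1] into as many equal-width bins as there are colors, so an odd-length palette gives 0 its own middle color:

```python
import bisect

def value_to_color(value, palette, low=-1.0, high=1.0):
    """Map a value in [low, high] to one of len(palette) equal-width bins."""
    n = len(palette)
    width = (high - low) / n
    # Upper edges of every bin except the last one
    edges = [low + width * (i + 1) for i in range(n - 1)]
    # bisect_right returns 0..n-1, so out-of-range values clamp to the end colors
    return palette[bisect.bisect_right(edges, value)]

# Hypothetical 5-color diverging palette (odd length, neutral color in the middle)
palette = ["darkblue", "lightblue", "white", "salmon", "darkred"]

print(value_to_color(-1.0, palette))  # darkblue
print(value_to_color(0.0, palette))   # white
print(value_to_color(1.0, palette))   # darkred
```

Because the index is always between 0 and `len(palette) - 1`, the two extremes need no special handling.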

In modern Bokeh you should use the bokeh.plotting interface. You can see an example of a categorical heatmap generated using this interface in the gallery:

http://docs.bokeh.org/en/latest/docs/gallery/categorical.html


Regarding a legend, for a colormap like this you will actually want a discrete ColorBar instead of a Legend. This is a new feature that will be present in the upcoming 0.12.2 release later this week (today's date: 2016-08-28). These new colorbar annotations can be located outside the main plot area.

There is also an example in the GitHub repo:

https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/color_data_map.py

Note that the last example also uses another new feature to do the colormapping in the browser, instead of having to precompute the colors in Python. Basically, all together it looks like:

from bokeh.models import ColorBar, LinearColorMapper
from bokeh.palettes import Viridis3
from bokeh.plotting import figure

# `source` (a ColumnDataSource with 'x', 'y' and 'z' columns) and `title`
# are assumed to be defined already

# create a color mapper with your palette - can be any list of colors
mapper = LinearColorMapper(palette=Viridis3, low=0, high=100)

p = figure(toolbar_location=None, tools='', title=title)
p.circle(
    x='x', y='y', source=source,

    # use the mapper to colormap according to the 'z' column (in the browser)
    fill_color={'field': 'z', 'transform': mapper},
)

# create a ColorBar and add it to the side of the plot
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'right')
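For the color range asked about in the question (dark blue at -1, white at 0, dark red at +1), the same mapper approach might look like the sketch below. The hex colors are a hand-picked assumption, not a palette taken from the answer. The two points that matter are an odd number of colors with white in the middle, and `low`/`high` pinned to -1 and 1 rather than to the data's own minimum and maximum:

```python
from bokeh.models import ColorBar, LinearColorMapper

# Hand-picked diverging palette (an assumption): dark blue -> white -> dark red
diverging = ["#053061", "#4393c3", "#d1e5f0", "#ffffff",
             "#fddbc7", "#d6604d", "#67001f"]

# Pin the range to [-1, 1] so that 0 always lands in the middle (white) bin
mapper = LinearColorMapper(palette=diverging, low=-1, high=1)

# The color bar uses the same mapper, so it stays in sync with the glyph colors
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
```

The mapper can then be passed to a glyph through `fill_color={'field': ..., 'transform': mapper}`, and the color bar attached with `p.add_layout(color_bar, 'right')`.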

There are more sophisticated options too, e.g. if you want to control the ticking on the colorbar more carefully you could add a custom ticker or tick formatter, just like on a normal Axis, to achieve things like:

[Image]
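A minimal sketch of such a customized color bar, assuming a correlation range of -1 to 1. The tick positions, the format string and the three-color palette are illustrative choices rather than anything specified in the answer:

```python
from bokeh.models import (ColorBar, FixedTicker, LinearColorMapper,
                          PrintfTickFormatter)

# Small illustrative palette (an assumption): blue, near-white, red
mapper = LinearColorMapper(palette=["#2166ac", "#f7f7f7", "#b2182b"],
                           low=-1, high=1)

color_bar = ColorBar(
    color_mapper=mapper,
    # Put ticks at fixed positions instead of letting Bokeh choose them
    ticker=FixedTicker(ticks=[-1, -0.5, 0, 0.5, 1]),
    # Render every tick label with one decimal place
    formatter=PrintfTickFormatter(format="%.1f"),
    location=(0, 0),
)
```

As with any ColorBar, it still needs to be attached to a figure with `p.add_layout(color_bar, 'right')`.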

It's not clear what your actual requirements are, so I just mention this in case it is useful to know.


Finally, Bokeh is a large project, and finding the best way to do something often involves asking for more information and context and, in general, having a discussion. That kind of collaborative help seems to be frowned upon at SO (they are "not real answers"), so I'd encourage you to also check out the project Discourse for help anytime.

I tried to create an interactive correlation plot using the Bokeh library. The code is a combination of different solutions available on SO and other websites. In the solution above, bigreddot has explained things in detail. The code for the correlation heatmap is as below:

import pandas as pd
from bokeh.io import output_file, show
from bokeh.models import BasicTicker, ColorBar, LinearColorMapper, ColumnDataSource, PrintfTickFormatter
from bokeh.plotting import figure
from bokeh.transform import transform
from bokeh.palettes import Viridis3, Viridis256
# Read your data into a pandas DataFrame
data = pd.read_csv("YOUR_PATH")  # placeholder: replace with the path to your CSV file
# Now create the correlation matrix using pandas
df = data.corr()

df.index.name = 'AllColumns1'
df.columns.name = 'AllColumns2'

# Prepare data.frame in the right format
df = df.stack().rename("value").reset_index()

# here the plot :
output_file("CorrelationPlot.html")

# You can use your own palette here
# colors = ['#d7191c', '#fdae61', '#ffffbf', '#a6d96a', '#1a9641']

# I am using 'Viridis256' to map colors with value, change it with 'colors' if you need some specific colors
mapper = LinearColorMapper(
    palette=Viridis256, low=df.value.min(), high=df.value.max())

# Define a figure and tools
TOOLS = "box_select,lasso_select,pan,wheel_zoom,box_zoom,reset,help"
p = figure(
    tools=TOOLS,
    plot_width=1200,   # Bokeh 3.x: use width=
    plot_height=1000,  # Bokeh 3.x: use height=
    title="Correlation plot",
    x_range=list(df.AllColumns1.drop_duplicates()),
    y_range=list(df.AllColumns2.drop_duplicates()),
    toolbar_location="right",
    x_axis_location="below")

# Create rectangle for heatmap
p.rect(
    x="AllColumns1",
    y="AllColumns2",
    width=1,
    height=1,
    source=ColumnDataSource(df),
    line_color=None,
    fill_color=transform('value', mapper))

# Add color bar (acts as the legend for the heatmap)
color_bar = ColorBar(
    color_mapper=mapper,
    location=(0, 0),
    ticker=BasicTicker(desired_num_ticks=10))

p.add_layout(color_bar, 'right')

show(p)
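The reshaping step (`stack().rename("value").reset_index()`) is what turns the square correlation matrix into the long form that `p.rect` needs: one row per cell, with the two axis labels and the value as columns. A small sketch on a hypothetical 2x2 matrix, reusing the answer's column names:

```python
import pandas as pd

# Hypothetical 2x2 correlation matrix, for illustration only
corr = pd.DataFrame(
    [[1.0, 0.3],
     [0.3, 1.0]],
    index=["A", "B"],
    columns=["A", "B"],
)
corr.index.name = "AllColumns1"
corr.columns.name = "AllColumns2"

# One row per (row label, column label) pair, with the cell value in "value"
long_form = corr.stack().rename("value").reset_index()
print(long_form)
```

Each of the four cells becomes one row with columns AllColumns1, AllColumns2 and value, which maps directly onto the `x`, `y` and `fill_color` arguments of `p.rect`.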

References:

[1] https://docs.bokeh.org/en/latest/docs/user_guide.html

[2] Bokeh heatmap from Pandas confusion matrix

