
python bokeh, how to make a correlation plot?

How can I make a correlation heatmap in Bokeh?

import pandas as pd
import bokeh.charts

df = pd.util.testing.makeTimeDataFrame(1000)
c = df.corr()

p = bokeh.charts.HeatMap(c) # not right

# try to make it a long form
# (and it's ugly in pandas to use 'index' in melt)

c['x'] = c.index
c = pd.melt(c, 'x', ['A','B','C','D'])

# this shows the right 4x4 matrix, but values are still wrong
p = bokeh.charts.HeatMap(c, x = 'x', y = 'variable', values = 'value') 

By the way, can I put a colorbar on the side instead of a legend inside the plot? And how do I choose the color range/mapping, e.g. dark blue (-1) to white (0) to dark red (+1)?

I think I can provide baseline code that does what you are asking, combining the other answers with some extra pre-processing.

Let's assume you have a dataframe df already loaded (in this case the UCI Adult Data) and the correlation coefficients calculated (p_corr).

import bisect
#
from math import pi
from numpy import arange
from itertools import chain
from collections import OrderedDict
#
from bokeh.palettes import RdBu as colors  # just make sure to import a palette that centers on white (-ish)
from bokeh.models import ColorBar, LinearColorMapper
from bokeh.plotting import figure, show

colors = list(reversed(colors[9]))  # we want an odd number to ensure 0 correlation is a distinct color
labels = df.columns
nlabels = len(labels)

def get_bounds(n):
    """Gets bounds for quads with n features"""
    # one unit square per (row, column) cell, listed row by row
    bottom = list(chain.from_iterable([[ii]*n for ii in range(n)]))
    top = list(chain.from_iterable([[ii+1]*n for ii in range(n)]))
    left = list(chain.from_iterable([list(range(n)) for ii in range(n)]))
    right = list(chain.from_iterable([list(range(1,n+1)) for ii in range(n)]))
    return top, bottom, left, right

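def get_colors(corr_array, colors):
    """Aligns color values from palette with the correlation coefficient values"""
    # left edges of equal-width bins spanning [-1, 1)
    ccorr = arange(-1, 1, 1/(len(colors)/2))
    color = []
    for value in corr_array:
        # bisect_right puts a value of exactly -1 in the first bin;
        # bisect_left would give index 0 and wrap around to colors[-1]
        ind = bisect.bisect_right(ccorr, value)
        # clamp so values at the extremes always map to a valid color
        ind = min(max(ind, 1), len(colors))
        color.append(colors[ind-1])
    return color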

# Note: Bokeh 3.0+ renamed plot_width/plot_height to width/height
p = figure(plot_width=600, plot_height=600,
           x_range=(0,nlabels), y_range=(0,nlabels),
           title="Correlation Coefficient Heatmap (lighter is worse)",
           toolbar_location=None, tools='')

p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.major_label_orientation = pi/4
p.yaxis.major_label_orientation = pi/4

top, bottom, left, right = get_bounds(nlabels)  # creates squares for plot
color_list = get_colors(p_corr.values.flatten(), colors)

p.quad(top=top, bottom=bottom, left=left,
       right=right, line_color='white',
       color=color_list)

# Set ticks with labels
ticks = [tick+0.5 for tick in list(range(nlabels))]
tick_dict = OrderedDict([[tick, labels[ii]] for ii, tick in enumerate(ticks)])
# Create the correct number of ticks for each axis 
p.xaxis.ticker = ticks
p.yaxis.ticker = ticks
# Override the labels 
p.xaxis.major_label_overrides = tick_dict
p.yaxis.major_label_overrides = tick_dict

# Setup color bar
mapper = LinearColorMapper(palette=colors, low=-1, high=1)
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'right')

show(p)

This will result in the following plot if the categories are integer encoded (this is a horrible data example):

[Image: Pearson correlation coefficient heatmap in Bokeh]
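If you precompute colors in Python as above, the edge cases (exactly -1, exactly +1, or anything slightly outside that range) are the easy part to get wrong. Here is a minimal sketch of the binning step on its own, with no Bokeh involved. The function name value_to_color and the five-color PALETTE are my own illustrative choices, not something from the answer or from Bokeh:

```python
# Illustrative diverging palette: dark blue -> near white -> dark red.
# Any list of colors works; an odd length keeps 0 in its own middle bin.
PALETTE = ["#2166ac", "#92c5de", "#f7f7f7", "#f4a582", "#b2182b"]


def value_to_color(value, palette, low=-1.0, high=1.0):
    """Map a value in [low, high] to one of len(palette) equal-width bins."""
    n = len(palette)
    # Clamp first so out-of-range values cannot produce a bad index.
    clamped = max(low, min(high, value))
    # Fraction of the way from low to high, in [0, 1].
    frac = (clamped - low) / (high - low)
    # frac == 1.0 would give index n, so cap at the last bin.
    index = min(int(frac * n), n - 1)
    return palette[index]


if __name__ == "__main__":
    for v in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(v, value_to_color(v, PALETTE))
```

Clamping before indexing is what avoids the negative-index wraparound you can get when a lookup returns 0 and you subtract 1 from it.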

In modern Bokeh you should use the bokeh.plotting interface. You can see an example of a categorical heatmap generated using this interface in the gallery:

http://docs.bokeh.org/en/latest/docs/gallery/categorical.html


Regarding a legend, for a colormap like this you will actually want a discrete ColorBar instead of a Legend. This is a new feature that will be present in the upcoming 0.12.2 release later this week (today's date: 2016-08-28). These new colorbar annotations can be located outside the main plot area.

There is also an example in the GitHub repo:

https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/color_data_map.py

Note that the last example also uses another new feature to do the colormapping in the browser, instead of having to precompute the colors in Python. Basically, all together it looks like:

from bokeh.models import ColorBar, LinearColorMapper
from bokeh.palettes import Viridis3
from bokeh.plotting import figure

# assumes `source` (a ColumnDataSource with 'x', 'y' and 'z' columns)
# and `title` are already defined

# create a color mapper with your palette - can be any list of colors
mapper = LinearColorMapper(palette=Viridis3, low=0, high=100)

p = figure(toolbar_location=None, tools='', title=title)
p.circle(
    x='x', y='y', source=source,

    # use the mapper to colormap according to the 'z' column (in the browser)
    fill_color={'field': 'z', 'transform': mapper},
)

# create a ColorBar and add it to the side of the plot
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'right')

There are more sophisticated options too, e.g. if you want to control the ticking on the colorbar more carefully you could add a custom ticker or tick formatter just like on a normal Axis, to achieve things like:

[Image: colorbar with custom ticks]

It's not clear what your actual requirements are, so I just mention this in case it is useful to know.
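As a rough sketch of that ticker/formatter idea for a correlation plot: the helper names colorbar_ticks and make_color_bar are hypothetical, and the choice of five evenly spaced ticks is just an assumption for illustration. FixedTicker and PrintfTickFormatter are regular bokeh.models classes; they are imported inside the function only so the tick computation can be tried without Bokeh installed.

```python
def colorbar_ticks(low=-1.0, high=1.0, count=5):
    """Return `count` evenly spaced tick positions from low to high inclusive."""
    step = (high - low) / (count - 1)
    # round() trims floating-point noise such as 0.30000000000000004
    return [round(low + i * step, 10) for i in range(count)]


def make_color_bar(mapper, ticks):
    """Build a ColorBar for `mapper` with ticks only at the given positions."""
    # Imported lazily so colorbar_ticks() works without Bokeh available.
    from bokeh.models import ColorBar, FixedTicker, PrintfTickFormatter

    return ColorBar(
        color_mapper=mapper,
        ticker=FixedTicker(ticks=ticks),                # exact tick positions
        formatter=PrintfTickFormatter(format="%.1f"),   # one decimal place
        location=(0, 0),
    )
```

You would then call something like p.add_layout(make_color_bar(mapper, colorbar_ticks()), 'right') with your own figure p and mapper.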


Finally, Bokeh is a large project, and finding the best way to do something often involves asking for more information and context and, in general, having a discussion. That kind of collaborative help seems to be frowned upon at SO (they are "not real answers"), so I'd encourage you to also check out the project Discourse for help anytime.

I tried to create an interactive correlation plot using the Bokeh library. The code combines different solutions available on SO and other websites. bigreddot's answer explains the details. The code for the correlation heatmap is below:

import pandas as pd
from bokeh.io import output_file, show
from bokeh.models import BasicTicker, ColorBar, LinearColorMapper, ColumnDataSource, PrintfTickFormatter
from bokeh.plotting import figure
from bokeh.transform import transform
from bokeh.palettes import Viridis256
# Read your data into a pandas DataFrame
data = pd.read_csv("%%%%%Your Path%%%%%")
# Now create the correlation matrix using pandas
# (pandas 2.0+ needs numeric_only=True if the frame has non-numeric columns)
df = data.corr()

df.index.name = 'AllColumns1'
df.columns.name = 'AllColumns2'

# Prepare data.frame in the right format
df = df.stack().rename("value").reset_index()

# Write the plot to an HTML file
output_file("CorrelationPlot.html")

# You can use your own palette here
# colors = ['#d7191c', '#fdae61', '#ffffbf', '#a6d96a', '#1a9641']

# I am using 'Viridis256' to map colors with value, change it with 'colors' if you need some specific colors
mapper = LinearColorMapper(
    palette=Viridis256, low=df.value.min(), high=df.value.max())

# Define a figure and tools (Bokeh 3.0+ renamed plot_width/plot_height to width/height)
TOOLS = "box_select,lasso_select,pan,wheel_zoom,box_zoom,reset,help"
p = figure(
    tools=TOOLS,
    plot_width=1200,
    plot_height=1000,
    title="Correlation plot",
    x_range=list(df.AllColumns1.drop_duplicates()),
    y_range=list(df.AllColumns2.drop_duplicates()),
    toolbar_location="right",
    x_axis_location="below")

# Create rectangle for heatmap
p.rect(
    x="AllColumns1",
    y="AllColumns2",
    width=1,
    height=1,
    source=ColumnDataSource(df),
    line_color=None,
    fill_color=transform('value', mapper))

# Add a color bar
color_bar = ColorBar(
    color_mapper=mapper,
    location=(0, 0),
    ticker=BasicTicker(desired_num_ticks=10))

p.add_layout(color_bar, 'right')

show(p)
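The mapper above stretches the palette between the smallest and largest value in the data. If you want what the question asked for (dark blue at -1, white at 0, dark red at +1), one option is to pin low/high to -1 and 1 and pass a diverging palette. Below is a minimal sketch under a few assumptions: the function names to_long_form and correlation_heatmap and the five-color palette are mine, and the Bokeh imports are placed inside the function only so the reshaping step can be tried without Bokeh installed.

```python
def to_long_form(names, matrix):
    """Flatten a square matrix into x/y/value columns, one entry per cell.

    matrix[i][j] is taken to be the correlation of names[i] with names[j].
    """
    return {
        "x": [col for _row in names for col in names],
        "y": [row for row in names for _col in names],
        "value": [v for row_values in matrix for v in row_values],
    }


def correlation_heatmap(names, matrix):
    """Return a Bokeh figure with a fixed -1..1 diverging color scale."""
    # Imported lazily so to_long_form() works without Bokeh available.
    from bokeh.models import ColorBar, ColumnDataSource, LinearColorMapper
    from bokeh.plotting import figure
    from bokeh.transform import transform

    # Illustrative palette: dark blue -> near white -> dark red.
    palette = ["#2166ac", "#92c5de", "#f7f7f7", "#f4a582", "#b2182b"]
    # Fixed range, so the same color always means the same correlation.
    mapper = LinearColorMapper(palette=palette, low=-1, high=1)
    source = ColumnDataSource(to_long_form(names, matrix))

    p = figure(x_range=list(names), y_range=list(reversed(names)),
               title="Correlation heatmap", toolbar_location=None, tools="")
    p.rect(x="x", y="y", width=1, height=1, source=source,
           line_color="white", fill_color=transform("value", mapper))
    p.add_layout(ColorBar(color_mapper=mapper, location=(0, 0)), "right")
    return p
```

With a pandas correlation matrix c = data.corr(), you could call show(correlation_heatmap(list(c.columns), c.values.tolist())). An odd number of colors keeps values near 0 in the white bin.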

References:

[1] https://docs.bokeh.org/en/latest/docs/user_guide.html

[2] Bokeh heatmap from Pandas confusion matrix

