
Heatmap that shows both high and low density regions clearly (python)

I have a data set of (x, y) positions that I would like to represent as a heatmap. A couple of areas have a much higher density than the rest of the region. As a result, these high-density regions completely wash out the detail in the lower-density regions.

I think using a Gaussian KDE provides the best representation (and looks the nicest) compared to say 2d histograms or contour plots, so would prefer solutions using this method.

I can't post images because this account has less than 10 rep, but here are some examples of what I've tried.

My code snippets are based on already posted snippets that I link below rather than repost (some are rather lengthy), but I'll edit to include them if asked.

The first few are based on Ivo Bosticky's code in this question: Efficient method of calculating density of irregularly spaced points. The images there are the 'style' that I'm after. As shown in the album linked above, with a small gridsize the low-density regions are hard to make out and show no real detail. Larger gridsizes show some splotchier detail, but there is no smooth transition from the high-density to the low-density regions. Putting the values on a log scale washes out the whole thing at lower resolutions; at higher resolutions it shows detail but doesn't blend the grid cells properly.

The second couple in that album are based on the scipy.stats.gaussian_kde example. Changing the gridsize seems to have essentially no effect, and the logscale washes it all out again.

So the TLDR: How do I make a 2D Gaussian KDE that shows the detail smoothly in both high and low density regions?

The most naive way to represent scattered data is using scatter plots. Of course, the problem is that once a certain point density is reached, a scatter plot provides no further information. In that case, we use histograms or heatmaps based on some KDE. These methods however invariably remove detail in the less dense areas of our dataset.

My suggestion for showing both would therefore be to make a scatter plot colored by your KDE values. For example:

pyplot.scatter(your_x,your_y,c=your_kde_value,marker='.',linewidth=0)

Here, your_kde_value is an array containing the value of the KDE function at the points of your scatter plot (i.e. it should have the same shape as your_x and your_y).

Results might look like this (using a sample of 10000 points from a bivariate normal distribution):

[image]

As you can see, the color information provides all the detail in the center, whereas we still retain the outlying points.
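For completeness, here is a minimal sketch of how your_kde_value could be computed. It assumes scipy.stats.gaussian_kde (the estimator mentioned in the question) and uses made-up sample data, since the original data set isn't available. Sorting the points by density before plotting is an optional extra, not something the suggestion above requires.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Illustrative data (an assumption, not the asker's data set):
# 10000 points from a bivariate normal distribution.
rng = np.random.default_rng(0)
your_x, your_y = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]], size=10000
).T

# gaussian_kde expects an array of shape (n_dims, n_points).
points = np.vstack([your_x, your_y])
kde = gaussian_kde(points)

# Evaluate the KDE at the sample points themselves, so the result
# has the same shape as your_x and your_y.
your_kde_value = kde(points)

# Optional: draw the densest points last so sparse points don't cover them.
order = np.argsort(your_kde_value)

fig, ax = plt.subplots()
sc = ax.scatter(your_x[order], your_y[order], c=your_kde_value[order],
                marker='.', linewidth=0)
fig.colorbar(sc, ax=ax, label='KDE value')
# Call plt.show() or fig.savefig(...) to view the figure.
```

Evaluating the KDE at every sample point scales with the square of the number of points, so for much larger data sets you may want to evaluate on a subsample instead.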

Here's an example that illustrates my suggestion (filled contours with thin contour lines drawn on top). It is based on a matplotlib example:

import matplotlib.pyplot as plt
import numpy as np

# make these smaller to increase the resolution
dx, dy = 0.01, 0.01

# generate 2 2d grids for the x & y bounds
y, x = np.mgrid[slice(1, 5 + dy, dy),
                slice(1, 5 + dx, dx)]

z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

plt.contourf(x, y, z, 20, cmap='rainbow')  # filled contours; change the number of levels (20) here
plt.contour(x, y, z, 5, colors='k', linewidths=0.25)  # thin black contour lines on top; change the levels (5) here

plt.show()
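The example above plots an analytic function rather than a density. As a rough sketch, the same contourf/contour overlay could be applied to a KDE evaluated on a grid. This assumes scipy.stats.gaussian_kde and uses invented data (one tight cluster on a sparse background) as a stand-in for the asker's data set. The level counts are copied from the example above and will likely need tuning before the low-density regions show up well.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Illustrative data (an assumption): a tight cluster plus a sparse background.
rng = np.random.default_rng(1)
dense = rng.normal(loc=[3.0, 3.0], scale=0.15, size=(2000, 2))
sparse = rng.uniform(low=1.0, high=5.0, size=(500, 2))
data = np.vstack([dense, sparse]).T  # shape (2, n_points)

kde = gaussian_kde(data)

# Evaluate the KDE on a regular 100 x 100 grid over the same bounds
# (1 to 5 on both axes) as the example above.
y, x = np.mgrid[1:5:100j, 1:5:100j]
z = kde(np.vstack([x.ravel(), y.ravel()])).reshape(x.shape)

fig, ax = plt.subplots()
filled = ax.contourf(x, y, z, 20, cmap='rainbow')    # filled contours
ax.contour(x, y, z, 5, colors='k', linewidths=0.25)  # thin lines on top
fig.colorbar(filled, ax=ax, label='KDE value')
# Call plt.show() or fig.savefig(...) to view the figure.
```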
