
Python scatter plot - how to see number of entries per point

I have data points that repeat many times, producing many overlapping (x, y) points, and I want to know the number of entries for each point on the graph. Is there an easy way of doing this (besides the obvious one of writing a piece of code that does it)?

First of all get your points into a list of tuples:

L = [(1,2), (0,0), (0,0), (1,2), (1,2), (3,4)]

I'm assuming you're reading your data in from a file or something, and not hard-coding it like I did above, so modify your import routine to give you a list of tuples, or post-process your imported data to form your list of tuples.

Why am I going on about tuples? Because they are hashable, and therefore can be used to make a set:

S = set(L)
print(S)
# {(1, 2), (0, 0), (3, 4)}  (set order may vary)

Now we have all the unique points in the data. But how many times is each one repeated? We could count them by hand, or be lazy and let the list count itself with the list's count method:

F = {}
for i in list(S):
    F[i] = L.count(i)
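The loop above rescans the whole list once per unique point. As a minimal alternative sketch (not part of the original answer), the standard library's collections.Counter builds the same frequency table in a single pass; the name F is reused here only so that the rest of the walkthrough still applies:

```python
from collections import Counter

L = [(1, 2), (0, 0), (0, 0), (1, 2), (1, 2), (3, 4)]

# Counter tallies every hashable item in one pass over the list
F = Counter(L)

print(F[(1, 2)])  # 3
print(F[(0, 0)])  # 2
print(F[(3, 4)])  # 1
```

A Counter is a dict subclass, so F.keys() and F[point] behave as with the hand-built dictionary.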

Now F is a dictionary holding a frequency table for our (X, Y) points: F.keys() gives all of the unique locations, and F[point] gives how many times that point occurred. Now all we need to do is plot it:

from matplotlib.pyplot import figure, show
fig = figure()
sub = fig.add_subplot(111)

Because we are using lists of tuples, we'll need some list comprehensions to get the data back into a format that plot accepts:

K = F.keys()
Xs = [i[0] for i in K]
Ys = [i[1] for i in K]

Now it will plot nicely:

sub.plot(Xs, Ys, 'bo')

and the plot can be annotated with our frequencies like so:

for i in K:
    sub.annotate(F[i], xy=i)

Show the plot:

show()

And you will get something like this: [result figure]
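For convenience, the steps above can be gathered into one script. This is only a sketch: it swaps in collections.Counter for the set-and-count loop, and the label offset (xytext with textcoords='offset points') and the margins call are additions of mine to keep the numbers from sitting on top of the markers.

```python
from collections import Counter

import matplotlib.pyplot as plt

L = [(1, 2), (0, 0), (0, 0), (1, 2), (1, 2), (3, 4)]
F = Counter(L)  # frequency table: point -> number of entries

# iterating a Counter yields its keys, i.e. the unique (x, y) points
Xs = [x for x, y in F]
Ys = [y for x, y in F]

fig, ax = plt.subplots()
ax.plot(Xs, Ys, 'bo')

for point, n in F.items():
    # xytext is measured in points relative to xy, so each label
    # lands just above and to the right of its marker
    ax.annotate(str(n), xy=point, xytext=(5, 5), textcoords='offset points')

ax.margins(0.1)  # padding so labels near the edges are not clipped
plt.show()
```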

I'd recommend a Bubble Chart if the points aren't too close together. The number of overlapping points would be represented by the size of the bubble.

You can do this in a spreadsheet (using Excel) or in JavaScript (using Google Charts).
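If you would rather stay in Python, a bubble chart can also be sketched with matplotlib's scatter, whose s argument sets the marker area in points squared. The sample data is borrowed from the first answer, and BASE_AREA is an arbitrary scale factor picked for illustration:

```python
from collections import Counter

import matplotlib.pyplot as plt

L = [(1, 2), (0, 0), (0, 0), (1, 2), (1, 2), (3, 4)]
F = Counter(L)  # point -> number of overlapping entries

Xs = [x for x, y in F]
Ys = [y for x, y in F]

# marker area grows linearly with the count; BASE_AREA is an
# arbitrary value chosen for this example
BASE_AREA = 80
sizes = [BASE_AREA * n for n in F.values()]

fig, ax = plt.subplots()
ax.scatter(Xs, Ys, s=sizes, alpha=0.5, edgecolor='k')
ax.margins(0.2)  # leave room for the largest bubbles
plt.show()
```

Scaling the area rather than the radius keeps a point with twice the entries looking twice as big.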

You could use multiple axes.
Check these Gallery examples for inspiration.

From the Matplotlib 1.4.2 documentation (Matplotlib Examples), source code:

  1. plot_bmh.py
  2. two_scales.py
  3. multiple_yaxis_with_spines.py

Examples: [figures: plot_bmh.png, two_scales.png, multiple_yaxis_with_spines.png]

You can also use colours to represent the number of occurrences of each scatter point. Note that my code builds on @Mark's answer.

import numpy as np
import matplotlib as ml
import matplotlib.pyplot as plt

# generate some points
points = np.random.rand(2,200).round(decimals=1)*10

# count occurrences
L = [tuple(ii) for ii in points.T]
S = set(L)
F = {}
for i in list(S):
    F[i] = L.count(i)

# make counts-array of same length as points and attribute colours
C = np.array([F[(x, y)] for x,y in zip(points[0], points[1])])
norm = ml.colors.Normalize(0, vmax=C.max())
colours = [ml.cm.ScalarMappable(norm=norm, cmap='Reds').to_rgba(c) for c in C]

# plot figure
fig, ax = plt.subplots(1,1)
p = ax.scatter(points[0], points[1], c=colours, edgecolor='k')
plt.show()

This is what the result may look like.
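A possible simplification, offered as a sketch rather than as the answer's own method: scatter can map a numeric array to colours itself when given c together with cmap, so the manual Normalize and ScalarMappable step may not be needed. The colorbar and its label text are my additions, and Counter replaces the set-and-count loop.

```python
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt

# generate some points on a coarse grid so that many coincide
points = np.random.rand(2, 200).round(decimals=1) * 10

# count occurrences of each (x, y) pair in one pass
L = [tuple(p) for p in points.T]
F = Counter(L)

# one count per plotted point, in the same order as the points
C = np.array([F[p] for p in L])

fig, ax = plt.subplots()
# scatter maps the values in c through the colormap; vmin=0 mirrors
# the Normalize(0, ...) range used in the code above
sc = ax.scatter(points[0], points[1], c=C, cmap='Reds',
                vmin=0, vmax=C.max(), edgecolor='k')
fig.colorbar(sc, ax=ax, label='number of entries')
plt.show()
```

The colorbar gives the reader a scale for translating a shade back into a count.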
