Using matplotlib to label points on a scatter plot on mouse over with some label other than x,y coordinates

I am trying to use the DataCursor approach (https://stackoverflow.com/a/4674445/1301710) to label points using matplotlib. I have several thousand points and would like to see each point's label on mouseover. However, my case differs in two ways: first, I'm making a scatter plot, and second, I want to show a name for each point rather than just its x,y coordinates.

Here is my code:

import os
import matplotlib.pyplot as plt

class DataCursor(object):
    text_template = 'x: %0.2f\ny: %0.2f'
    x, y = 0.0, 0.0
    xoffset, yoffset = -20, 20

    def __init__(self, ax, labels, x, y):
        self.ax = ax
        self.xlist = x
        self.ylist = y
        self.labels = labels
        self.annotation = ax.annotate(self.text_template,
                xy=(self.x, self.y), xytext=(self.xoffset, self.yoffset),
                textcoords='offset points', ha='right', va='bottom',
                bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
                arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0')
                )

    def __call__(self, event):
        self.event = event
        xdata, ydata = event.artist.get_data()
        #self.x, self.y = xdata[event.ind], ydata[event.ind]
        self.x, self.y = event.mouseevent.xdata, event.mouseevent.ydata
        self.label = self.labels[self.xlist.index(self.x)]
        if self.x is not None:
            self.annotation.xy = self.x, self.y

def process():
    # code to make ht_dict here
    # ht_dict has the following format: 'ht1': [nov14count, nov21count]
    # where each key is a string and each value is a list of two integers

    print("Start making scatter plot..")
    hts = []
    nov14 = []
    nov21 = []
    for key in ht_dict.keys():
        # assumed loop body: collect each key and its two counts
        hts.append(key)
        nov14.append(ht_dict[key][0])
        nov21.append(ht_dict[key][1])
    fig = plt.figure()
    scatter = plt.scatter(nov14, nov21)

    fig.canvas.mpl_connect('pick_event', DataCursor(plt.gca(), hts, nov14, nov21))


I am getting the following error though:

AttributeError: 'CircleCollection' object has no attribute 'get_data'

I want to be able to see the string stored in the list hts on mouseover, at the x and y coordinates stored at the same index in the nov14 and nov21 lists. I'm not sure what to make of this error and would appreciate any help. I also have a second question, which came up while trying out changes on the existing plot in the DataCursor thread: looking up the label with index(), as I am currently doing, will raise a "value is not in list" error, since the clicked coordinates may not be exactly the same as a value in the list. Do you have any suggestions for a better way to display a label or name for a point?

Any guidance or pointers to documentation where I could read up would be appreciated.


Adapting the label-based annotation approach given in the examples section of the mpldatacursor documentation, you could do something along these lines (each point is plotted with its own scatter call so that it can carry an individual label):

import matplotlib.pyplot as plt
from mpldatacursor import datacursor
import random

fig, ax = plt.subplots()
ax.set_title('Click on a dot to display its label')

# Plot a number of random dots
for i in range(1, 1000):
    ax.scatter([random.random()], [random.random()], label='$ID: {}$'.format(i))

# Use a DataCursor to interactively display the label for a selected line...
datacursor(formatter='{label}'.format)

plt.show()

Unfortunately, this is rather inefficient, i.e. hardly usable with more than, say, 1000 points.
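
If one scatter call per point is too slow, a possible alternative is to plot all points in a single scatter call and look up the label from the index of the point under the cursor. The following is only a minimal sketch using plain matplotlib rather than mpldatacursor. The names format_labels, attach_hover_labels and demo are hypothetical helpers made up for this example, and the random data stands in for your hts, nov14 and nov21 lists. It relies on the collection's contains() method, which reports the indices of the points under the mouse, so no exact coordinate match (and no get_data()) is needed.

```python
def format_labels(indices, labels):
    """Join the labels of all hit point indices into one string."""
    return ", ".join(str(labels[i]) for i in indices)


def attach_hover_labels(ax, scatter, labels):
    """Show the label of the point under the mouse (hypothetical helper)."""
    annotation = ax.annotate(
        "", xy=(0, 0), xytext=(-20, 20), textcoords="offset points",
        ha="right", va="bottom",
        bbox=dict(boxstyle="round,pad=0.5", fc="yellow", alpha=0.5),
        arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0"),
    )
    annotation.set_visible(False)

    def on_move(event):
        if event.inaxes is not ax:
            return
        # contains() returns (hit, details); details["ind"] holds the
        # indices of the scatter points under the cursor.
        hit, details = scatter.contains(event)
        if hit:
            indices = details["ind"]
            annotation.xy = scatter.get_offsets()[indices[0]]
            annotation.set_text(format_labels(indices, labels))
            annotation.set_visible(True)
        elif annotation.get_visible():
            annotation.set_visible(False)
        else:
            return  # nothing changed, so skip the redraw
        ax.figure.canvas.draw_idle()

    ax.figure.canvas.mpl_connect("motion_notify_event", on_move)
    return annotation


def demo():
    # Imported here so that format_labels stays usable without matplotlib.
    import random
    import matplotlib.pyplot as plt

    # Placeholder data; substitute your own labels and coordinates.
    labels = ["ID: {}".format(i) for i in range(5000)]
    xs = [random.random() for _ in labels]
    ys = [random.random() for _ in labels]

    fig, ax = plt.subplots()
    ax.set_title("Hover over a dot to display its label")
    scatter = ax.scatter(xs, ys)  # one collection for all points
    attach_hover_labels(ax, scatter, labels)
    plt.show()
```

Calling demo() opens the plot window. Because all points live in one collection, the label lookup is a plain list index, which should scale much better than creating a thousand separate scatter artists.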

Result example image:


