
Cumulative histogram for 2D data in Python

My data consists of a 2-D array of masses and distances. I want to produce a plot where the x-axis is distance and the y-axis is the number of data elements with distance <= x (i.e., a cumulative histogram). What is the most efficient way to do this in Python?

PS: The masses are irrelevant since I have already filtered by mass, so all I am trying to produce is a plot of the distance data.

Example plot below:

[Image: example plot]

You can combine numpy.cumsum() and plt.step():

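import matplotlib.pyplot as plt
import numpy as np

N = 15
# Example data: N increasing distances and a count for each distance
distances = np.random.uniform(1, 4, N).cumsum()
counts = np.random.uniform(0.5, 3, N)
# where='post' holds each cumulative value until the next distance,
# so the height at x is the total for distance <= x
plt.step(distances, counts.cumsum(), where='post')
plt.show()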
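If you also need the count at an arbitrary distance x (the number of elements with distance <= x) without reading it off the plot, np.searchsorted on the sorted distances with side='right' gives it directly. A minimal sketch; the distances and the helper name count_le are made up for illustration:

```python
import numpy as np

# Hypothetical example distances (unsorted, with a duplicate)
distances = np.array([3.0, 1.0, 2.0, 2.0, 5.0])
sorted_d = np.sort(distances)

def count_le(x):
    """Hypothetical helper: number of elements with distance <= x."""
    # side='right' inserts x after any equal values, so ties are counted
    return np.searchsorted(sorted_d, x, side='right')

print(count_le(2.0))  # 3
print(count_le(0.5))  # 0
print(count_le(10))   # 5
```

count_le also accepts an array of x values, so it can evaluate the whole curve on a grid in one call.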

[Image: example step plot]

Alternatively, plt.bar can draw the cumulative histogram as bars, with each bar's width set to the difference between successive distances. An extra distance has to be appended so that the last bar also gets a width.

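# Widths are the gaps between successive distances; an extra distance
# (last distance + 1) is appended so the last bar also has a width
plt.bar(distances, counts.cumsum(), width=np.diff(distances, append=distances[-1]+1), align='edge')
plt.autoscale(enable=True, axis='x', tight=True)  # make x-axis tight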
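If binned counts are good enough, matplotlib's plt.hist has a cumulative=True option, and np.histogram followed by cumsum() gives the same numbers without plotting. A minimal sketch; the random distances and the 20-bin edges are arbitrary choices for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical distances; replace with your own 1-D array
rng = np.random.default_rng(0)
distances = rng.uniform(0, 10, 200)

# Bin edges are an arbitrary choice here
edges = np.linspace(0, 10, 21)

# Option 1: let matplotlib accumulate the counts and draw an outline
n, bins, patches = plt.hist(distances, bins=edges, cumulative=True, histtype='step')

# Option 2: the same cumulative counts computed with NumPy only
counts, _ = np.histogram(distances, bins=edges)
cum_counts = counts.cumsum()
```

Note that with bins the curve only changes at bin edges, whereas the step and bar approaches above change at every distance.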

[Image: bar chart]

Depending on the exact interpretation of the data, a value (e.g., a zero) could be prepended instead of appended. Each bar then extends back to the previous distance, or to 0 for the first bar:

# A negative width with align='edge' draws each bar to the left of its x position
plt.bar(distances, counts.cumsum(), width=-np.diff(distances, prepend=0), align='edge')
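To see what the two width expressions produce, here they are evaluated on three made-up distances:

```python
import numpy as np

distances = np.array([2.0, 5.0, 6.0])  # hypothetical example values

# Appending: each bar spans from its own distance to the next one;
# the last bar gets an arbitrary width of 1
append_widths = np.diff(distances, append=distances[-1] + 1)
print(append_widths)   # [3. 1. 1.]

# Prepending 0: each bar spans back to the previous distance (or to 0);
# the minus sign plus align='edge' makes plt.bar draw it to the left
prepend_widths = -np.diff(distances, prepend=0)
print(prepend_widths)  # [-2. -3. -1.]
```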

This is what I came up with for a 1-D array of distances:

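import matplotlib.pyplot as plt
import numpy as np

data = np.random.uniform(0, 10, 100)  # placeholder: your 1-D array of distances
plt.figure()
counts = np.ones(len(data))  # each element adds 1 to the running count
plt.step(np.sort(data), counts.cumsum(), where='post')
plt.show()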

This also works with duplicate values: after sorting, each occurrence adds 1 to the running count, so the curve jumps by the number of duplicates at that x.
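Starting from the 2-D mass/distance array in the question, np.unique with return_counts=True collapses duplicates into a single jump per distance, which gives one point per distinct value. A minimal sketch; the array layout (column 0 = mass, column 1 = distance) and the helper name cumulative_counts are assumptions for illustration:

```python
import numpy as np

def cumulative_counts(distances):
    """Hypothetical helper: unique sorted distances and how many elements are <= each."""
    values, counts = np.unique(distances, return_counts=True)
    return values, counts.cumsum()

# Hypothetical 2-D array: column 0 = mass, column 1 = distance (assumed layout)
masses_and_distances = np.array([
    [1.2, 3.0],
    [0.8, 1.0],
    [2.5, 2.0],
    [1.1, 2.0],
    [0.9, 5.0],
])
x, y = cumulative_counts(masses_and_distances[:, 1])
print(x)  # [1. 2. 3. 5.]
print(y)  # [1 3 4 5]
```

The result can be plotted with plt.step(x, y, where='post').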

