
How to update a pyplot histogram

I have a dataset of 100,000,000 samples and I want to make a histogram of it with pyplot. But reading this large file drains my memory so badly that the machine becomes unresponsive (the cursor stops moving, ...), so I'm looking for ways to 'help' pyplot.hist. I was thinking that breaking the file up into several smaller files might help, but I wouldn't know how to combine the results afterwards.

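You can combine the output of pyplot.hist, or, as @titusjan suggested, numpy.histogram, as long as you keep your bins fixed each time you call it. With identical bin edges, every sample lands in the same bin no matter which chunk it belongs to, so the counts from the separate calls simply add up. For example: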

import matplotlib.pyplot as plt
import numpy as np

# Generate some fake data
data = np.random.rand(1000)

# The fixed bin edges (change depending on your data); every
# sub-histogram must use exactly these edges
bins = np.linspace(0, 1, 11)

sub_hist = []
# Split into 10 sub-histograms of 100 samples each
for i in range(0, 1000, 100):
    sub_hist_temp, bins_out = np.histogram(data[i:i+100], bins=bins)
    sub_hist.append(sub_hist_temp)

# Sum the histograms
hist_sum = np.array(sub_hist).sum(axis=0)

# Plot the new summed data, using plt.bar
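fig = plt.figure()
ax1 = fig.add_subplot(211)
# align='edge' puts the left side of each bar on its left bin edge;
# np.diff(bins) gives each bar the width of its bin
ax1.bar(bins[:-1], hist_sum, width=np.diff(bins), align='edge')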

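# Plot the histogram of all data to check
ax2 = fig.add_subplot(212)
hist_all, bins_out, patches = ax2.hist(data, bins=bins)

# The summed sub-histogram counts match the counts from the full dataset
assert (hist_sum == hist_all).all()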

fig.savefig('histsplit.png')

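The same idea applies directly to the original problem of a file too large to load at once. Below is a minimal sketch, assuming the samples are stored as a flat binary file of float64 values (for example written with ndarray.tofile); the helper names open_raw, iter_chunks, data_range and chunked_histogram are hypothetical. np.memmap lets NumPy read slices of the file on demand, so only one chunk is processed at a time. If the bin range isn't known in advance, a first pass finds the global minimum and maximum, and a second pass accumulates np.histogram counts over the resulting fixed edges.

```python
import numpy as np


def open_raw(path, dtype=np.float64):
    """Map a flat binary file of `dtype` values without loading it into RAM."""
    # Assumption: the samples were written as raw binary, e.g. with
    # ndarray.tofile(path). A text/CSV file needs a different reader.
    return np.memmap(path, dtype=dtype, mode="r")


def iter_chunks(values, chunk_size):
    """Yield consecutive slices of at most `chunk_size` samples."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def data_range(values, chunk_size=10_000_000):
    """First pass: global min and max, computed one chunk at a time."""
    lo, hi = np.inf, -np.inf
    for chunk in iter_chunks(values, chunk_size):
        lo = min(lo, float(chunk.min()))
        hi = max(hi, float(chunk.max()))
    return lo, hi


def chunked_histogram(values, bins, chunk_size=10_000_000):
    """Second pass: sum per-chunk np.histogram counts over fixed bin edges."""
    bins = np.asarray(bins)
    counts = np.zeros(len(bins) - 1, dtype=np.int64)
    for chunk in iter_chunks(values, chunk_size):
        # The same edges are used for every chunk, so the counts are additive
        chunk_counts, _ = np.histogram(chunk, bins=bins)
        counts += chunk_counts
    return counts


# Example usage (the file name "samples.bin" is hypothetical):
# values = open_raw("samples.bin")
# lo, hi = data_range(values)
# edges = np.linspace(lo, hi, 101)  # 100 equal-width bins, fixed for all chunks
# counts = chunked_histogram(values, edges)
```

The summed counts can then be plotted without touching the raw data again, either with the bar approach above or, in Matplotlib 3.4 or later, with ax.stairs(counts, edges). If the data is in a text file instead, pandas.read_csv(..., chunksize=...) yields DataFrame chunks whose column can be fed into the same accumulation loop.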
