Exporting Histogram from Python to Excel

I'm pretty new to coding and I need some help with exporting data, or just printing it in the Python shell. The code is:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import openpyxl

data = pd.read_excel('/Users/user/Desktop/Data/Book1.xlsx')
df = data.hist(bins=40)
plt.xlim([0,1000])
plt.title('Data')
plt.xlabel('Neuron')
plt.ylabel('# of Spikes')
plt.show()

So the code sorts the data into 40 bins covering a range of roughly 0 to 1558.5 and plots them as a histogram. What I'm trying to do is export the data AFTER binning, because when I try writing:

writer = pd.ExcelWriter('/Users/user/Desktop/Data/output.xlsx')
df1.to_excel(writer,'Sheet2')
writer.save()

it saves the original data, not the binned data behind the histogram. I'd also like help changing the bins to fixed intervals of 5 (0 to 5, 5 to 10, etc.) that continue all the way to the end of the data, so the last bin holds whatever values are left at the top of the range. Any help is appreciated, and it doesn't have to be pandas specifically. Thank you. By the way, I think what I made was a DataFrame from the imported data, but I'm a beginner, so I'm not sure.

The line df = data.hist(bins=40) does not create a DataFrame of binned data. It only draws the plot, and df ends up holding a NumPy ndarray of matplotlib Axes objects, one per plotted column. Older matplotlib versions report this type as matplotlib.axes._subplots.AxesSubplot.
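You can check this by inspecting what DataFrame.hist returns. The sketch below uses a made-up one-column DataFrame in place of Book1.xlsx. It selects the non-interactive Agg backend so it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without a display
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Made-up stand-in for Book1.xlsx: a single numeric column
rng = np.random.default_rng(0)
data = pd.DataFrame({"spikes": rng.uniform(0, 1558.5, size=500)})

result = data.hist(bins=40)

# The return value is a grid (NumPy array) of plot objects, not counts
print(type(result))
print(result.shape)
print(isinstance(result.flat[0], Axes))
```

The actual bin counts are never part of this return value, so result has nothing useful to pass to to_excel.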

One way to save the binned data is to create the histogram with matplotlib's hist(). Add the following lines directly after your read_excel line:

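import matplotlib.pyplot as plt
# assumes Book1.xlsx holds a single column of values; plt.hist draws the
# histogram and also returns the counts, the bin edges and the bar patches
counts, bins, bars = plt.hist(data.values, bins=40)
# one row per bin: the bin's left edge and the number of values in it
df = pd.DataFrame({'bin_leftedge': bins[:-1], 'count': counts})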

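Then, as pointed out in a comment, be sure to change df1.to_excel(writer,'Sheet2') to df.to_excel(writer,'Sheet2'). Also note that ExcelWriter.save() was removed in pandas 2.0. On current versions, call writer.close() instead, or open the writer in a with block so the file is saved automatically.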
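If you don't need the plot, you can get the same numbers from np.histogram, which computes the counts and edges without drawing anything. This swaps in NumPy for plt.hist. The sketch below makes a few assumptions: the sheet has a single column, binned_table and export_table are hypothetical helper names, and made-up random values stand in for Book1.xlsx. np.histogram flattens its input, so with several columns it would pool them into one histogram.

```python
import numpy as np
import pandas as pd


def binned_table(data, bins=40):
    """Hypothetical helper: one row per bin with its left edge and count.

    np.histogram flattens its input, so the 2-D data.values array of a
    one-column DataFrame can be passed directly.
    """
    counts, edges = np.histogram(data.values, bins=bins)
    return pd.DataFrame({"bin_leftedge": edges[:-1], "count": counts})


def export_table(table, path, sheet_name="Sheet2"):
    """Hypothetical helper: write the table to .xlsx (needs openpyxl)."""
    # The with block saves and closes the file when it exits
    with pd.ExcelWriter(path) as writer:
        table.to_excel(writer, sheet_name=sheet_name, index=False)


# Made-up values standing in for the single column of Book1.xlsx
rng = np.random.default_rng(0)
data = pd.DataFrame({"spikes": rng.uniform(0, 1558.5, size=500)})

table = binned_table(data, bins=40)
print(table.head())

# To write the file, uncomment and adjust the path:
# export_table(table, "/Users/user/Desktop/Data/output.xlsx")
```

Passing index=False keeps pandas from writing its row numbers as an extra first column in the sheet.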

bins contains the edges of each bin, so the bins array will have one more element than the counts array. Keep in mind that the above code associates each count with the left edge of that count's bin, and does not save the rightmost bin edge.
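To keep the rightmost edge as well, you can store both edges of every bin. The sketch below uses np.histogram on a few made-up values spanning the question's 0 to 1558.5 range. It also shows that bins=40 means 41 evenly spaced edges from the data's minimum to its maximum. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import numpy as np
import pandas as pd

# Made-up values spanning 0 to 1558.5, like the range in the question
values = np.array([0.0, 12.0, 250.5, 800.0, 1558.5])

counts, edges = np.histogram(values, bins=40)

# 40 bins are delimited by 41 evenly spaced edges from min to max
assert len(edges) == len(counts) + 1
assert np.allclose(edges, np.linspace(values.min(), values.max(), 41))

# Store both edges of every bin so the rightmost edge is not lost
table = pd.DataFrame({
    "bin_leftedge": edges[:-1],
    "bin_rightedge": edges[1:],
    "count": counts,
})
print(table[table["count"] > 0])
```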

There may be a better or pandas-idiomatic way to do this, but hopefully the above meets your needs.
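One pandas-based alternative is pd.cut, which labels each value with its interval, followed by value_counts. The sketch below assumes the fixed width-5 bins asked about in the question and uses a small made-up DataFrame. It sets right=False so every interval is [left, right). Because of that, the last edge is chosen strictly above the maximum, so the largest value is not dropped.

```python
import numpy as np
import pandas as pd

# Made-up single-column DataFrame standing in for Book1.xlsx
data = pd.DataFrame({"spikes": [0.0, 3.0, 7.0, 12.5, 14.0]})
width = 5

# Edges 0, 5, 10, ... with the last edge strictly above the maximum
n_bins = int(np.floor(data["spikes"].max() / width)) + 1
edges = np.arange(n_bins + 1) * width

# right=False: each interval includes its left edge, excludes its right edge
binned = pd.cut(data["spikes"], bins=edges, right=False)
# sort=False keeps the bins in interval order and lists empty bins as 0
counts = binned.value_counts(sort=False)

table = pd.DataFrame({"bin_leftedge": edges[:-1], "count": counts.to_numpy()})
print(table)
```

The table stores plain left edges rather than pandas Interval objects, which keeps it easy to write to Excel.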


EDIT: integer bin widths

You can pass a list of bin edges as bins= to either data.hist() or plt.hist(). To create bins of width 5 that start at 0 and include the maximum value of the data, this should work:

# range() needs integers, so the (possibly float) maximum is rounded up first
counts, bins, patches = plt.hist(data.values, bins=range(0, int(np.ceil(data.values.max())) + 5, 5))

Explanation: Python's built-in range(start, stop, step) accepts only integers. It returns a sequence (in Python 3, a range object rather than a list) that includes the left endpoint (start) but excludes the right endpoint (stop). In math notation, range(start, stop, step) yields evenly spaced integers on the half-open interval [start, stop). The +5 in the line above ensures that the last bin's right edge lands at or to the right of the maximum data value.
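The sketch below checks the range behaviour and the +5 trick. width_edges is a hypothetical helper name. np.histogram stands in for plt.hist here to avoid drawing. Both follow the same rule: every bin is half-open except the last, which also includes its right edge. Because of that rule, a maximum that falls exactly on the last edge is still counted.

```python
import math

import numpy as np


def width_edges(values, width=5):
    """Hypothetical helper: integer edges 0, width, 2*width, ... covering max(values)."""
    # range() accepts only integers, so round a float maximum up first
    top = math.ceil(float(np.max(values)))
    # range() excludes its stop value, hence the extra + width
    return list(range(0, top + width, width))


print(list(range(0, 10, 5)))  # [0, 5]: the stop value 10 is left out

edges = width_edges([1558.5])
print(edges[:3], edges[-2:])  # [0, 5, 10] [1555, 1560]

# The maximum 15.0 sits exactly on the last edge and is still counted,
# because the last bin includes its right edge
sample = [0.0, 3.0, 7.0, 12.5, 15.0]
counts, _ = np.histogram(sample, bins=width_edges(sample))
print(counts)  # [2 1 2]
```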
