
How do I make a histogram from a CSV file that contains a single column of numbers in Python?

I have a CSV file (an Excel spreadsheet) containing a single column of roughly a million numbers. I want to make a histogram of this data, with the values on the x-axis and their frequencies on the y-axis. I know matplotlib can plot a histogram, but my main problem is converting the contents of the CSV file from strings to floats, since strings can't be plotted. This is what I have:

import matplotlib.pyplot as plt
import csv

with open('D1.csv', 'rb') as data:
    rows = csv.reader(data, quoting = csv.QUOTE_NONNUMERIC) 
    floats = [[item for number, item in enumerate(row) if item and (1 <= number <= 12)] for row in rows]
plt.hist(floats, bins=50)
plt.title("histogram")
plt.xlabel("value")
plt.ylabel("frequency")
plt.show()
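
For comparison, the csv-module approach from the question can be made to work in Python 3 roughly as follows. This is only a minimal sketch, assuming the file has no header row and that every non-empty cell is a number; `read_floats` is a hypothetical helper name, not something from the original post.

```python
import csv

def read_floats(path):
    """Return every non-empty cell of a CSV file as a float, in file order."""
    values = []
    # Python 3's csv module expects a text-mode file opened with newline=''.
    with open(path, newline='') as f:
        for row in csv.reader(f):
            for cell in row:
                cell = cell.strip()
                if cell:  # skip empty cells and blank lines
                    values.append(float(cell))
    return values
```

The resulting flat list can be passed straight to `plt.hist(read_floats('D1.csv'), bins=50)`, which draws a single histogram because the input is a flat list rather than a list of rows.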

You can do it in one line with pandas:

import pandas as pd

pd.read_csv('D1.csv', quoting=2)['column_you_want'].hist(bins=50)
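
If the file has no header row, which a bare column of numbers suggests, there is no column name to select. A minimal sketch under that assumption follows; the column label `'value'` and the helper name `load_column` are made up for illustration.

```python
import pandas as pd

def load_column(path):
    """Read a headerless single-column CSV and return it as a float Series."""
    # header=None tells pandas the first line is data, not column names;
    # names=['value'] gives the only column an illustrative label.
    frame = pd.read_csv(path, header=None, names=['value'])
    return frame['value'].astype(float)
```

With this helper, `load_column('D1.csv').hist(bins=50)` produces the same kind of plot as the one-liner above.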

Okay, I finally got something working, with axis labels, a title, and so on.

import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('D1.csv', quoting=2)
data.hist(bins=50)
plt.xlim([0,115000])
plt.title("Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

My first problem was that matplotlib is needed to actually show the graph. Also, I needed to assign the result of

pd.read_csv('D1.csv', quoting=2)

to a variable, data, so that I could plot its histogram with

data.hist

Thank you all for the help.

pandas' read_csv is very powerful, but if your CSV file is simple (no headers, NaNs, or comments) you do not need pandas; NumPy is enough:

import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('D1.csv')
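# density=True normalizes the histogram; the older 'normed' argument
# is no longer accepted by current Matplotlib releases.
plt.hist(data, density=True, bins='auto')
plt.show()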

(In fact, loadtxt can deal with some headers and comments, but read_csv is more versatile.)
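
As a rough illustration of that remark, `np.loadtxt` accepts `skiprows` and `comments` arguments. The sketch below assumes a file with one header line and optional `#` comment lines; `load_values` and `bin_counts` are hypothetical helper names.

```python
import numpy as np

def load_values(path, header_lines=1):
    """Load a single column of numbers, skipping header and '#' comment lines."""
    # skiprows drops the first header_lines lines of the file; lines that
    # start with the comments character are ignored wherever they appear.
    return np.loadtxt(path, skiprows=header_lines, comments='#')

def bin_counts(values, bins=50):
    """Return (counts, bin_edges) computed by np.histogram, without plotting."""
    return np.histogram(values, bins=bins)
```

`bin_counts` is handy for checking the frequencies numerically before passing the same array to `plt.hist`.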

