
Garbled .csv time-series data in Enthought Python and MatPlotLib

EDIT:

Thank you for your prompt reply, Jonathan.

As you suggest below, I tried using numpy.loadtxt . Unfortunately, a similar error appears. The output of data = numpy.loadtxt("MyData.csv", skiprows = 39, delimiter = ",") is

Traceback (most recent call last):
  File "/Users/aleksnavratil/Desktop/sandbox.py", line 23, in <module>
    data = numpy.loadtxt("MyData.csv", skiprows = 39, delimiter = ",")
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/numpy/lib/npyio.py", line 805, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: could not convert string to float: ﾿ÒᆳóネÀëÐٟᄀB.AØME84ハモ

The same error is thrown for arbitrary skiprows kwargs. Perhaps this lends credence to the character-encoding hypothesis. I'm still at a loss for a solution.
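A quick way to narrow this down is to read the file in binary mode and inspect the raw bytes, bypassing all text decoding. The file written below is a synthetic stand-in with a NUL-padded header (the real instrument file's contents are not reproduced here):

```python
# Synthetic stand-in for the instrument file: a NUL-padded binary
# header followed by ASCII numeric rows (hypothetical contents).
with open("MyData.csv", "wb") as f:
    f.write(b"\x00\x01HEADER\x00\n")
    f.write(b"0.0,1.23\n0.1,1.31\n")

# Read the first bytes without any decoding to see what's really there.
with open("MyData.csv", "rb") as f:
    raw = f.read(64)

print(repr(raw))
print("NUL bytes present:", b"\x00" in raw)
print("UTF-16 BOM at start:", raw[:2] in (b"\xff\xfe", b"\xfe\xff"))
```

If the dump shows a \xff\xfe or \xfe\xff prefix, the file is UTF-16; embedded NUL bytes without a BOM point at a binary header instead.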

/EDIT

I have a .csv data file produced by a scientific instrument (a CETR Universal Media Tester UMT-2). The data represent a time series of measurements. The file behaves strangely when I access it from Python, but is well behaved when accessed via cat, Nano, TextEdit, etc. This phenomenon persists across Windows 7 and Snow Leopard machines, though both are using the Enthought Scientific Python distribution.

The output of

f = codecs.open("MyData.csv",encoding="ascii")
data = f.xreadlines()
for line in data:
    print line

is

****
?****************************************
?****************************************

ÿÿÿZ
Ðí0


þÿÿî
üÿÿð
éí0
óÿÿí
ôí0

etc.

This smells like an encoding problem, so I investigated a bit:

The output of file -i "MyData.csv" is

MyData.csv: text/plain; charset=us-ascii

Using the chardet module, the output of chardetect.py "MyData.csv" is

MyData.csv: ascii with confidence 1.0

Using the codecs package, I tried several common encodings to no avail. I also tried Matplotlib's csv2rec . The output of

r = mlab.csv2rec(codecs.open("MyData.csv", 'rU',),skiprows=39, delimiter=",")

is

Traceback (most recent call last):
  File "/Volumes/AVN2109/Raw Data/CETR_Plotter.py", line 40, in <module>
    r = mlab.csv2rec(codecs.open("MyData.csv", 'rU',),skiprows=39, delimiter=",")
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/mlab.py", line 2181, in csv2rec
    process_skiprows(reader)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/mlab.py", line 2176, in process_skiprows
    for i, row in enumerate(reader):
Error: line contains NULL byte

This is also true for arbitrary skiprows kwargs.
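Since Python's csv module refuses any line containing a NUL byte, one hedged workaround (assuming the numeric rows are plain ASCII once the NULs are stripped) is to filter the raw bytes before parsing. This sketch uses an in-memory stand-in for the file's contents:

```python
import csv
import io

# Hypothetical stand-in for the raw file contents: a NUL-padded
# header line followed by ASCII numeric rows.
raw = b"HEADER\x00\x00\n0.0,1.23\n0.1,1.31\n"

# Drop the NUL bytes, then parse the cleaned text normally.
cleaned = raw.replace(b"\x00", b"").decode("ascii")
rows = list(csv.reader(io.StringIO(cleaned)))
print(rows)
```

For the real file, raw would come from open("MyData.csv", "rb").read(); whether the surviving text is meaningful depends on what the NUL bytes were doing there in the first place.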

Furthermore, the instrument has the option to produce .txt files (as well as .csv files) as its output. The behavior is identical in both cases. Perhaps I'm missing something obvious. Does anyone know how to persuade this data to play nice with Python?

The package you need to load your data is numpy , and the function I recommend you start with is loadtxt . It is the simplest and works well in your case, since your data are homogeneous:

import numpy
data = numpy.loadtxt("MyData.csv", skiprows = 39, delimiter = ",")

Of course, your file is a little more complex, with seemingly two arrays and some material at the top you may or may not want to throw away. You can refine this by loading the metadata first and then the numerical values. You may also want to retain the column headers to create a "structured array", also called a "record array". To do that, I recommend a first pass that opens the file and detects the column names and their units, builds a dtype from them, and then passes that dtype to loadtxt as above.
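A minimal sketch of that two-pass approach: read the header line to get column names, build a structured dtype, and hand it to loadtxt. The layout assumed here (a single comma-separated line of names directly above the data, with hypothetical column names Time and Force) is an illustration; the real CETR file will need its own offsets:

```python
import io
import numpy

# Hypothetical file layout: one header line of column names,
# then comma-separated numeric rows.
text = "Time,Force\n0.0,1.23\n0.1,1.31\n"

buf = io.StringIO(text)
names = buf.readline().strip().split(",")                # first pass: column names
dtype = numpy.dtype([(name, numpy.float64) for name in names])

# Second pass: loadtxt consumes the remaining rows with that dtype.
data = numpy.loadtxt(buf, delimiter=",", dtype=dtype)
print(data["Time"])    # columns are now accessible by name
print(data["Force"])
```

For a file on disk, the same thing works with two open() calls, or one file object and an appropriate skiprows.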


 