Reading in .dat files with the word “upper” included causes problems

This is the code I use to read in .dat files:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.dat', sep=r'\s+')

plt.figure()
plt.plot(df['yh_center[2]'], df['tot_scale01[4]'], 'ro')
plt.xlabel('yh_center')
plt.ylabel('tot_scale01[4]')
plt.savefig('name.pdf')
plt.show()

It reads in correctly when there is no stray word "upper" in the .dat file. However, I will have many .dat files that contain the word "upper", and those cause errors.

The .dat file looks like:

#labels: yh_lower[1]   yh_center[2]   yh_upper[3]   tot_scale01[4]
#neval: 200000
#overflow:lower center upper    0.0000000000E+000    0.0000000000E+000    0.0000000000E+000    0.0000000000E+000

Then the actual data starts on line 4:

-4.4000000000E+000   -4.3000000000E+000   -4.2000000000E+000    0.0000000000E+000

Then I have 43 more lines of data, and the final line is

#nx: 3

(In reality there are over 100 columns of data, but the principle can be shown with the first 4 columns.)

The full error report is

Traceback (most recent call last):

  File "<ipython-input-1-00512d6cb966>", line 1, in <module>
    runfile('Private, so deleted that one here')

  File "/usr/lib/python3.4/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)

  File "/usr/lib/python3.4/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

  File "/home/ttp/manbrunn/Documents/NNLOJETexe/HJtest/funktioniert.py", line 15, in <module>
    plt.plot(df['yh_center[2]'], df['tot_scale01[4]'], 'ro')

  File "/usr/lib64/python3.4/site-packages/matplotlib/pyplot.py", line 3099, in plot
    ret = ax.plot(*args, **kwargs)

  File "/usr/lib64/python3.4/site-packages/matplotlib/axes/_axes.py", line 1374, in plot
    self.add_line(line)

  File "/usr/lib64/python3.4/site-packages/matplotlib/axes/_base.py", line 1504, in add_line
    self._update_line_limits(line)

  File "/usr/lib64/python3.4/site-packages/matplotlib/axes/_base.py", line 1515, in _update_line_limits
    path = line.get_path()

  File "/usr/lib64/python3.4/site-packages/matplotlib/lines.py", line 874, in get_path
    self.recache()

  File "/usr/lib64/python3.4/site-packages/matplotlib/lines.py", line 575, in recache
    x = np.asarray(xconv, np.float_)

  File "/usr/lib64/python3.4/site-packages/numpy/core/numeric.py", line 462, in asarray
    return array(a, dtype, copy=False, order=order)

ValueError: could not convert string to float: 'upper'

You could pre-parse the data by removing any leading text from each line before passing it to pandas, for example:

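import pandas as pd

with open('data.dat') as f_input:
    # Line 1 is "#labels: name1 name2 ..."; drop the "#labels:" token
    header = next(f_input).split()[1:]
    next(f_input)  # skip the "#neval: ..." line
    next(f_input)  # skip the "#overflow: ..." line

    data = [line.split() for line in f_input]

# data[:-1] drops the trailing "#nx: 3" line
df = pd.DataFrame(data[:-1], dtype='float', columns=header)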

This version also extracts the column names from the first row.
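If the `#` lines are not always in the same positions, a variant of the same idea is to skip every line that starts with `#` instead of counting rows. The following is only a sketch under that assumption: `read_dat` is a hypothetical helper name, the sample text imitates the layout from the question, and its second data row is made up for illustration.

```python
import io
import pandas as pd

def read_dat(lines):
    """Parse an iterable of text lines into a DataFrame, ignoring '#' lines."""
    header = None
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('#labels:'):
            # Column names follow the "#labels:" token
            header = line.split()[1:]
        elif line.startswith('#'):
            continue  # any other metadata line (#neval, #overflow, #nx, ...)
        else:
            rows.append([float(value) for value in line.split()])
    return pd.DataFrame(rows, columns=header)

# Sample imitating the question's layout (second data row is invented)
sample = io.StringIO(
    "#labels: yh_lower[1]   yh_center[2]   yh_upper[3]   tot_scale01[4]\n"
    "#neval: 200000\n"
    "#overflow:lower center upper    0.0000000000E+000    0.0000000000E+000"
    "    0.0000000000E+000    0.0000000000E+000\n"
    "-4.4000000000E+000   -4.3000000000E+000   -4.2000000000E+000    0.0000000000E+000\n"
    "-4.2000000000E+000   -4.1000000000E+000   -4.0000000000E+000    0.0000000000E+000\n"
    "#nx: 3\n"
)
df = read_dat(sample)
```

For a real file, the same helper could be called as `read_dat(open('data.dat'))`, since it only needs something that yields lines.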


If the word "upper" only appears on the third line (and not randomly throughout the file), you could simply use skiprows to tell pandas to start reading from the fourth line onwards. As you have a #nx: 3 in the final line, use skipfooter to skip that too:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.dat', sep=r'\s+', skiprows=3, skipfooter=1, header=None, engine='python')

Tested on Pandas 0.22.0
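With header=None the columns are simply numbered 0, 1, 2, ..., so the plotting code from the question would not find 'yh_center[2]'. One possible way around this, sketched here on an in-memory sample rather than the real file, is to take the names from the `#labels:` line and pass them via `names`. The second data row is invented for illustration, and `engine='python'` is given explicitly because `skipfooter` is not supported by the default C parser.

```python
import io
import pandas as pd

# Sample imitating the question's layout (second data row is invented)
sample_text = (
    "#labels: yh_lower[1]   yh_center[2]   yh_upper[3]   tot_scale01[4]\n"
    "#neval: 200000\n"
    "#overflow:lower center upper    0.0000000000E+000    0.0000000000E+000"
    "    0.0000000000E+000    0.0000000000E+000\n"
    "-4.4000000000E+000   -4.3000000000E+000   -4.2000000000E+000    0.0000000000E+000\n"
    "-4.2000000000E+000   -4.1000000000E+000   -4.0000000000E+000    0.0000000000E+000\n"
    "#nx: 3\n"
)

# Column names come from the first line; drop the "#labels:" token
names = sample_text.splitlines()[0].split()[1:]

df = pd.read_csv(
    io.StringIO(sample_text),
    sep=r'\s+',
    skiprows=3,        # skip the three '#' lines at the top
    skipfooter=1,      # skip the trailing "#nx: 3" line
    engine='python',   # skipfooter needs the Python parser
    header=None,
    names=names,
)
```

After this, `df['yh_center[2]']` and `df['tot_scale01[4]']` are numeric columns and can be passed to `plt.plot` as in the question.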
