Reading in .dat files with the word "upper" included causes problems
This is the code I use to read in .dat files:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.dat', sep=r'\s+')
plt.figure()
plt.plot(df['yh_center[2]'], df['tot_scale01[4]'], 'ro')
plt.xlabel('yh_center')
plt.ylabel('tot_scale01[4]')
plt.savefig('name.pdf')
plt.show()
It reads in correctly when there is no lone word "upper" in the .dat file. However, I will have many .dat files that contain the word "upper", and these cause errors.
The .dat file looks like:
#labels: yh_lower[1] yh_center[2] yh_upper[3] tot_scale01[4]
#neval: 200000
#overflow:lower center upper 0.0000000000E+000 0.0000000000E+000 0.0000000000E+000 0.0000000000E+000
Then, on line 4, the actual data begins:
-4.4000000000E+000 -4.3000000000E+000 -4.2000000000E+000 0.0000000000E+000
Then I have 43 more lines of data, and the final line is:
#nx: 3
(There are actually over 100 columns of data, but that should not change anything; the principle can be shown with the first 4 columns.)
The full error report is:
Traceback (most recent call last):
File "<ipython-input-1-00512d6cb966>", line 1, in <module>
runfile('Private, so deleted that one here')
File "/usr/lib/python3.4/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 685, in runfile
execfile(filename, namespace)
File "/usr/lib/python3.4/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "/home/ttp/manbrunn/Documents/NNLOJETexe/HJtest/funktioniert.py", line 15, in <module>
plt.plot(df['yh_center[2]'], df['tot_scale01[4]'], 'ro')
File "/usr/lib64/python3.4/site-packages/matplotlib/pyplot.py", line 3099, in plot
ret = ax.plot(*args, **kwargs)
File "/usr/lib64/python3.4/site-packages/matplotlib/axes/_axes.py", line 1374, in plot
self.add_line(line)
File "/usr/lib64/python3.4/site-packages/matplotlib/axes/_base.py", line 1504, in add_line
self._update_line_limits(line)
File "/usr/lib64/python3.4/site-packages/matplotlib/axes/_base.py", line 1515, in _update_line_limits
path = line.get_path()
File "/usr/lib64/python3.4/site-packages/matplotlib/lines.py", line 874, in get_path
self.recache()
File "/usr/lib64/python3.4/site-packages/matplotlib/lines.py", line 575, in recache
x = np.asarray(xconv, np.float_)
File "/usr/lib64/python3.4/site-packages/numpy/core/numeric.py", line 462, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'upper'
You could pre-parse the data yourself, skipping the non-numeric header and footer lines before passing it to pandas, for example:
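import pandas as pd

with open('data.dat') as f_input:
    # First line is "#labels: name1 name2 ..."; drop the "#labels:" token to get the column names
    header = next(f_input).split()[1:]
    next(f_input)  # skip the "#neval" line
    next(f_input)  # skip the "#overflow" line
    data = [line.strip().split() for line in f_input]

# data[:-1] drops the trailing "#nx: 3" line
df = pd.DataFrame(data[:-1], dtype='float', columns=header)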
This version is also able to extract the column names from the first row.
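If the comment lines are not always in the same positions, a variation on the approach above is to drop every line that begins with # and keep only the #labels: line for the column names. What follows is a minimal sketch, assuming that every non-data line starts with # (as in the sample shown in the question). The function name read_dat and the inline sample, including its second data row, are hypothetical and only serve the illustration:

```python
import io
import pandas as pd

def read_dat(handle):
    """Parse a whitespace-separated .dat stream, ignoring '#' lines except '#labels:'."""
    header = None
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('#labels:'):
            # Column names follow the "#labels:" token
            header = line.split()[1:]
        elif not line.startswith('#'):
            # Anything that is not a comment is treated as a row of numbers
            rows.append([float(value) for value in line.split()])
    return pd.DataFrame(rows, columns=header)

# Hypothetical sample modelled on the file layout in the question
sample = "\n".join([
    "#labels: yh_lower[1] yh_center[2] yh_upper[3] tot_scale01[4]",
    "#neval: 200000",
    "#overflow:lower center upper 0.0000000000E+000 0.0000000000E+000 0.0000000000E+000 0.0000000000E+000",
    "-4.4000000000E+000 -4.3000000000E+000 -4.2000000000E+000 0.0000000000E+000",
    "-4.2000000000E+000 -4.1000000000E+000 -4.0000000000E+000 0.0000000000E+000",
    "#nx: 3",
])

df = read_dat(io.StringIO(sample))
```

For a real file, the same function could be called as read_dat(open('data.dat')), ideally inside a with block so the file is closed afterwards.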
If upper only appears on the third line (and not randomly throughout the file), you could simply use skiprows to tell pandas to start reading from the fourth line onwards. As you also have a #nx: 3 in the final line, use skipfooter to skip that too:
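import pandas as pd
import matplotlib.pyplot as plt
# skipfooter is not supported by the default C parser, so request the Python engine explicitly
df = pd.read_csv('data.dat', sep=r'\s+', skiprows=3, skipfooter=1, header=None, engine='python')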
Tested on pandas 0.22.0.
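Note that header=None leaves the columns numbered 0, 1, 2, ..., so a lookup such as df['yh_center[2]'] from the question would raise a KeyError. One possible way to keep the labels is to read the first line separately and pass the names via names=. This is only a sketch, and it assumes that the first line always has the form #labels: followed by the column names. The inline sample (including its second data row) is made up for illustration:

```python
import io
import pandas as pd

# Hypothetical sample modelled on the file layout in the question
sample = "\n".join([
    "#labels: yh_lower[1] yh_center[2] yh_upper[3] tot_scale01[4]",
    "#neval: 200000",
    "#overflow:lower center upper 0.0000000000E+000 0.0000000000E+000 0.0000000000E+000 0.0000000000E+000",
    "-4.4000000000E+000 -4.3000000000E+000 -4.2000000000E+000 0.0000000000E+000",
    "-4.2000000000E+000 -4.1000000000E+000 -4.0000000000E+000 0.0000000000E+000",
    "#nx: 3",
])

buf = io.StringIO(sample)

# Take the column names from the "#labels:" line, then rewind for read_csv
labels = buf.readline().split()[1:]
buf.seek(0)

df = pd.read_csv(
    buf,
    sep=r'\s+',
    skiprows=3,        # "#labels", "#neval" and "#overflow" lines
    skipfooter=1,      # trailing "#nx: 3" line
    header=None,
    names=labels,      # reuse the labels so the plotting code can index by name
    engine='python',   # skipfooter requires the Python parser
)
```

With a file on disk, the same idea applies: open 'data.dat' once to read the first line, then pass the path to pd.read_csv with the same arguments.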