Read and save data file with variable number of columns in Python
I have a space-separated data file that looks like this (just a slice):
Wavelength Ele Excit loggf D0
11140.324 108.0 3.44 -7.945 4.395
11140.357 26.1 12.09 -2.247
11140.361 108.0 2.39 -8.119 4.395
11140.365 25.0 5.85 -9.734
11140.388 23.0 4.56 -4.573
11140.424 608.0 5.12 -10.419 11.09
11140.452 606.0 2.12 -11.054 6.25
11140.496 108.0 2.39 -8.119 4.395
11140.509 606.0 1.70 -7.824 6.25
First I would like to read the file à la np.loadtxt. This does not work, so I tried with
d = np.genfromtxt('file.dat', skiprows=1, filling_value=0.0, missing_values=' ')
and different versions of that. All gave errors:
Line #3 (got 4 columns instead of 5)
I think I'm close to being able to read the file. Note that I prefer a solution with something like np.genfromtxt rather than opening the file and going through it line by line:
with open('test.dat', 'r') as lines:
    for line in lines:
        # put numbers in arrays/lists
        pass
After reading the file successfully, I need to save it in a specific format. Very briefly, this file will be an input for a Fortran program, with 10 spaces per column for numbers. Without the last column (D0), I can use the following (there is a column I don't use, hence the '%27.1f'):
fmt_ = ('%9.2f', '%7.1f', '%11.2f','%10.3f', '%27.1f')
np.savetxt('output.dat', data, fmt=fmt_)
But I suspect this wouldn't work either. So something like np.genfromtxt, but for saving, could be helpful.
Help with all of it, one part, or just some guidance is appreciated.
Part 1:
Use pandas. It is designed specifically to handle this sort of scenario:
import pandas as pd
df = pd.read_csv('test.csv', sep=r'\s+')  # raw string for the regex separator
print(df)
gives you:
Wavelength Ele Excit loggf D0
0 11140.324 108.0 3.44 -7.945 4.395
1 11140.357 26.1 12.09 -2.247 NaN
2 11140.361 108.0 2.39 -8.119 4.395
3 11140.365 25.0 5.85 -9.734 NaN
4 11140.388 23.0 4.56 -4.573 NaN
5 11140.424 608.0 5.12 -10.419 11.090
6 11140.452 606.0 2.12 -11.054 6.250
7 11140.496 108.0 2.39 -8.119 4.395
8 11140.509 606.0 1.70 -7.824 6.250
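To try the read step without a file on disk, the same call can be pointed at an in-memory buffer. This is a minimal sketch using a few rows from the question; the names `sample` and `missing` are only illustrative. It also shows how to find the rows where D0 was absent.

```python
import io

import pandas as pd

# A few rows copied from the question; `sample` is an illustrative name.
sample = """Wavelength Ele Excit loggf D0
11140.324 108.0 3.44 -7.945 4.395
11140.357 26.1 12.09 -2.247
11140.424 608.0 5.12 -10.419 11.09
"""

# Any run of whitespace separates fields; the raw string keeps the
# backslash in the regular expression literal.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Rows with only four fields get NaN in the last column.
missing = df["D0"].isna()
print(df)
print(missing.tolist())
```

The boolean mask `missing` can then be used to treat the short rows differently when writing the file back out.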
Part 2:
You can also use pandas for this, although it is a bit more complex to get the formatting correct:
import numpy as np

formatters = ['{: >9.2f}'.format, '{: >7.1f}'.format,
              '{: >11.2f}'.format, '{: >10.3f}'.format,
              # leave the D0 field blank when the value is missing
              lambda x: ' '*27 if np.isnan(x) else '{: >27.1f}'.format(x)]
lines = df.to_string(index=False, header=False, formatters=formatters)
with open('out.dat', 'w') as outfile:
    outfile.write(lines)
Gives you:
11140.32 108.0 3.44 -7.945 4.4
11140.36 26.1 12.09 -2.247
11140.36 108.0 2.39 -8.119 4.4
11140.36 25.0 5.85 -9.734
11140.39 23.0 4.56 -4.573
11140.42 608.0 5.12 -10.419 11.1
11140.45 606.0 2.12 -11.054 6.2
11140.50 108.0 2.39 -8.119 4.4
11140.51 606.0 1.70 -7.824 6.2
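If exact column positions matter for the Fortran reader, one alternative is to build each line by hand so that every field has a known width. The sketch below is not the approach above, only a variation on it: the 10-character widths and the decimal counts in `FIELDS` are assumptions taken from the "10 spaces per column" remark in the question, and `format_row` is a hypothetical helper.

```python
import io
import math

import pandas as pd

sample = """Wavelength Ele Excit loggf D0
11140.324 108.0 3.44 -7.945 4.395
11140.357 26.1 12.09 -2.247
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# (width, decimals) per column; assumed values, adjust to the real layout.
FIELDS = [(10, 3), (10, 1), (10, 2), (10, 3), (10, 3)]


def format_row(values):
    """Right-align each number in its field; blank-fill NaN entries."""
    parts = []
    for value, (width, decimals) in zip(values, FIELDS):
        if math.isnan(value):
            parts.append(" " * width)
        else:
            parts.append(f"{value:{width}.{decimals}f}")
    return "".join(parts)


lines = [format_row(row) for row in df.itertuples(index=False)]
print("\n".join(lines))
```

Every line comes out with the same total width, and a missing D0 becomes a blank field rather than the text `nan`. Writing `"\n".join(lines) + "\n"` to a file gives the final output.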
Here's part of a sample run with your data.
In [62]: txt=b"""Wavelength Ele Excit loggf D0
11140.324 108.0 3.44 -7.945 4.395
...
11140.509 606.0 1.70 -7.824 6.25 """
In [63]: txt=txt.splitlines()
In [64]: def foo(astr):
   ....:     # add a 'NaN' field to the short lines
   ....:     if len(astr)<35:
   ....:         astr += b' NaN' # or filler of your choice
   ....:     return astr
   ....:
In [65]: data=np.loadtxt([foo(t) for t in txt], skiprows=1)
In [66]: data
Out[66]:
array([[ 1.11403240e+04, 1.08000000e+02, 3.44000000e+00,
-7.94500000e+00, 4.39500000e+00],
[ 1.11403570e+04, 2.61000000e+01, 1.20900000e+01,
-2.24700000e+00, nan],
...
[ 1.11405090e+04, 6.06000000e+02, 1.70000000e+00,
-7.82400000e+00, 6.25000000e+00]])
In [67]: np.savetxt('test.dat', data, fmt=fmt_)
In [69]: cat test.dat
11140.32 108.0 3.44 -7.945 4.4
11140.36 26.1 12.09 -2.247 nan
11140.36 108.0 2.39 -8.119 4.4
11140.36 25.0 5.85 -9.734 nan
...
11140.51 606.0 1.70 -7.824 6.2
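The file can be passed through foo like this:
with open('test.dat', 'rb') as f:  # binary mode, since foo appends bytes
    # strip the line ending first so the filler lands on the same line
    xx = np.loadtxt((foo(t.rstrip()) for t in f), skiprows=1)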
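A length test like the one in foo depends on the exact spacing of the file. A variation that counts fields instead may be more robust. The sketch below assumes five columns; `pad_fields` and `NCOLS` are hypothetical names, and the data is read from an in-memory buffer only to keep the example self-contained.

```python
import io

import numpy as np

NCOLS = 5  # number of columns named in the header (assumed)


def pad_fields(line, ncols=NCOLS, filler="nan"):
    """Append filler tokens until the line has ncols whitespace-separated fields."""
    fields = line.split()
    fields += [filler] * (ncols - len(fields))
    return " ".join(fields)


sample = io.StringIO(
    "Wavelength Ele Excit loggf D0\n"
    "11140.324 108.0 3.44 -7.945 4.395\n"
    "11140.357 26.1 12.09 -2.247\n"
)

next(sample)  # skip the header line
data = np.loadtxt([pad_fields(line) for line in sample])
print(data)
```

With a real file, `sample` would be replaced by the open file object; the rest stays the same.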
savetxt essentially does a row-by-row write, so it isn't hard to write your own version, e.g.
In [120]: asbytes=np.lib.npyio.asbytes
In [121]: fmt__='%9.2f %7.1f %11.2f %10.3f %10.1f'
In [122]: with open('test.dat','wb') as f:
   .....:     for row in data:
   .....:         f.write(asbytes(fmt__%tuple(row)+'\n'))
   .....:
In [123]: cat test.dat
11140.32 108.0 3.44 -7.945 4.4
11140.36 26.1 12.09 -2.247 nan
11140.36 108.0 2.39 -8.119 4.4
11140.36 25.0 5.85 -9.734 nan
...
11140.51 606.0 1.70 -7.824 6.2
With this it wouldn't be hard to test each row and use a different format for rows with a nan.
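One possible way to do that per-row test is sketched below. `FULL_FMT` repeats the format string from the session above, while `SHORT_FMT` and `format_row` are hypothetical names introduced for illustration. Whether the short rows should simply end early or be padded with blanks depends on what the Fortran program expects, so treat the short format as an assumption.

```python
import numpy as np

# Two rows from the sample data; the second has no D0 value.
data = np.array([
    [11140.324, 108.0, 3.44, -7.945, 4.395],
    [11140.357, 26.1, 12.09, -2.247, np.nan],
])

FULL_FMT = "%9.2f %7.1f %11.2f %10.3f %10.1f"
SHORT_FMT = "%9.2f %7.1f %11.2f %10.3f"  # same layout without the last field


def format_row(row):
    """Pick the format string according to whether the last value is NaN."""
    if np.isnan(row[-1]):
        return SHORT_FMT % tuple(row[:-1])
    return FULL_FMT % tuple(row)


lines = [format_row(row) for row in data]
print("\n".join(lines))
```

Writing each entry of `lines` followed by a newline to a file opened in text mode replaces the `asbytes` call in the session above.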