Read and write fixed format (MODFLOW) text files with Python

I am trying to read, manipulate and write text files using Python. These files contain numeric matrices and were generated by a FORTRAN groundwater flow code called MODFLOW. They have an unusual shape because each matrix row is split across several file lines so that there are no more than 7 values per line. So a matrix row with 37 columns is output as 5 lines of 7 values (fmt='%14.6E') followed by 1 line with 2 values. The next matrix row then starts on a new line.
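For illustration, here is a minimal sketch (my own, using random data and a hypothetical file name) of how one such wrapped matrix could be written with plain Python and the '%14.6E' format:

import numpy as np

data = np.random.rand(49, 37)   # one time step: 49 rows x 37 columns

with open('example_wrapped.txt', 'w') as f:
    for row in data:
        # each 37-value matrix row becomes 5 lines of 7 values plus 1 line of 2
        for start in range(0, row.size, 7):
            f.write(''.join('%14.6E' % v for v in row[start:start + 7]) + '\n')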

I am trying to read two such files, each with 730 time steps x 49 rows x 37 columns (about 18 MB). Then I want to multiply the data together elementwise and write the results into a new file with the same format.

I can do it line by line using csv.reader and then numpy.savetxt, but it is extremely slow. How can I do it with numpy (or something similar) that will be faster? Thanks!

UPDATE:

I'm almost there, I just need to get rid of the commas in my output file. Apparently this isn't currently possible with pandas, so I might have to do it as a separate operation.

SOLVED:

Obtain the pandas output as text and use replace() to get rid of the delimiters. Still fast.

import pandas as pd

root = 'Taupo'

# read both whitespace-delimited files into DataFrames
rctrans = pd.read_csv(root+'._rctrans', header=None, delim_whitespace=True)
rcmult = pd.read_csv(root+'._rcmult', header=None, delim_whitespace=True)

# duplicate rcmult nsteps times to make it the same size as rctrans
nsteps = len(rctrans.index) // len(rcmult.index)
rcmult = pd.concat([rcmult]*nsteps, ignore_index=True)

# multiply the arrays
rctrans = pd.DataFrame(rctrans.values*rcmult.values, columns=rctrans.columns, index=rctrans.index)

# write as csv with no delimiter
with open(root+'._rc','w') as w:
    w.write(rctrans.to_csv(header=False, index=False, float_format='%14.6E').replace(',',''))
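Note that to_csv writes each 37-value row on a single line rather than in the original wrapped layout. If the 7-values-per-line layout is needed, a minimal follow-up sketch (continuing from the block above; the output file name is a guess) could re-wrap the rows:

# re-wrap each 37-column row into lines of at most 7 values
with open(root+'._rc_wrapped', 'w') as w:
    for row in rctrans.values:
        for start in range(0, row.size, 7):
            w.write(''.join('%14.6E' % v for v in row[start:start + 7]) + '\n')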

I think any Python based file reader that handles the files line by line is going to have similar speed. Pandas supposedly has a faster CSV reader, but I'm not familiar with it. Do you have any sense of where your code is slow? Reading the files? Parsing? Collecting values in a list/array?

For a start I'd try to write a reader that takes in 6 lines and splices them together to get the 37 numbers in one line. Then parse that and convert it to a list of 37 floats. Finally, append it to a master list.

Once I've done 49 of those, create a (49,37) array, and either save it or append it to another list that will hold all the time steps.
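A minimal sketch of that approach (the function name and constants are mine; the file name follows the root+'._rctrans' pattern used above):

import numpy as np

LINES_PER_ROW = 6    # 5 lines of 7 values + 1 line of 2 values = 37 columns
NROWS = 49           # matrix rows per time step

def read_steps(fname):
    # collect one (49, 37) array per time step
    steps = []
    rows = []
    with open(fname) as f:
        while True:
            # splice 6 physical lines into one logical matrix row
            chunk = [f.readline() for _ in range(LINES_PER_ROW)]
            if not chunk[0]:
                break                       # end of file
            rows.append([float(v) for v in ''.join(chunk).split()])
            if len(rows) == NROWS:
                steps.append(np.array(rows))
                rows = []
    return steps

# e.g. steps = read_steps('Taupo._rctrans'); len(steps) should be 730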

As noted in other SO questions about np.genfromtxt or np.loadtxt, they accept any iterator (or generator). So the input to the function could be this aggregator that turns 6 lines into one line with 37 columns.
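For example, a generator along these lines (the name merge_lines is my own) can be passed straight to np.loadtxt:

import numpy as np

def merge_lines(fname, lines_per_row=6):
    # yield one logical line of 37 values for every 6 physical lines
    with open(fname) as f:
        while True:
            chunk = [f.readline() for _ in range(lines_per_row)]
            if not chunk[0]:
                return
            yield ' '.join(line.strip() for line in chunk)

data = np.loadtxt(merge_lines('Taupo._rctrans'))   # shape (730*49, 37)
arr = data.reshape(-1, 49, 37)                     # -> (730, 49, 37)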

Without knowing more details about your current method, I can't say whether my suggestion is any faster. And without a similar test file, I really can't test alternatives. So at one level or another this is all speculative.
