Reading blocks of data from a file in Python

Question

I'm new to Python and am trying to read "blocks" of data from a file. The file looks something like this:

# Some comment
# 4 cols of data --x,vx,vy,vz
# nsp, nskip =           2          10


#            0   0.0000000


#            1           4
 0.5056E+03  0.8687E-03 -0.1202E-02  0.4652E-02
 0.3776E+03  0.8687E-03  0.1975E-04  0.9741E-03
 0.2496E+03  0.8687E-03  0.7894E-04  0.8334E-03
 0.1216E+03  0.8687E-03  0.1439E-03  0.6816E-03


#            2           4
 0.5056E+03  0.8687E-03 -0.1202E-02  0.4652E-02
 0.3776E+03  0.8687E-03  0.1975E-04  0.9741E-03
 0.2496E+03  0.8687E-03  0.7894E-04  0.8334E-03
 0.1216E+03  0.8687E-03  0.1439E-03  0.6816E-03


#          500  0.99999422


#            1           4
 0.5057E+03  0.7392E-03 -0.6891E-03  0.4700E-02
 0.3777E+03  0.9129E-03  0.2653E-04  0.9641E-03
 0.2497E+03  0.9131E-03  0.7970E-04  0.8173E-03
 0.1217E+03  0.9131E-03  0.1378E-03  0.6586E-03

and so on

Now I want to be able to specify and read only one block out of these many blocks. I'm using numpy.loadtxt('filename', comments='#') to read the data, but it loads the whole file in one go. I searched online and found that someone has written a patch for the numpy io routines that lets you specify which block to read, but it's not in mainstream numpy.
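A minimal sketch of that behaviour, assuming a trimmed two-block copy of the sample above held in a string instead of a file: `numpy.loadtxt` with `comments='#'` skips comment lines and blank lines, so the rows of every block are stacked into one array and the block boundaries are lost.

```python
import io

import numpy as np

# Hypothetical two-block excerpt of the file shown above
sample = """# Some comment
# 4 cols of data --x,vx,vy,vz

#            1           4
 0.5056E+03  0.8687E-03 -0.1202E-02  0.4652E-02
 0.3776E+03  0.8687E-03  0.1975E-04  0.9741E-03


#            2           4
 0.5056E+03  0.8687E-03 -0.1202E-02  0.4652E-02
 0.3776E+03  0.8687E-03  0.1975E-04  0.9741E-03
"""

# Comment and blank lines are dropped; every data row from every block is kept
data = np.loadtxt(io.StringIO(sample), comments='#')
print(data.shape)  # (4, 4): both blocks merged, nothing marks where one ends
```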

It's much easier to select blocks of data in gnuplot, but there I'd have to write the routine to plot the distribution functions. If I can figure out how to read specific blocks, it would be much easier in Python. Also, I'm moving all my visualization code from IDL and gnuplot to Python, so it would be nice to have everything in Python instead of scattered across multiple packages.

I thought about calling gnuplot from within Python, plotting a block to a table, and loading the output into an array in Python. But I'm just starting out and couldn't figure out the syntax for it.

Any ideas or pointers for solving this problem would be a great help.

Answer 1

A quick, basic reader:

def read_blocks(input_file, i, j):
    """Return blocks i..j (inclusive, counted from 0) as lists of raw lines."""
    empty_lines = 0
    blocks = []
    with open(input_file) as f:
        for line in f:
            # Blank or commented line: part of a separator between blocks
            if not line.strip() or line.startswith('#'):
                # First separator line after data (or at file start): new block
                if empty_lines == 0:
                    blocks.append([])
                empty_lines += 1
            # Data line: add it to the current (last) block
            else:
                if not blocks:  # file starts with data, no separator yet
                    blocks.append([])
                empty_lines = 0
                blocks[-1].append(line)
    return blocks[i:j + 1]

# s holds the path to the data file
for block in read_blocks(s, 1, 2):
    print('-> block')
    for line in block:
        print(line, end='')  # lines already end with '\n'


-> block
 0.5056E+03  0.8687E-03 -0.1202E-02  0.4652E-02
 0.3776E+03  0.8687E-03  0.1975E-04  0.9741E-03
 0.2496E+03  0.8687E-03  0.7894E-04  0.8334E-03
 0.1216E+03  0.8687E-03  0.1439E-03  0.6816E-03
-> block
 0.5057E+03  0.7392E-03 -0.6891E-03  0.4700E-02
 0.3777E+03  0.9129E-03  0.2653E-04  0.9641E-03
 0.2497E+03  0.9131E-03  0.7970E-04  0.8173E-03
 0.1217E+03  0.9131E-03  0.1378E-03  0.6586E-03

From there, you should be able to use numpy to parse the lines of each block.
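One way to do that, as a sketch: `numpy.loadtxt` accepts a list of strings as well as a filename, so the raw lines of one block can be passed to it directly. The `block` list below is a hypothetical stand-in for one element of the list returned by `read_blocks`.

```python
import numpy as np

# Hypothetical stand-in for one block returned by read_blocks()
block = [
    " 0.5057E+03  0.7392E-03 -0.6891E-03  0.4700E-02\n",
    " 0.3777E+03  0.9129E-03  0.2653E-04  0.9641E-03\n",
    " 0.2497E+03  0.9131E-03  0.7970E-04  0.8173E-03\n",
    " 0.1217E+03  0.9131E-03  0.1378E-03  0.6586E-03\n",
]

# loadtxt turns each whitespace-separated line into one row of floats
arr = np.loadtxt(block)
x, vx, vy, vz = arr.T  # one 1-D array per column
print(arr.shape)
```

Unpacking `arr.T` gives one array per column (x, vx, vy, vz), which is usually the convenient form for plotting.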

Answer 2

The following snippets should get you started. You will need the re module to parse the header lines.

You can open the file for reading using:

f = open("file_name_here")

You can read the file one line at a time using:

line = f.readline()

To jump to the next line that starts with "#", you can use:

while line and not line.startswith("#"):  # stop at a header or at end of file
    line = f.readline()

To parse a line that looks like "# i j", you could use the following regular expression:

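import re

# Anchored so that "# 500  0.99999422" (non-integer second field) does not match
is_match = re.match(r"#\s+(\d+)\s+(\d+)\s*$", line)
if is_match:
    i = int(is_match.group(1))
    j = int(is_match.group(2))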

See the documentation for the re module for more information.
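As a sketch, that pattern can be wrapped in a small helper (the name `parse_header` is hypothetical). This version anchors the end of the line with `\s*$`, which keeps it from matching time-step lines such as `# 500  0.99999422`; without the anchor, the second `\d+` would happily match the leading `0` of `0.99999422`.

```python
import re

# Two whitespace-separated integers after '#', and nothing else on the line
HEADER_RE = re.compile(r"#\s+(\d+)\s+(\d+)\s*$")


def parse_header(line):
    """Return (i, j) as ints for a '# i j' header line, otherwise None."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(parse_header("#            2           4\n"))  # (2, 4)
print(parse_header("#          500  0.99999422\n"))  # None
```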

To parse a block, you could use the following bit of code:

line = f.readline()  # move past the "# i j" header line
block = []  # block[i][j] will contain element i,j of your block
while line.strip():  # read until the next blank line (or end of file)
    # split each line on whitespace and convert every field to float
    block.append([float(x) for x in line.split()])
    line = f.readline()

You can then turn your block into a numpy array if you want:

block = np.array(block)

This assumes you have imported numpy as np. If you want to read multiple blocks between i and j, put the code above for reading one block into a function and call it once per block.
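Following that suggestion, here is one possible sketch. The function names `read_block` and `read_blocks_between` are hypothetical, "block i" is assumed to mean the i-th data block counted from 0 in file order, and a block is assumed to be any run of lines that are neither blank nor start with "#".

```python
import io

import numpy as np


def read_block(f):
    """Return the next data block in f as a 2-D float array, or None at EOF."""
    line = f.readline()
    # Skip blank lines and '#' comment/header lines until data starts
    while line and (not line.strip() or line.lstrip().startswith("#")):
        line = f.readline()
    if not line:  # end of file reached before another block
        return None
    rows = []
    # Collect data lines until a blank line, a '#' line, or end of file
    while line.strip() and not line.lstrip().startswith("#"):
        rows.append([float(x) for x in line.split()])
        line = f.readline()
    return np.array(rows)


def read_blocks_between(f, i, j):
    """Return blocks i..j (inclusive, counted from 0) as a list of arrays."""
    blocks = []
    for k in range(j + 1):
        block = read_block(f)
        if block is None:  # the file has fewer than j + 1 blocks
            break
        if k >= i:
            blocks.append(block)
    return blocks


# Usage on a small in-memory sample; with a real file, use open("file_name_here")
sample = """# 4 cols of data --x,vx,vy,vz

#            1           4
 0.5056E+03  0.8687E-03 -0.1202E-02  0.4652E-02
 0.3776E+03  0.8687E-03  0.1975E-04  0.9741E-03


#            2           4
 0.2496E+03  0.8687E-03  0.7894E-04  0.8334E-03
"""
with io.StringIO(sample) as f:
    for block in read_blocks_between(f, 1, 1):
        print(block.shape)  # (1, 4): the second block has one row here
```

Blocks before i are still parsed and then discarded; for very large files you could skip them without converting to float, but this keeps the sketch short.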

Hope this helps!

2 answers

Answer 1: 5 votes, accepted, 2012-05-09 17:23:57

Answer 2: 1 vote, 2012-05-09 17:05:03