How to read a specific number of floats from a file in Python?

Question

I am reading a text file from the web. The file starts with some header lines containing the number of data points, followed by the actual vertices (3 coordinates each). The file looks like:

# comment
HEADER TEXT
POINTS 6 float
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
POLYGONS

The line starting with the word POINTS contains the number of vertices (in this case we have 3 vertices per line, but that could change).

This is how I am reading it right now:

ur=urlopen("http://.../file.dat")

j=0
contents = []
while 1:
    line = ur.readline()
    if not line:
        break
    else:
        line=line.lower()       

    if 'points' in line :
        myline=line.strip()
        word=myline.split()
        node_number=int(word[1])
        node_type=word[2]

        while 'polygons'  not in line :
            line = ur.readline()
            line=line.lower() 
            myline=line.split()

            i=0
            while(i<len(myline)):                    
                contents[j]=float(myline[i])
                i=i+1
                j=j+1

How can I read a specified number of floats instead of reading line by line as strings and converting to floating-point numbers?

Instead of ur.readline() I want to read the specified number of elements in the file.

Any suggestion is welcome.

Answer 1

I'm not entirely sure what your goal is from your explanation.

For the record, here is code that does basically what yours seems to be trying to do, using some techniques I would choose over the ones you have chosen. If you're using while loops and indices, it's usually a sign that you're doing something wrong, and indeed your code does not work: contents is an empty list, so contents[j] = ... raises an IndexError.

lines = (line.strip().lower() for line in your_web_page)

points_line = next(line for line in lines if 'points' in line)
_, node_number, node_type = points_line.split()
node_number = int(node_number)

def get_contents(lines):
    for line in lines:
        if 'polygons' in line:
            break

        for number in line.split():
            yield float(number)

contents = list(get_contents(lines))

If you are more explicit about the new thing you want to do, maybe someone can provide a better answer for your ultimate goal.
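If the aim is to stop after exactly the number of values announced on the POINTS line, one possible sketch is to flatten the remaining lines into a stream of tokens and take a fixed count with itertools.islice. The helper name read_floats, the in-memory sample standing in for the downloaded file, and the assumption of three coordinates per point are all illustrative and not part of the original code.

```python
import io
from itertools import islice

def read_floats(lines, count):
    """Return the next `count` floats from an iterator of text lines."""
    # Flatten the lines into one lazy stream of whitespace-separated tokens.
    tokens = (tok for line in lines for tok in line.split())
    values = [float(tok) for tok in islice(tokens, count)]
    if len(values) != count:
        raise ValueError("expected %d floats, found %d" % (count, len(values)))
    return values

# Hypothetical stand-in for the downloaded file.
sample = io.StringIO(
    "# comment\n"
    "HEADER TEXT\n"
    "POINTS 6 float\n"
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"
    "POLYGONS\n"
)

lines = iter(sample)
points_line = next(line for line in lines if line.lower().startswith("points"))
node_number = int(points_line.split()[1])
coords = read_floats(lines, node_number * 3)  # assumes 3 coordinates per point
```

This still converts text tokens to floats, since the file is plain text, but it no longer depends on how many values sit on each line. One caveat: if the count ends in the middle of a line, the remaining tokens on that line are discarded.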

Answer 2

Here is a no-fuss cleanup of your code that should make the looping over the contents much faster.

ur=urlopen("http://.../file.dat")
contents = []
node_number = 0
node_type = None
while 1:
    line = ur.readline()
    if not line:
        break
    line = line.lower()       
    if 'points' in line :
        word = line.split()
        node_number = int(word[1])
        node_type = word[2]
        while 1:
            line = ur.readline()
            if not line: break # end of file reached before POLYGONS
            pieces = line.split()
            if not pieces: continue # skip blank lines
            if pieces[0].lower() == 'polygons': break
            contents.extend(map(float, pieces))
assert len(contents) == node_number * 3

If you wrap the code in a function and call that, it will run even faster (because you will be accessing local variables instead of global ones).

Note that the most significant changes are at or near the end of the script.
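Wrapping the loop in a function, as suggested above, could look roughly like the following. The name parse_points is hypothetical, and the sketch assumes the stream yields text lines; under Python 3, urlopen returns bytes, so the lines would need decoding first. An in-memory sample stands in for the download.

```python
import io

def parse_points(stream):
    """Collect the floats between the POINTS line and the POLYGONS line."""
    stream = iter(stream)  # both loops below share one position in the input
    contents = []
    node_number = 0
    node_type = None
    # Skip ahead to the POINTS header and read the count and type from it.
    for line in stream:
        if line.lower().startswith("points"):
            word = line.split()
            node_number = int(word[1])
            node_type = word[2]
            break
    # Convert every token until POLYGONS or the end of the input.
    for line in stream:
        pieces = line.split()
        if not pieces:
            continue  # skip blank lines
        if pieces[0].lower() == "polygons":
            break
        contents.extend(map(float, pieces))
    return node_number, node_type, contents

# Hypothetical stand-in for the downloaded file.
sample = io.StringIO(
    "# comment\nHEADER TEXT\nPOINTS 6 float\n"
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"
    "POLYGONS\n"
)
node_number, node_type, contents = parse_points(sample)
```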

HOWEVER: stand back and think about this for a few seconds: how much of the time is taken up by ur.readline() and how much by unpacking the lines?
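One way to answer that question is to time the two phases separately. The rough sketch below uses an in-memory io.StringIO as a stand-in for the network response, so its read time will be far smaller than a real download; the helper name timed and the generated sample data are illustrative assumptions.

```python
import io
import time

def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def parse(text):
    """Convert every whitespace-separated token to a float."""
    return [float(tok) for tok in text.split()]

# Hypothetical data: 10000 lines of three floats each.
source = io.StringIO("1.1 2.2 3.3\n" * 10000)

text, read_seconds = timed(source.read)      # stands in for the download
values, parse_seconds = timed(parse, text)   # string-to-float conversion
print("read: %.6f s, parse: %.6f s" % (read_seconds, parse_seconds))
```

If the download dominates in a real run, speeding up the conversion loop will make little difference to the total time.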

Answer 1
3 2010-04-20 23:40:06

Answer 2
0 2010-04-21 00:04:22
