Python: Why use open(filename) twice in this code?

Here's a piece of code from Machine Learning in Action, Chapter 2. The goal is to load a file into a matrix. What I don't understand is why I have to call fr = open(filename) twice.

When I delete the second open(filename), the code just returns an empty matrix, and I can't figure out why.

Thanks a lot for taking the time!

def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())        
    returnMat = zeros((numberOfLines,3))       
    classLabelVector = []                       
    fr = open(filename)
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

It reads the file twice:

  1. First it reads all lines, counts them, and initializes the matrix:

     fr = open(filename)
     numberOfLines = len(fr.readlines())
     returnMat = zeros((numberOfLines,3))

  2. Then it reads the file again to fill the matrix:

     fr = open(filename)
     index = 0
     for line in fr.readlines():
         line = line.strip()
         ...

It has to open the file again so that the second read starts from the beginning.

This is not efficient code. Since fr.readlines() already reads the whole file, there is no need to read it again. Instead, the result (the list of lines) should be stored in a variable and reused when filling the matrix.

Also, close() should be called once you have finished with the file.
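To make those two points concrete, here is a minimal sketch that reads the file once, keeps the list of lines, and closes the file in a finally block. The name file2matrix_once and the sample data are made up for illustration, and a plain nested list stands in for the NumPy array so the sketch has no dependencies.

```python
import os
import tempfile


def file2matrix_once(filename):
    """Read the file once and reuse the stored lines (hypothetical helper)."""
    fr = open(filename)
    try:
        lines = fr.readlines()      # the only read of the file
    finally:
        fr.close()                  # release the file handle even on error

    # A nested list stands in for zeros((numberOfLines, 3)).
    return_mat = [[0.0] * 3 for _ in range(len(lines))]
    class_label_vector = []
    for index, line in enumerate(lines):
        fields = line.strip().split('\t')
        return_mat[index] = [float(value) for value in fields[0:3]]
        class_label_vector.append(int(fields[-1]))
    return return_mat, class_label_vector


# Made-up sample: three tab-separated features and an integer label per line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1.0\t2.0\t3.0\t1\n4.0\t5.0\t6.0\t2\n')
    sample_path = tmp.name

mat, labels = file2matrix_once(sample_path)
os.remove(sample_path)
print(mat, labels)
```

The same list of lines provides both the row count and the data, so a second open() is not needed.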

When you call readlines, it reads all the lines into memory, and when it finishes, the file pointer is at the very end of the file.

So if you call readlines again on the same file object, it reads from the end to the end and returns nothing. That is why you get an empty matrix.
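You can see this for yourself with a small experiment. The temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Create a small three-line file to experiment with.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a\nb\nc\n')
    path = tmp.name

with open(path) as fr:
    first = fr.readlines()    # consumes the whole file
    second = fr.readlines()   # pointer is already at the end: nothing left

os.remove(path)
print(len(first), len(second))
```

The second call returns an empty list, which is what the loop in the question iterates over once the second open(filename) is removed.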

They reopened the file so that the file pointer is back at the beginning. Another way to do that is to call seek(0) on the file object (for example fr.seek(0)), which moves the file pointer back to the start so that readlines works again.
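A short sketch of the seek(0) approach, again on a made-up temporary file:

```python
import os
import tempfile

# Create a small three-line file to experiment with.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a\nb\nc\n')
    path = tmp.name

with open(path) as fr:
    number_of_lines = len(fr.readlines())   # first pass: count the lines
    fr.seek(0)                              # rewind to the start of the file
    lines_again = fr.readlines()            # second pass sees every line again

os.remove(path)
print(number_of_lines, len(lines_again))
```

This keeps a single open() call, although the file contents are still read twice.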

One thing to note is that readlines reads the whole file into memory. If you have a massive file, you should loop over it with a for loop (or call readline) so that only one line is read at a time.
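As a rough sketch of the line-by-line approach, the function below iterates over the file object directly and appends each parsed row to a list, so the row count does not need to be known in advance. The name file2rows and the sample data are hypothetical, and the result is a plain list of rows rather than a NumPy matrix.

```python
import os
import tempfile


def file2rows(filename):
    """Parse the file one line at a time (hypothetical helper)."""
    rows = []
    labels = []
    with open(filename) as fr:
        for line in fr:                      # yields one line per iteration
            fields = line.strip().split('\t')
            rows.append([float(value) for value in fields[0:3]])
            labels.append(int(fields[-1]))
    return rows, labels


# Made-up sample: three tab-separated features and an integer label per line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1.0\t2.0\t3.0\t1\n4.0\t5.0\t6.0\t2\n')
    sample_path = tmp.name

rows, labels = file2rows(sample_path)
os.remove(sample_path)
print(rows, labels)
```

Because rows are appended as they are read, there is no counting pass at all. The parsed rows still end up in memory, but the raw text of the file is never held all at once.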

It is now recommended to always use a context manager (the with statement) when working with files, because it closes the file for you. Try the version below; it should be pretty close to what you are looking for.

from numpy import zeros

def file2matrix(filename):
    # Read every line once; the with block closes the file automatically.
    with open(filename, "r") as fr:
        lines = fr.readlines()
    # One row per line, three feature columns.
    returnMat = zeros((len(lines), 3))
    classLabelVector = []
    index = 0
    for line in lines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector


 