In Python 2.7.11, why can't I remove the file-open code?

The .txt file that holds the data looks like this (source: "datingTestSet2.txt" from Chapter 2, here):

40920   8.326976    0.953952    largeDoses
14488   7.153469    1.673904    smallDoses
26052   1.441871    0.805124    didntLike
75136   13.147394   0.428964    didntLike
38344   1.669788    0.134296    didntLike
...   

Code:

from numpy import *
import operator
from os import listdir

def file2matrix(filename):
    fr = open(filename)
    # arr = fr.readlines() # Code1!!!!!!!!!!!!!!!!!!!
    numberOfLines = len(fr.readlines())        #get the number of lines in the file
    returnMat = zeros((numberOfLines,3))       #prepare matrix to return   
    classLabelVector = []                      #prepare labels return   
    fr = open(filename)  # Code2!!!!!!!!!!!!!!!!!!!!!
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')

The result of the function is:

      datingDataMat                 datingLabels
40920   8.326976    0.953952           3
14488   7.153469    1.673904           2
26052   1.441871    0.805124           1
75136   13.147394   0.428964           1
38344   1.669788    0.134296           1
72993   10.141740   1.032955           1
35948   6.830792    1.213192           3
42666   13.276369   0.543880           3
67497   8.631577    0.749278           1
35483   12.273169   1.508053           3
50242   3.723498    0.831917           1
...     ...         ...               ...

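My questions are:

  1. When I remove only Code2 (fr = open(filename), the line just above index = 0), the function returns an all-zero matrix and an all-zero vector. Why can't I remove Code2? Doesn't the first line (fr = open(filename)) do the job?

  2. When I only add Code1 (arr = fr.readlines()), the function fails with the error below. Why?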

     returnMat[index,:] = listFromLine[0:3] IndexError: index 0 is out of bounds for axis 0 with size 0 

1) You can't remove the Code2 line because of this line:

numberOfLines = len(fr.readlines())        #get the number of lines in the file

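On that line you read all the way to the end of the file. Opening the file again puts you back at the beginning.

2) This is similar to the answer above. Calling readlines() reads every line and leaves the file cursor at the end of the file, so a second readlines() on the same file object has nothing left to read and returns an empty list. With Code1 added, numberOfLines is therefore 0 and returnMat has shape (0, 3). Code2 then reopens the file, the loop reads real lines, and the first assignment to returnMat[0,:] raises the IndexError.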
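The behaviour described in both points can be checked in isolation. The sketch below is only an illustration: the helper name read_twice and the three-line temporary file are assumptions made for the demo, not part of the original code.

```python
import os
import tempfile

def read_twice(path):
    """Call readlines() twice on the same file object and return both results."""
    with open(path) as fr:
        first = fr.readlines()   # consumes the file; cursor is now at the end
        second = fr.readlines()  # nothing left to read
    return first, second

# Hypothetical sample file with three tab-separated rows.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("40920\t8.326976\t0.953952\t3\n")
    f.write("14488\t7.153469\t1.673904\t2\n")
    f.write("26052\t1.441871\t0.805124\t1\n")

first, second = read_twice(sample_path)
os.remove(sample_path)

print("first call:  %d lines" % len(first))
print("second call: %d lines" % len(second))
```

The second call returning an empty list is why the loop never runs when Code2 is removed, and why the line count is 0 when Code1 is added.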

You are at the end of the file, so your second attempt to read its contents yields nothing. You need to go back to the beginning of the file. Use:

fr.seek(0)

instead of your:

fr = open(filename)  # Code2!!!!!!!!!!!!!!!!!!!!!
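As a rough sketch of that suggestion, the file can be opened once, counted, rewound with seek(0), and read again. The function name count_then_read and the temporary sample file are assumptions for illustration; the NumPy part of the original function is left out so the example runs on its own.

```python
import os
import tempfile

def count_then_read(path):
    """Count the lines, rewind with seek(0), then read the lines again."""
    with open(path) as fr:
        number_of_lines = len(fr.readlines())  # cursor ends up at end of file
        fr.seek(0)                             # move the cursor back to the start
        rows = [line.strip().split("\t") for line in fr.readlines()]
    return number_of_lines, rows

# Hypothetical two-line sample file.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("40920\t8.326976\t0.953952\t3\n")
    f.write("14488\t7.153469\t1.673904\t2\n")

number_of_lines, rows = count_then_read(sample_path)
os.remove(sample_path)

print("counted %d lines, parsed %d rows" % (number_of_lines, len(rows)))
```

Without the seek(0) call, rows would be empty even though number_of_lines is 2.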

You only need to call readlines() once.

from numpy import zeros

def file2matrix(filename):
    fr = open(filename)
    lines = fr.readlines()                     #read the whole file once
    fr.close()
    numberOfLines = len(lines)                 #get the number of lines in the file
    returnMat = zeros((numberOfLines,3))       #prepare matrix to return
    classLabelVector = []                      #prepare labels return
    index = 0
    for line in lines:
        line = line.strip()
        listFromLine = line.split('\t')
        # careful here: returnMat is initialized as floats,
        # while listFromLine is a list of strings
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

I can offer a few other suggestions:

import numpy as np

def file2matrix(filename):
    with open(filename) as f:
        lines = f.readlines()
    returnList = []
    classLabelList = []
    for line in lines:
        listFromLine = line.strip().split('\t')
        returnList.append(listFromLine[0:3])
        classLabelList.append(int(listFromLine[-1]))
    returnMat = np.array(returnList, dtype=float)
    return returnMat, classLabelList

Or even:

import numpy as np

def file2matrix(filename):
    with open(filename) as f:
        lines = f.readlines()
    # split every line into its tab-separated fields
    ll = [line.strip().split('\t') for line in lines]
    returnMat = np.array([l[0:3] for l in ll], dtype=float)
    classLabelList = [int(l[-1]) for l in ll]
    # classLabelVec = np.array([l[-1] for l in ll], dtype=int)
    return returnMat, classLabelList
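One more variant, which is not from the answers above: iterating over the file object directly reads each line exactly once, so the cursor problem never comes up. The sketch below uses plain lists instead of NumPy so that it runs without extra dependencies. The name file2lists, the skipping of short lines, and the sample file are all assumptions for illustration.

```python
import os
import tempfile

def file2lists(filename):
    """Parse the file in a single pass, without calling readlines()."""
    rows = []
    labels = []
    with open(filename) as f:
        for line in f:                       # one pass over the file
            fields = line.strip().split("\t")
            if len(fields) < 4:
                continue                     # assumption: skip blank or short lines
            rows.append([float(x) for x in fields[0:3]])
            labels.append(int(fields[-1]))
    return rows, labels

# Hypothetical sample file: two data rows and a trailing blank line.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("40920\t8.326976\t0.953952\t3\n")
    f.write("14488\t7.153469\t1.673904\t2\n")
    f.write("\n")

rows, labels = file2lists(sample_path)
os.remove(sample_path)

print("parsed %d rows" % len(rows))
```

If a NumPy matrix is needed, passing rows to np.array with dtype=float, as in the suggestions above, should give the equivalent of returnMat.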
