f.readline() doesn't capture the last line of the file

I am reading a very large text file, several million lines long, using readline(). However, whatever I do fails to capture the last line of the file.

The file I am reading looks like this:

$ tail file.txt
22  rs1193135566    0   50807787    C   G   0   0   0   0   NA  0   0   0   NA  NA  0
22  rs1349597430    0   50807793    T   G   0   0   0   0   0   0   0   NA  NA  NA  NA
22  rs1230501076    0   50807799    T   G   0   0   NA  NA  0   0   0   NA  0   NA  0
22  22_50807803 0   50807803    C   G   0   0   0   0   0   0   0   0   0   NA  0
22  rs1488400844    0   50807810    G   T   0   0   0   NA  0   0   0   0   0   NA  0
22  rs1279244475    0   50807811    G   T   0   0   0   NA  0   0   0   0   0   NA  0
22  rs1346432135    0   50807812    G   A   0   NA  0   0   0   0   0   0   0   NA  0
22  rs1340490361    0   50807813    C   G   0   0   0   NA  0   0   0   0   0   NA  0
22  22_50807816 0   50807816    G   T   0   0   0   NA  0   0   0   0   0   NA  0
22  rs1412997563    0   50807818    G   C   0   0   0   NA  0   0   0   0   0   NA  0

And my code looks like this:

with open('/path/file.txt', 'r') as f:
    for l in f:
        line = f.readline().rstrip('\n').split("\t")
        print(line)

The last line of the file comes out as an empty entry, [''].

The output looks like this:

['22', 'rs1250150067', '0', '50807769', 'G', 'A', 'NA', '0', '0', '0', '0', '0', '0', '0', '0', 'NA', '0']
['22', 'rs1193135566', '0', '50807787', 'C', 'G', '0', '0', '0', '0', 'NA', '0', '0', '0', 'NA', 'NA', '0']
['22', 'rs1230501076', '0', '50807799', 'T', 'G', '0', '0', 'NA', 'NA', '0', '0', '0', 'NA', '0', 'NA', '0']
['22', 'rs1488400844', '0', '50807810', 'G', 'T', '0', '0', '0', 'NA', '0', '0', '0', '0', '0', 'NA', '0']
['22', 'rs1346432135', '0', '50807812', 'G', 'A', '0', 'NA', '0', '0', '0', '0', '0', '0', '0', 'NA', '0']
['22', '22_50807816', '0', '50807816', 'G', 'T', '0', '0', '0', 'NA', '0', '0', '0', '0', '0', 'NA', '0']
['']

You are reading only one line at a time. Try f.readlines() instead, which reads all the lines into a list. If you want to work with individual lines, use subscripting.

lines = f.readlines()
print(lines[0]) # to display 1st line
print(lines[1]) # to display 2nd line

And so on. You can also print the lines in a loop after reading them, like this:

lines = f.readlines()
for line in lines:
    print(line)
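
As a minimal sketch of what readlines() returns, the snippet below uses an in-memory io.StringIO as a stand-in for the opened file (the three sample lines are made up for illustration). Each element keeps its trailing newline, so it usually needs stripping before further processing.

```python
import io

# Hypothetical in-memory stand-in for the opened file; the last line has no newline.
f = io.StringIO("first\nsecond\nthird")

lines = f.readlines()                              # every line, newlines kept
stripped = [line.rstrip('\n') for line in lines]   # same lines without the newline

print(lines)      # list of raw lines
print(stripped)   # list of cleaned lines
```

Note that readlines() loads the whole file into memory, which may matter for a file with several million lines.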

Edit 1: From the output you have provided, it appears that your loop is not reading all the lines, since only the second, fourth, and sixth lines from the end are visible in the output.

Also try using strip() instead of rstrip('\n'), since this strips any whitespace on both sides of the string.
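
One thing to keep in mind with tab-separated data: strip() also removes leading and trailing tabs, so empty first or last fields would disappear. The sketch below compares the two calls on a made-up line (the sample string is an assumption, not taken from the file above).

```python
# Hypothetical line whose first and last fields are empty.
raw = "\trs1\t0\tNA\t\n"

# rstrip('\n') removes only the trailing newline, so empty fields survive.
fields_rstrip = raw.rstrip('\n').split('\t')

# strip() removes all surrounding whitespace, including the outer tabs.
fields_strip = raw.strip().split('\t')

print(fields_rstrip)
print(fields_strip)
```

If the column count matters downstream, rstrip('\n') is the more conservative choice.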

I think you are looking for something like this:

    with open('/path/file.txt', 'r') as f:
        for lines in f.readlines():
            line = lines.rstrip('\n').split("\t")
            print(line)

You are discarding every other line.

for line in f already reads a line into line. You then discard that and fetch another line with line = f.readline(). My Python 3.5.1 actually warns and aborts:

ValueError: Mixing iteration and read methods would lose data
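
Whether the interpreter raises or silently carries on can depend on the Python version and the kind of file object. As a rough sketch, the snippet below reproduces the silent case with an in-memory io.StringIO under Python 3; the helper name read_mixed and the five sample lines are made up for illustration.

```python
import io

def read_mixed(f):
    """Mix iteration with readline(): the loop consumes one line, readline() the next."""
    rows = []
    for _ in f:                       # this line is read and then thrown away
        rows.append(f.readline().rstrip('\n').split('\t'))
    return rows

# Hypothetical sample with an odd number of lines.
sample = "a\t1\nb\t2\nc\t3\nd\t4\ne\t5\n"
mixed = read_mixed(io.StringIO(sample))
print(mixed)
```

Only the second and fourth lines are kept. On the last pass the loop consumes the final line, readline() hits end of file and returns an empty string, and that is where the trailing [''] comes from.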

You can read all the lines into memory at once, or process one at a time. I generally recommend the latter unless your processing needs to have all the data in memory in the end (and even then you probably need to parse it into a sane structure, so keeping the raw data in memory is just wasteful).

with open('/path/file.txt', 'r') as f:
    for line in f:
        print(line.rstrip('\n').split('\t'))   # or process line
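
To keep the one-line-at-a-time approach reusable, the loop can be wrapped in a generator. This is only a sketch: the function name iter_rows and the temporary demo file are assumptions for illustration, not part of the original question.

```python
import os
import tempfile

def iter_rows(path):
    """Yield each line of a tab-separated file as a list of fields, one at a time."""
    with open(path, 'r') as f:
        for line in f:                 # iteration alone; no readline() inside the loop
            yield line.rstrip('\n').split('\t')

# Write a small hypothetical file whose last line has no trailing newline.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("22\trs1\t0\n22\trs2\tNA\n22\trs3\t0")
    demo_path = tmp.name

rows = list(iter_rows(demo_path))      # list() is only for the demo; loop over it in practice
os.remove(demo_path)
print(rows)
```

The last line is returned even without a trailing newline, and only one line is held in memory at a time when the generator is consumed in a for loop.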
