f.readline() doesn't capture the last line of the file
I am reading from a very large text file using readline(). The file is several million lines long. However, whatever I do, I can't capture the last line of the file.
The file I am reading looks like this:
$ tail file.txt
22 rs1193135566 0 50807787 C G 0 0 0 0 NA 0 0 0 NA NA 0
22 rs1349597430 0 50807793 T G 0 0 0 0 0 0 0 NA NA NA NA
22 rs1230501076 0 50807799 T G 0 0 NA NA 0 0 0 NA 0 NA 0
22 22_50807803 0 50807803 C G 0 0 0 0 0 0 0 0 0 NA 0
22 rs1488400844 0 50807810 G T 0 0 0 NA 0 0 0 0 0 NA 0
22 rs1279244475 0 50807811 G T 0 0 0 NA 0 0 0 0 0 NA 0
22 rs1346432135 0 50807812 G A 0 NA 0 0 0 0 0 0 0 NA 0
22 rs1340490361 0 50807813 C G 0 0 0 NA 0 0 0 0 0 NA 0
22 22_50807816 0 50807816 G T 0 0 0 NA 0 0 0 0 0 NA 0
22 rs1412997563 0 50807818 G C 0 0 0 NA 0 0 0 0 0 NA 0
And my code looks like this:
with open('/path/file.txt', 'r') as f:
    for l in f:
        line = l.rstrip('\n').split("\t")
        print(line)
The last line of the file comes out empty: [].
The output looks like this:
['22', 'rs1250150067', '0', '50807769', 'G', 'A', 'NA', '0', '0', '0', '0', '0', '0', '0', '0', 'NA', '0']
['22', 'rs1193135566', '0', '50807787', 'C', 'G', '0', '0', '0', '0', 'NA', '0', '0', '0', 'NA', 'NA', '0']
['22', 'rs1230501076', '0', '50807799', 'T', 'G', '0', '0', 'NA', 'NA', '0', '0', '0', 'NA', '0', 'NA', '0']
['22', 'rs1488400844', '0', '50807810', 'G', 'T', '0', '0', '0', 'NA', '0', '0', '0', '0', '0', 'NA', '0']
['22', 'rs1346432135', '0', '50807812', 'G', 'A', '0', 'NA', '0', '0', '0', '0', '0', '0', '0', 'NA', '0']
['22', '22_50807816', '0', '50807816', 'G', 'T', '0', '0', '0', 'NA', '0', '0', '0', '0', '0', 'NA', '0']
['']
You are reading only one line. Try using f.readlines() instead, which reads all the lines. If you want to work with individual lines, use subscripting.
lines = f.readlines()
print(lines[0]) # to display 1st line
print(lines[1]) # to display 2nd line
And so on. You can also print the lines in a loop after reading them, like this:
lines = f.readlines()
for line in lines:
    print(line)
Edit 1: From the output you have provided, it appears that your loop is not reading all the lines, since only the second, fourth, and sixth lines from the end are visible.
Also try using strip() instead of rstrip('\n'), since this strips any whitespace on both sides of the string.
I think you are looking for something like this:
with open('/path/file.txt', 'r') as f:
    for lines in f.readlines():
        line = lines.rstrip('\n').split("\t")
        print(line)
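One caveat on the strip() suggestion above: on tab-delimited data, strip() also removes leading and trailing tabs, so an empty first or last column disappears. The sketch below uses a made-up row (not taken from the file in the question) to show the difference in field count.

```python
# Hypothetical row whose last column is empty (note the tab before the newline).
row = "22\trs1\t0\t\n"

# rstrip('\n') removes only the newline, so the empty last field survives.
kept = row.rstrip('\n').split('\t')

# strip() removes all surrounding whitespace, including the trailing tab,
# so the row comes out one field shorter.
stripped = row.strip().split('\t')

print(kept)      # ['22', 'rs1', '0', '']
print(stripped)  # ['22', 'rs1', '0']
```

If every row must have the same number of columns, rstrip('\n') is the safer choice here.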
You are discarding every other line. The statement "for line in f" already reads a line into the variable line. You then discard that and fetch another line with line = f.readline(). My Python 3.5.1 actually warns and aborts:
ValueError: Mixing iteration and read methods would lose data
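The skipping described above can be reproduced on a small scale. This is a minimal sketch, assuming an in-memory io.StringIO as a stand-in for the real file and made-up three-line sample data; the function names are hypothetical. On this stand-in no exception is raised and the lines are silently skipped, which also shows where a trailing [''] can come from when the line count is odd.

```python
import io

def read_skipping(f):
    """Reproduce the bug: iterate over f and call readline() in the same loop."""
    rows = []
    for _ in f:                  # the for loop consumes one line ...
        line = f.readline()      # ... and readline() consumes the next one
        rows.append(line.rstrip('\n').split('\t'))
    return rows

def read_all(f):
    """Use only the loop variable, so every line is kept."""
    return [line.rstrip('\n').split('\t') for line in f]

sample = "a\t1\nb\t2\nc\t3\n"    # made-up data with an odd number of lines

skipped = read_skipping(io.StringIO(sample))
complete = read_all(io.StringIO(sample))

print(skipped)   # [['b', '2'], ['']]
print(complete)  # [['a', '1'], ['b', '2'], ['c', '3']]
```

On the last pass the loop takes the final line and readline() returns an empty string at end of file, which splits into [''].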
You can read all the lines into memory at once, or process them one at a time. I generally recommend the latter unless your processing ultimately needs all the data in memory (and even then you probably need to parse it into a sane structure, so keeping the raw data in memory is just wasteful).
with open('/path/file.txt', 'r') as f:
    for line in f:
        print(line.rstrip('\n').split('\t'))  # or process line
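To confirm that a one-line-at-a-time loop really reaches the final line of a multi-million-line file, one option is to keep only a running count and the most recent row, then compare them with wc -l and tail. The following is a sketch under assumptions: last_row_and_count is a hypothetical helper, and the temporary file with three made-up rows stands in for the real data.

```python
import os
import tempfile

def last_row_and_count(path):
    """Stream the file, keeping only the line count and the last parsed row."""
    count = 0
    last = None
    with open(path, 'r') as f:
        for line in f:
            count += 1
            last = line.rstrip('\n').split('\t')
    return count, last

# Made-up demo file; the last line deliberately has no trailing newline.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("22\trs1\t0\n22\trs2\t0\n22\trs3\t0")
    demo_path = tmp.name

count, last = last_row_and_count(demo_path)
os.remove(demo_path)

print(count, last)  # 3 ['22', 'rs3', '0']
```

Memory use stays constant regardless of file size, because no earlier rows are retained.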