Python - How to read specific lines of a text file?

Question

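I have a huge text file (12 GB). The lines are tab-separated, and the first column contains an ID. For each ID I want to do something. So my plan is to start at the first line and walk down the first column line by line until I reach the next ID.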

import linecache

start_line = 2  # linecache is 1-based; line b-1 is compared with line b
num_lines = 377763316

b = start_line
while b <= num_lines:
    plasmid1 = linecache.getline("Result.txt", b - 1)
    plasmid1 = plasmid1.strip("\n")
    plasmid1 = plasmid1.split("\t")

    plasmid2 = linecache.getline("Result.txt", b)
    plasmid2 = plasmid2.strip("\n")
    plasmid2 = plasmid2.split("\t")

    if plasmid1[0] != plasmid2[0]:
        end_line = b
        # do something
    b += 1

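The code works, but the problem is that linecache seems to reload the txt file every time. Without a performance improvement, this code would run for years.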

If you have a good idea how to solve this or know an alternative approach, I would appreciate it!

Thanks, Philip

Answer 1

You should open the file only once and then iterate over its lines.

with open('Result.txt', 'r') as f:
    aline = f.next()  # Python 2 only; see below for Python 3
    currentid = aline.split('\t', 1)[0]
    for nextline in f:
        nextid = nextline.split('\t', 1)[0]
        if nextid != currentid:
            # do stuff with the block that just ended (currentid)
            currentid = nextid

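You had the right idea; just use plain Python. Each iteration reads only one line. The extra argument 1 to split makes it split only at the first tab, which improves performance. You will not get better performance from any specialized library; only a plain C implementation could beat this approach.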
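To see what the maxsplit argument does, compare the two calls below on a made-up line (the sample values are hypothetical). With maxsplit=1, Python stops after the first tab, so the rest of the line is left as one piece while the ID comes out the same:

```python
line = "plasmid_7\t1200\tATG\tGCC\n"  # hypothetical tab-separated line

full = line.split('\t')       # splits at every tab
first = line.split('\t', 1)   # splits at the first tab only

print(full)   # ['plasmid_7', '1200', 'ATG', 'GCC\n']
print(first)  # ['plasmid_7', '1200\tATG\tGCC\n']
print(first[0] == full[0])  # True: the ID is the same either way
```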

If you get AttributeError: '_io.TextIOWrapper' object has no attribute 'next', it is probably because you are using Python 3.x (see the io-textiowrapper-object question). Try this version:

with open('Result.txt', 'r') as f:
    aline = f.readline()
    currentid = aline.split('\t', 1)[0]
    while aline != '':
        aline = f.readline()  # returns '' at end of file
        nextid = aline.split('\t', 1)[0]
        if nextid != currentid:
            # do stuff: at EOF nextid is '', so the last block is handled too
            currentid = nextid
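The for-loop version above never reaches the "do stuff" branch for the last ID in the file, because no following line has a different ID. A minimal sketch of the same one-pass idea using itertools.groupby instead of manual ID tracking (a swapped-in technique, not what this answer uses) handles every block, including the last. The helper names first_column and iter_id_blocks are hypothetical, and the sketch assumes lines with the same ID are adjacent, as in the question:

```python
import io
from itertools import groupby


def first_column(line):
    """Return the text before the first tab (the ID column)."""
    return line.split('\t', 1)[0]


def iter_id_blocks(f):
    """Yield (id, lines) for each run of consecutive lines sharing an ID.

    The file is read lazily, one line at a time; only the current block
    is held in memory as a list.
    """
    for current_id, lines in groupby(f, key=first_column):
        yield current_id, list(lines)


# Usage with an in-memory stand-in for Result.txt (hypothetical data):
sample = io.StringIO("a\t1\na\t2\nb\t3\nc\t4\nc\t5\n")
blocks = [(i, len(ls)) for i, ls in iter_id_blocks(sample)]
print(blocks)  # [('a', 2), ('b', 1), ('c', 2)]
```

With the real file this would be used as `with open('Result.txt') as f:` followed by `for current_id, lines in iter_id_blocks(f): ...`.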

Answer 2

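I think numpy.loadtxt() is the way to go. It would also be good to pass the usecols argument to specify which columns you actually need from the file. NumPy is a solid library written with high performance in mind.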

Calling loadtxt() returns an ndarray.
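A small sketch of this suggestion, assuming a NumPy version whose loadtxt accepts file-like objects and dtype=str: it reads only the ID column and then finds the indices where the ID changes. The boundary-finding step is an addition for illustration, not part of the answer. Note that the whole column ends up in memory at once, which may be a lot for a 12 GB file.

```python
import io

import numpy as np

# Stand-in for Result.txt: the first column is the ID (hypothetical data).
data = io.StringIO("a\t10\na\t20\nb\t30\nc\t40\n")

# Read only column 0 as strings; the resulting array is fully in memory.
ids = np.loadtxt(data, delimiter='\t', usecols=(0,), dtype=str)

# Indices where the ID differs from the previous row, i.e. block starts.
starts = np.flatnonzero(ids[1:] != ids[:-1]) + 1

print(ids.tolist())     # ['a', 'a', 'b', 'c']
print(starts.tolist())  # [2, 3]
```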

Answer 3

You can use itertools:

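from itertools import takewhile

class EqualityChecker(object):
    """Callable predicate: True while a line's first column equals id."""
    def __init__(self, id):
        self.id = id

    def __call__(self, current_line):
        current_id = current_line.split('\t', 1)[0]
        return self.id == current_id


# ids (the IDs in file order) and do_stuff are placeholders.
with open('hugefile.txt', 'r') as f:
    for id in ids:
        checker = EqualityChecker(id)
        # Iterating f directly replaces f.xreadlines(), which Python 3 removed.
        # Caveat: takewhile swallows the first line of the next ID.
        for line in takewhile(checker, f):
            do_stuff(line)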

The id for the outer loop can actually be taken from the first line whose ID does not match the previous value.
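The catch with takewhile is that it has to read the first element that fails the predicate in order to stop, and it does not give that element back. A tiny demonstration with a plain list iterator standing in for the file (hypothetical data) shows the boundary line going missing:

```python
from itertools import takewhile

# Hypothetical IDs, one per "line"; the iterator plays the role of the file.
it = iter(['a', 'a', 'b', 'b', 'c'])

first_block = list(takewhile(lambda x: x == 'a', it))
print(first_block)  # ['a', 'a']

# The first 'b' was consumed by takewhile and dropped,
# so the next block is missing its first line.
rest = list(it)
print(rest)  # ['b', 'c']
```

itertools.groupby does not have this problem, because it passes the boundary element on to the next group.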


3 answers

Answer 1: score 0, 2017-02-25 18:21:28
Answer 2: score 0, 2017-02-25 18:21:42
Answer 3: score 0, 2017-02-25 18:41:19
