Efficiently reading a specific line from a file

Question

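I've come across several different ways to read a file in Python, and I'd like to know which one is fastest.

For example, to read the last line of a file, you can do this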

input_file = open('mytext.txt', 'r')
lastLine = ""
for line in input_file:
    lastLine = line
input_file.close()

print(lastLine)  # This is the last line

or

fileHandle = open('mytext.txt', 'r')
lineList = fileHandle.readlines()
fileHandle.close()
print(lineList[-1])  # This is the last line

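I assume that for this particular case, discussing efficiency is probably beside the point...

Questions:

1. Which method is faster for picking a random line?

2. Can we work with a concept like "SEEK" in Python (and if so, would it be faster)?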

Answer 1

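If you don't need a uniform distribution (i.e. it's acceptable that the chance of being picked is not the same for every line) and/or if your lines are all about the same length, then the problem of picking a random line can be simplified to:

Determine the size of the file in bytes
Seek to a random position
Search backwards for the previous newline, if any (there may be none if there is no preceding line)
Pick all the text up to the next newline or the end of the file, whichever comes first.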

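For (2), you make an educated guess about how far back you need to search to find the previous newline. If you can tell that a line is n bytes long on average, then you can read the previous n bytes in a single step.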
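The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the function name `random_line_by_seek` and the parameters `guess` (the estimated average line length in bytes) and `rng` are made up for this example, the file is opened in binary mode so that offsets are plain byte counts, and the backward search doubles its chunk size whenever the guess turns out too small.

```python
import os
import random


def random_line_by_seek(path, guess=80, rng=random):
    """Return one line (bytes, newline stripped) chosen via a random byte offset.

    Not uniform: a line is picked with probability proportional to its length.
    `guess` is a hypothetical parameter: the estimated average line length.
    """
    size = os.path.getsize(path)              # step 1: file size in bytes
    if size == 0:
        raise ValueError("file is empty")

    with open(path, "rb") as f:
        pos = rng.randrange(size)             # step 2: random position

        # Step 3: search backwards from pos for the previous newline,
        # reading `guess` bytes first and doubling the chunk on each miss.
        start = 0                             # default: no previous newline
        end = pos
        step = guess
        while end > 0:
            begin = max(0, end - step)
            f.seek(begin)
            chunk = f.read(end - begin)
            idx = chunk.rfind(b"\n")
            if idx != -1:
                start = begin + idx + 1       # line starts right after it
                break
            end = begin
            step *= 2

        # Step 4: read up to the next newline or the end of the file.
        f.seek(start)
        return f.readline().rstrip(b"\r\n")
```

Because the random position is a byte offset, a line that is twice as long is about twice as likely to be chosen, which is the non-uniformity mentioned above.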

Answer 2

I ran into this problem a few days ago and used this solution. My solution is similar to @Frerich Raabe's, but with no randomness, just logic :)

def get_last_line(f):
    """Return the last line of f, a file object opened in binary mode ('rb').

    The algorithm was extracted from a bigger function.
    """
    f.seek(0, 2)        # Jump to the end of the file to measure it
    fsize = f.tell()    # File size in bytes
    tries = 0
    offs = -512

    while tries < 5:
        # Put the cursor |offs| bytes before the end of the file.
        # If that would go past the start, clamp to -fsize, i.e. the beginning of the file.
        f.seek(max(-fsize, offs), 2)
        lines = f.readlines()
        if len(lines) > 1:    # More than one line was read, so the last one is complete
            return lines[-1]  # Return the last complete line
        offs *= 2
        tries += 1

    raise ValueError("No line ending found after 5 tries (the file may have only 1 line, or the last line is longer than %d bytes)" % (-offs // 2))

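The tries counter keeps the loop from getting stuck if the file has only one line (that is, a very long last line). The algorithm tries to get the last line from the last 512 bytes, then 1024, 2048, and so on, and stops if there is still no complete line at the fifth iteration.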
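As a variation on the same doubling idea, here is a sketch that keeps widening the window until it either finds a newline before the last line or reaches the start of the file, so a single-line file returns that line instead of raising. It is not the code from this answer: the name `last_line` and the `block` parameter are invented for illustration, and it assumes the file can be read in binary mode and that trailing blank lines should be ignored.

```python
import os


def last_line(path, block=512):
    """Return the last non-empty line of the file as bytes, newline stripped.

    `block` is a hypothetical parameter: the size in bytes of the first window.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()                       # file size in bytes
        window = block
        while True:
            start = max(0, size - window)     # never seek before the beginning
            f.seek(start)
            # Drop the line ending(s) that terminate the file itself.
            data = f.read().rstrip(b"\r\n")
            idx = data.rfind(b"\n")
            if idx != -1 or start == 0:
                # Either a newline precedes the last line, or the whole
                # file has been read, so the last line is complete.
                return data[idx + 1:]
            window *= 2                       # 512, 1024, 2048, ...
```

Without a cap on the number of tries, a file consisting of one huge line is eventually read in full, so the original five-try limit may still be the better choice when memory matters.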


Answer 1 (1, accepted, 2013-08-26 11:03:29)

Answer 2 (0, 2013-08-26 11:14:28)
