How can I tail a log file in Python?

Question

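I'd like to make the output of tail -F, or something similar, available to me in Python without blocking or locking. I found some very old code to do that here, but I think there must be a better way or a library that does the same thing by now. Does anyone know of one?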

Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data.

Answer 1

Non Blocking

If you are on Linux (Windows does not support calling select on files), you can use the subprocess module together with the select module.

import time
import subprocess
import select

# filename: path of the file to follow (defined elsewhere)
f = subprocess.Popen(['tail', '-F', filename],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     text=True)
p = select.poll()
p.register(f.stdout)

while True:
    if p.poll(1):  # timeout in milliseconds
        print(f.stdout.readline(), end='')
    time.sleep(1)

This polls the output pipe for new data and prints it when it is available. Normally, the time.sleep(1) and the print call would be replaced with useful code.

Blocking

You can use the subprocess module without the extra select module calls.

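import subprocess

# filename: path of the file to follow (defined elsewhere)
f = subprocess.Popen(['tail', '-F', filename],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     text=True)
while True:
    line = f.stdout.readline()
    if not line:
        break  # tail has exited
    print(line, end='')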

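This will also print new lines as they are added, but readline() blocks while there is no new data, so the loop keeps blocking until the tail program is closed, for example with f.kill().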

Answer 2

Use the sh module (pip install sh):

from sh import tail
# runs forever
for line in tail("-f", "/var/log/some_log_file.log", _iter=True):
    print(line)

[update]

Since sh.tail with _iter=True is a generator, you can:

import sh
tail = sh.tail("-f", "/var/log/some_log_file.log", _iter=True)

Then you can "getNewData" with:

new_data = next(tail)

Note that if the tail buffer is empty, it will block until there is more data (from your question it is not clear what you want to do in that case).

[update]

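This works if you replace -f with -F, but in Python it would be locking. I'd be more interested in having a function I could call to get new data when I want it, if that's possible. – Eli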

A wrapper generator that places the tail call inside a while True loop and catches the eventual I/O exceptions will have almost the same effect as -F.

import sh

def tail_F(some_file):
    while True:
        try:
            for line in sh.tail("-f", some_file, _iter=True):
                yield line
        except sh.ErrorReturnCode_1:
            yield None

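If the file becomes inaccessible, the generator will return None. However, if the file is accessible, it still blocks until there is new data. It remains unclear to me what you want to do in that case.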

Raymond Hettinger's approach seems pretty good:

import os

def tail_F(some_file):
    first_call = True
    while True:
        try:
            with open(some_file) as input:
                if first_call:
                    input.seek(0, 2)
                    first_call = False
                latest_data = input.read()
                while True:
                    if '\n' not in latest_data:
                        latest_data += input.read()
                        if '\n' not in latest_data:
                            yield ''
                            if not os.path.isfile(some_file):
                                break
                            continue
                    latest_lines = latest_data.split('\n')
                    if latest_data[-1] != '\n':
                        latest_data = latest_lines[-1]
                    else:
                        latest_data = input.read()
                    for line in latest_lines[:-1]:
                        yield line + '\n'
        except IOError:
            yield ''

This generator will return '' if the file becomes inaccessible or if there is no new data.

[update]

The second-to-last answer seems to circle back to the top of the file whenever it runs out of data. – Eli

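I think the second one will output the last ten lines whenever the tail process ends, which with -f happens whenever there is an I/O error. The tail --follow --retry behavior is not far from this in most cases I can think of in Unix-like environments.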

Perhaps if you update your question to explain what your real goal is (the reason you want to mimic tail --retry), you will get a better answer.

The last answer does not actually follow the tail; it merely reads what is available at run time. – Eli

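Of course, tail displays the last 10 lines by default... You can position the file pointer at the end of the file with file.seek; I'll leave a proper implementation as an exercise for the reader.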

IMHO, the file.read() approach is far more elegant than a subprocess-based solution.
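Following the seek-to-end hint above, a minimal sketch of the tail.getNewData()-style object the question asks for might look like this. The class name TailReader and its get_new_data() method are hypothetical; the sketch assumes a regular file that is only appended to (no rotation or truncation), and get_new_data() may return a partial last line.

```python
import os


class TailReader:
    """Hypothetical helper: return whatever was appended to a file since the
    previous call, starting at the end of the file (like tail -f, but without
    printing the last 10 lines first)."""

    def __init__(self, path):
        self._file = open(path, 'r')
        # Position the file pointer at the end so only new data is returned
        self._file.seek(0, os.SEEK_END)

    def get_new_data(self):
        # read() on a regular file never waits: it returns '' if nothing new
        return self._file.read()

    def close(self):
        self._file.close()
```

Each call returns '' when nothing has been appended, so the caller is never blocked; splitting the returned text into lines, and sleeping between calls, is left to the caller.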

Answer 3

Pure Pythonic solution using non-blocking readline()

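I am adapting Ijaz Ahmad Khan's answer to only yield lines when they are completely written (lines end with a newline character), which gives a pythonic solution with no external dependencies: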

import time
from typing import Iterator

def follow(file, sleep_sec=0.1) -> Iterator[str]:
    """ Yield each line from a file as they are written.
    `sleep_sec` is the time to sleep after empty reads. """
    line = ''
    while True:
        tmp = file.readline()
        # readline() returns '' at EOF on a regular file (None only for a
        # non-blocking stream with no data), so test truthiness, not `is not None`
        if tmp:
            line += tmp
            if line.endswith("\n"):
                yield line
                line = ''
        elif sleep_sec:
            time.sleep(sleep_sec)


if __name__ == '__main__':
    with open("test.txt", 'r') as file:
        for line in follow(file):
            print(line, end='')

Answer 4

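In fact, the only portable way to tail -f a file appears to be to read from it and retry (after a sleep) when the read returns 0 bytes. The tail utilities on various platforms use platform-specific tricks (e.g. kqueue on BSD) to efficiently tail a file forever without needing sleep.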

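Therefore, implementing a good tail -f purely in Python is probably not a good idea, since you would have to use the least-common-denominator implementation (without resorting to platform-specific hacks). By using a simple subprocess to open tail -f and iterating through its lines in a separate thread, you can easily implement a non-blocking tail operation in Python.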

Example implementation:

import threading, queue, subprocess
tailq = queue.Queue(maxsize=10) # buffer at most 10 lines

def tail_forever(fn):
    p = subprocess.Popen(["tail", "-f", fn], stdout=subprocess.PIPE, text=True)
    while 1:
        line = p.stdout.readline()
        tailq.put(line)
        if not line:
            break

# fn: path of the file to follow; daemon=True lets the program exit
threading.Thread(target=tail_forever, args=(fn,), daemon=True).start()

print(tailq.get()) # blocks
print(tailq.get_nowait()) # throws queue.Empty if there are no lines to read
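To get the tail.getNewData() behaviour from the question on top of this queue, one option is a small helper that drains everything currently buffered without blocking. get_new_lines is a hypothetical name; it relies only on the standard queue.Queue.get_nowait() and queue.Empty.

```python
import queue


def get_new_lines(q):
    """Return every line currently waiting in q, without blocking.

    Hypothetical helper for the reader-thread pattern: the thread puts lines
    into q, and the caller drains whatever has arrived so far.
    """
    lines = []
    while True:
        try:
            lines.append(q.get_nowait())
        except queue.Empty:
            # Nothing more buffered right now
            return lines
```

With a reader thread like the one above, get_new_lines(tailq) returns [] when tail has produced nothing new. An empty string in the result is the end-of-stream marker that tail_forever puts on the queue when tail exits.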

Answer 5

So, this comes quite late, but I ran into the same problem again, and there is a much better solution now. Just use pygtail:

Pygtail reads log file lines that have not been read. It will even handle log files that have been rotated. Based on logcheck's logtail2 (http://logcheck.org)

Answer 6

All the answers that use tail -f are not pythonic.

Here is the pythonic way (no external tools or libraries):

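import time


def follow(thefile):
    while True:
        pos = thefile.tell()
        line = thefile.readline()
        if not line or not line.endswith('\n'):
            # No complete line yet: rewind to its start so a partially
            # written line is not lost, then wait for more data
            thefile.seek(pos)
            time.sleep(0.1)
            continue
        yield line


if __name__ == '__main__':
    with open("run/foo/access-log", "r") as logfile:
        loglines = follow(logfile)
        for line in loglines:
            print(line, end='')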

Answer 7

Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data

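We already have one, and it's very nice. Just call f.read() whenever you want more data. It will start reading where the previous read left off, and it will read through to the end of the data stream: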

import time

f = open('somefile.log')
p = 0
while True:
    f.seek(p)
    latest_data = f.read()
    p = f.tell()
    if latest_data:
        print(latest_data)
        print(str(p).center(10).center(80, '='))
    time.sleep(1)  # avoid spinning the CPU while there is no new data

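To read line by line, use f.readline(). Sometimes the file being read will end with a partially written line. Handle that case by using f.tell() to find the current file position and f.seek() to move the file pointer back to the beginning of the incomplete line. See this ActiveState recipe for working code.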
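A minimal sketch of the tell()/seek() technique just described, assuming a text-mode file that is only appended to. read_complete_lines is a hypothetical helper, not the ActiveState recipe's code: it returns only complete lines and rewinds over a trailing partial one so that it is re-read in full on a later call.

```python
def read_complete_lines(f):
    """Return the complete lines appended to f since the last call.

    If the last line read has no trailing newline yet (the writer is still
    in the middle of it), rewind to its start so it is read again later.
    """
    lines = []
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break  # end of the data written so far
        if not line.endswith('\n'):
            f.seek(pos)  # partial line: move back to its beginning
            break
        lines.append(line)
    return lines
```

tell() works here because the file is read with readline(); after iterating a text file with `for line in f`, Python refuses tell() on it.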

Answer 8

You can use the "tailer" library: https://pypi.python.org/pypi/tailer/

It has an option to get the last few lines:

import tailer

# Get the last 3 lines of the file
tailer.tail(open('test.txt'), 3)
# ['Line 9', 'Line 10', 'Line 11']

And it can also follow a file:

# Follow the file as it grows
for line in tailer.follow(open('test.txt')):
    print(line)

If you want tail-like behavior, it seems to be a good option.
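If only the last few lines are needed and an extra dependency is unwanted, the standard library can do it with collections.deque. This is a swapped-in technique, not how tailer works; last_lines is a hypothetical name, and unlike the tailer output shown above, the returned lines keep their trailing newline. It reads the whole file once, so it suits small or medium-sized files.

```python
from collections import deque


def last_lines(path, n=10):
    """Return the last n lines of a file using only the standard library.

    deque(maxlen=n) keeps only the newest n lines while the file is read
    from start to end, so memory stays bounded, but the whole file is read.
    """
    with open(path, 'r') as f:
        return list(deque(f, maxlen=n))
```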

Answer 9

Another option is the tailhead library, which provides Python versions of the tail and head utilities as well as an API that can be used in your own module.

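Originally based on the tailer module, its main advantage is the ability to follow files by path, i.e. it can handle the situation when a file is recreated. It also has some bug fixes for various edge cases.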
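The follow-by-path idea can be sketched with os.stat(): compare the device and inode of the path with those of the open file, and reopen when they differ (the file was rotated or recreated) or when the file has shrunk below the current read position (it was truncated). This is not tailhead's code; PathFollower and poll() are hypothetical names. The sketch reads an existing file from its beginning instead of starting with the last 10 lines, and any unread data left in the old file at rotation time is dropped.

```python
import os


class PathFollower:
    """Follow a file by path, roughly like tail -F (hypothetical helper).

    poll() returns the complete lines appended since the previous call.
    If the path now names a different file (rotated/recreated) or the file
    shrank below our read position (truncated), it is reopened from the start.
    """

    def __init__(self, path):
        self.path = path
        self._file = None
        self._partial = b''  # bytes of an unfinished last line

    def _needs_reopen(self):
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return False  # nothing at the path right now: keep the old handle
        if self._file is None:
            return True
        cur = os.fstat(self._file.fileno())
        replaced = (st.st_dev, st.st_ino) != (cur.st_dev, cur.st_ino)
        truncated = st.st_size < self._file.tell()
        return replaced or truncated

    def poll(self):
        if self._needs_reopen():
            self.close()
            self._file = open(self.path, 'rb')
            self._partial = b''
        if self._file is None:
            return []  # the file has not appeared yet
        self._partial += self._file.read()
        # Everything before the last b'\n' is complete; keep the rest for later
        *lines, self._partial = self._partial.split(b'\n')
        return [line.decode('utf-8', errors='replace') + '\n' for line in lines]

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
```

In a real program, poll() would be called in a loop with a short time.sleep() between calls.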

Answer 10

Python is "batteries included", and there is a nice solution for this on PyPI: https://pypi.python.org/pypi/pygtail

Reads log file lines that have not been read. Remembers where it finished last time and continues from there.

import sys
from pygtail import Pygtail

for line in Pygtail("some.log"):
    sys.stdout.write(line)
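The "remember where it finished last time" behaviour can be sketched with the standard library by storing a byte offset in a side file. This only illustrates the idea; it is not pygtail's implementation. read_unread_lines is a hypothetical name, the default offset file name (path + '.offset') is an assumed choice, and rotation is handled only crudely: if the file is now shorter than the saved offset, reading restarts at 0.

```python
import os


def read_unread_lines(path, offset_path=None):
    """Return lines added to `path` since the previous call, remembering the
    position in a small side file so it survives between runs."""
    if offset_path is None:
        offset_path = path + '.offset'  # assumed naming choice
    try:
        with open(offset_path) as f:
            offset = int(f.read() or 0)
    except FileNotFoundError:
        offset = 0  # first run: start at the beginning
    with open(path, 'rb') as f:
        if os.fstat(f.fileno()).st_size < offset:
            offset = 0  # file was truncated or replaced by a smaller one
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a trailing partial line is read next time
    end = data.rfind(b'\n') + 1
    with open(offset_path, 'w') as f:
        f.write(str(offset + end))
    return [line.decode('utf-8', errors='replace')
            for line in data[:end].splitlines(True)]
```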

Answer 11

You could also use the "AWK" command.
See more at: http://www.unix.com/shell-programming-scripting/41734-how-print-specific-lines-awk.html
awk can be used to tail the last line, the last few lines, or any line in a file.
It can be called from Python.

Answer 12

If you are on Linux, you can get a non-blocking implementation in Python in the following way (it runs tail -f in a separate xterm window):

import subprocess
subprocess.call('xterm -title log -hold -e \"tail -f filename\"&', shell=True, executable='/bin/csh')
print("Done")

Answer 13

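# -*- coding:utf-8 -*-
import sys
import time


class Tail():
    def __init__(self, file_name, callback=sys.stdout.write):
        self.file_name = file_name
        self.callback = callback

    def follow(self, n=10):
        try:
            # Open the file in binary mode: text mode does not allow the
            # nonzero end-relative seeks that showLastLine() needs
            with open(self.file_name, 'rb') as f:
                self._file = f
                self._file.seek(0, 2)
                # Store the length of the file (in bytes)
                self.file_length = self._file.tell()
                # Print the last n lines (10 by default)
                self.showLastLine(n)
                # Keep reading the file and print whatever is appended
                while True:
                    line = self._file.readline()
                    if line:
                        self.callback(line.decode('utf-8', errors='replace'))
                    else:
                        time.sleep(1)  # no new data yet
        except Exception as e:
            print('Failed to open the file :( Check whether it exists or whether there is a permission problem')
            print(e)

    def showLastLine(self, n):
        # Assume a line is roughly 100 bytes; changing this to 1 or 1000 also works
        len_line = 100
        # n defaults to 10 and can be passed in through follow()
        read_len = len_line * n
        # last_lines stores the content that will finally be printed
        while True:
            # If the amount to read exceeds the stored file length,
            # read the whole file and break
            if read_len > self.file_length:
                self._file.seek(0)
                data = self._file.read(self.file_length)
                if data.endswith(b'\n'):
                    data = data[:-1]  # ignore the final line terminator
                last_lines = data.split(b'\n')[-n:] if data else []
                break
            # Read read_len bytes from the end, then count the newlines in them
            self._file.seek(-read_len, 2)
            last_words = self._file.read(read_len)
            if last_words.endswith(b'\n'):
                last_words = last_words[:-1]  # ignore the final line terminator
            # count is the number of newlines
            count = last_words.count(b'\n')

            if count >= n:
                # At least n newlines: easy, the last n lines are all complete
                last_lines = last_words.split(b'\n')[-n:]
                break
            # Fewer than n newlines: estimate the bytes per line from what was
            # read (e.g. 4 newlines in 1000 bytes -> ~250 bytes per line), or
            # use read_len itself if there was no newline at all
            len_perline = read_len // count if count else read_len
            # The read length becomes e.g. 250 * 10 = 2500; always at least
            # double it so the loop cannot get stuck, then check again
            read_len = max(len_perline * n, read_len * 2)
        for line in last_lines:
            self.callback(line.decode('utf-8', errors='replace') + '\n')


if __name__ == '__main__':
    py_tail = Tail('test.txt')
    py_tail.follow(1)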

Answer 14

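A simple tail function from the PyPI app tailread

You can also use it via pip install tailread

It reads a file's lines backwards from the end in fixed-size batches, and is recommended for tail access to large files.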

from io import BufferedReader


def readlines(bytesio, batch_size=1024, keepends=True, **encoding_kwargs):
    '''bytesio: file path or BufferedReader
       batch_size: size to be processed
    '''
    path = None
    
    if isinstance(bytesio, str):
        path = bytesio
        bytesio = open(path, 'rb')
    elif not isinstance(bytesio, BufferedReader):
        raise TypeError('The first argument to readlines must be a file path or a BufferedReader')

    bytesio.seek(0, 2)
    end = bytesio.tell()

    buf = b""
    for p in reversed(range(0, end, batch_size)):
        bytesio.seek(p)
        lines = []
        remain = min(end-p, batch_size)
        while remain > 0:
            line = bytesio.readline()[:remain]
            lines.append(line)
            remain -= len(line)

        cut, *parsed = lines
        for line in reversed(parsed):
            if buf:
                line += buf
                buf = b""
            if encoding_kwargs:
                line = line.decode(**encoding_kwargs)
            yield from reversed(line.splitlines(keepends))
        buf = cut + buf
    
    if path:
        bytesio.close()

    if encoding_kwargs:
        buf = buf.decode(**encoding_kwargs)
    yield from reversed(buf.splitlines(keepends))


for line in readlines('access.log', encoding='utf-8', errors='replace'):
    print(line, end='')  # lines keep their line endings (keepends=True)
    if 'line 8' in line:
        break

# line 11
# line 10
# line 9
# line 8

14 solutions

Solution 1: 78 votes, 2012-09-21 02:09:27 (Non Blocking; Blocking)
Solution 2: 52 votes
Solution 3: 29 votes, 2019-01-19 00:55:41 (Pure Pythonic solution using non-blocking readline())
Solution 4: 27 votes, 2012-09-21 01:59:54
Solution 5: 15 votes, accepted, 2015-08-04 01:56:33
Solution 6: 15 votes, 2018-11-02 15:10:47
Solution 7: 12 votes, 2012-09-21 02:16:53
Solution 8: 7 votes, 2016-02-02 10:12:42
Solution 9: 5 votes, 2016-02-23 06:59:53
Solution 10: 1 vote, 2017-03-22 15:40:50
Solution 11: 0 votes, 2014-02-08 17:59:05
Solution 12: 0 votes, 2015-10-08 06:46:01
Solution 13: 0 votes, 2021-09-17 07:26:20
Solution 14: 0 votes, 2022-06-08 04:01:18
