
Efficient way to parse large journalctl file to match keywords using Python

When parsing the journalctl file, the keywords to look for are: error, boot, warning, traceback.

Once I encounter a keyword, I need to increment the counter for that keyword and also print the matching line.

So I have tried the following: reading the log from a file and using collections.Counter together with re.findall to keep track of the counts:

import re
from collections import Counter

keywords = [" error ", " boot ", " warning ", " traceback "]

def journal_parser():
    for keyword in keywords:
        print(keyword)  # just for debugging
        # note: the file is re-opened and fully re-read once per keyword
        word = re.findall(keyword, open("/tmp/journal_slice.log").read().lower())
        count = dict(Counter(word))
        print(count)

The above solution solves my problem, but I am looking for a more efficient way, if there is one.

Please advise.

Here is a more efficient way:

def journal_parser():
    with open("/tmp/journal_slice.log") as f:
        data = f.read()
        # one pass over the whole file; the re.I flag makes matching case-insensitive
        words = re.findall("|".join(keywords), data, re.I)
        count = dict(Counter(words))
        print(count)
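Since the question also asks to print each matching line, the same regex idea can be applied line by line, printing hits as they are counted. This is only a sketch: the function name `parse_journal` is mine, and the `\b` word boundaries stand in for the space padding around the keywords, which is an assumption about the data.

```python
import re
from collections import Counter

keywords = ["error", "boot", "warning", "traceback"]
# \b word boundaries replace the " keyword " space padding; re.I ignores case
pattern = re.compile(r"\b(" + "|".join(keywords) + r")\b", re.I)

def parse_journal(path="/tmp/journal_slice.log"):
    counts = Counter()
    with open(path) as f:
        for line in f:
            hits = pattern.findall(line)
            if hits:
                # fold case so "ERROR" and "error" land in the same bucket
                counts.update(h.lower() for h in hits)
                print(line.rstrip())  # print the matching line, as the question requires
    return dict(counts)
```

Streaming the file this way also avoids holding the whole journal in memory, which matters once the log is large.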

I'm not sure whether you still need those spaces around your keywords; that depends on your data. But I think the regex and the extra library are unnecessary imports here.

keywords = ["error", "boot", "warning", "traceback"]
src = '/tmp/journal_slice.log'

def journal_parser(s, kw):
    with open(s, 'r') as f:
        # split each line into whitespace-separated, lower-cased tokens
        data = [w.lower() for line in f for w in line.split()]
        for k in kw:
            print(f'{k} in src happens {data.count(k)} times')

journal_parser(src, keywords)

Note that f-string formatting in print does not work in early 3.x Python (it requires 3.6+). Converting to lower case might also be unnecessary; you could instead add all expected casings to the keywords. And if the file is really huge, you can process it line by line and call list.count() on each line's tokens, but in that case you have to track the counts yourself.
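The line-by-line advice above (stream the file, count on each line, and keep a running total yourself) could look roughly like this. It is a sketch under that assumption; the name `count_keywords` and the tuple default are mine, not from the answer.

```python
from collections import Counter

def count_keywords(path, keywords=("error", "boot", "warning", "traceback")):
    counts = Counter()
    with open(path) as f:
        for line in f:  # stream: only one line is in memory at a time
            tokens = line.lower().split()
            for k in keywords:
                counts[k] += tokens.count(k)  # count() per line, totals tracked in Counter
    return counts
```

Because only one line lives in memory at a time, this keeps memory use flat no matter how large the journal grows.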
