
Efficient way to parse large journalctl file to match keywords using Python

When parsing the journalctl file, the keywords to look for are: error, boot, warning, traceback.

Whenever I encounter one of these keywords, I need to increment that keyword's counter and also print the matching line.

So far I have tried the following: read the whole file, then use re.findall together with a Counter object from the collections module to keep track of the counts:

import re
from collections import Counter

keywords = [" error ", " boot ", " warning ", " traceback "]

def journal_parser():
    for keyword in keywords:
        print(keyword)  # just for debugging
        word = re.findall(keyword, open("/tmp/journal_slice.log").read().lower())
        count = dict(Counter(word))
        print(count)

The above solution solves my problem, but I am looking for a more efficient way, if there is one.

Please advise.

Here is a more efficient way:

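def journal_parser():
    with open("/tmp/journal_slice.log") as f:
        data = f.read()
    # Join the keywords into one alternation so the text is scanned only once;
    # re.I makes the match case-insensitive.
    words = re.findall("|".join(keywords), data, re.I)
    # Lowercase each match so "Error" and "error" share one counter entry.
    count = dict(Counter(w.lower() for w in words))
    print(count)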
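
If the spaces around the keywords are only there to avoid partial matches (for example "boot" inside "reboot"), regex word boundaries may be a cleaner option. The following is a minimal sketch under that assumption; the names KEYWORDS, PATTERN and count_keywords and the sample text are illustrative, not from the original post.

```python
import re
from collections import Counter

KEYWORDS = ["error", "boot", "warning", "traceback"]

# \b matches at a word boundary, so "boot" is not matched inside "reboot",
# while "Error:" still matches because ":" is not a word character.
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b",
    re.IGNORECASE,
)

def count_keywords(text):
    # Lowercase each match so all case variants share one counter entry.
    return Counter(m.lower() for m in PATTERN.findall(text))

# Made-up sample text standing in for the journal contents.
sample = "Error: disk full\nreboot scheduled\nBoot complete, WARNING issued\n"
counts = count_keywords(sample)
print(dict(counts))
```

Unlike space-padded keywords, this also catches a keyword at the start or end of a line or next to punctuation.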

I'm not sure whether you still need the spaces around your keywords; that depends on your data. I also think regex and the extra library imports are unnecessary here.

keywords = ["error", "boot", "warning", "traceback"]  # no padding: split() strips whitespace
src = '/tmp/journal_slice.log'

def journal_parser(s, kw):
    with open(s, 'r') as f:
        # Flatten the file into a list of lowercase, whitespace-separated words.
        data = [w.lower() for line in f for w in line.split()]
    for k in kw:
        print(f'{k} in src happens {data.count(k)} times')

journal_parser(src, keywords)

Note that f-string formatting in print requires Python 3.6 or later. Converting to lowercase might not be necessary either: you could instead add all the expected case variants to keywords. If the file is really huge, you can process it line by line and call list.count() on each line's words; in that case you have to keep a running total of the counts yourself.
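
The line-by-line idea can be sketched roughly as follows. This is only an illustration, not a definitive implementation: scan_lines, KEYWORDS and the sample lines are hypothetical names and data. It also collects the matching lines, since the question asks to print them.

```python
from collections import Counter

KEYWORDS = ("error", "boot", "warning", "traceback")

def scan_lines(lines, keywords=KEYWORDS):
    """Count keyword occurrences one line at a time and keep the matching lines."""
    counts = Counter()
    matched = []
    for line in lines:
        # Only the current line is held in memory, never the whole file.
        words = line.lower().split()
        hit = False
        for kw in keywords:
            n = words.count(kw)  # exact token match; "error:" would not count
            if n:
                counts[kw] += n
                hit = True
        if hit:
            matched.append(line.rstrip("\n"))
    return counts, matched

# Made-up sample lines standing in for the journal file.
sample = [
    "kernel: boot sequence started\n",
    "app: Warning low memory, warning repeated\n",
    "all good\n",
]
counts, matched = scan_lines(sample)
for line in matched:
    print(line)
print(dict(counts))
```

Because a file object yields its lines lazily, the same function should work on the real log with something like `with open("/tmp/journal_slice.log") as f: counts, matched = scan_lines(f)`.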


 