
Parsing specific strings from list items in Python

I have the following Python code, which iterates over log messages collected to debug SSH.

for log_item in ssh_log:
    print(log_item.rstrip())

# will show ...
2022-04-06 01:55:15,085 10.x Remote version/idstring: SSH-2.0-ConfD-4.3.11.4
2022-04-06 01:55:15,085 20.x Connected (version 2.0, client ConfD-4.3.11.4)
2022-04-06 01:55:15,161 10.x kex algos:['diffie-hellman-group14-sha1'] server key:['ssh-rsa']
...

What is the best approach to extract the following values and assign them to my variables? Maybe a regex inside the for loop, or something else, to get:

idstring = SSH-2.0-ConfD-4.3.11.4
kex_algos = ['diffie-hellman-group14-sha1']
key_type = ['ssh-rsa']

If all the data is in the same format as the sample given here, you can use the following regexes:

import re

a = """
2022-04-06 01:55:15,085 10.x Remote version/idstring: SSH-2.0-ConfD-4.3.11.4
2022-04-06 01:55:15,085 20.x Connected (version 2.0, client ConfD-4.3.11.4)
2022-04-06 01:55:15,161 10.x kex algos:['diffie-hellman-group14-sha1'] server key:['ssh-rsa']"""

# Raw strings (r"...") pass the backslash escapes straight to the regex engine
idstring = re.findall(r"idstring: (.*)", a)[0]  # drop [0] to get a list of
                                                # all matches
print(idstring)
# Lazy (.*?) stops at the first "']", so no trailing-space trick is needed
kex_algos = re.findall(r"algos:\['(.*?)'\]", a)
print(kex_algos)
key_type = re.findall(r"key:\['(.*?)'\]", a)
print(key_type)

Output:

SSH-2.0-ConfD-4.3.11.4
['diffie-hellman-group14-sha1']
['ssh-rsa']
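Since the question loops over `ssh_log` item by item, the same idea can also be applied per line inside that loop. The sketch below assumes `ssh_log` is a list of log lines like the sample (the list and the helper name `parse_ssh_log` are hypothetical). It uses `re.search` on each line and `ast.literal_eval` to turn the bracketed text into a real Python list.

```python
import ast
import re

# Hypothetical sample: assumes ssh_log is a list of strings like the question's output
ssh_log = [
    "2022-04-06 01:55:15,085 10.x Remote version/idstring: SSH-2.0-ConfD-4.3.11.4\n",
    "2022-04-06 01:55:15,085 20.x Connected (version 2.0, client ConfD-4.3.11.4)\n",
    "2022-04-06 01:55:15,161 10.x kex algos:['diffie-hellman-group14-sha1'] server key:['ssh-rsa']\n",
]

ID_RE = re.compile(r"idstring: (\S+)")
KEX_RE = re.compile(r"kex algos:(\[.*?\]) server key:(\[.*?\])")


def parse_ssh_log(lines):
    """Return (idstring, kex_algos, key_type); each is None if not found."""
    idstring = kex_algos = key_type = None
    for log_item in lines:
        line = log_item.rstrip()
        m = ID_RE.search(line)
        if m:
            idstring = m.group(1)
            continue
        m = KEX_RE.search(line)
        if m:
            # literal_eval safely turns the text "['ssh-rsa']" into the list ['ssh-rsa']
            kex_algos = ast.literal_eval(m.group(1))
            key_type = ast.literal_eval(m.group(2))
    return idstring, kex_algos, key_type


idstring, kex_algos, key_type = parse_ssh_log(ssh_log)
print(idstring)   # SSH-2.0-ConfD-4.3.11.4
print(kex_algos)  # ['diffie-hellman-group14-sha1']
print(key_type)   # ['ssh-rsa']
```

If a real log lists several algorithms, e.g. `['a', 'b']`, `literal_eval` returns all of them as one list.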

A solution without regex (see the inline comments):

for log_item in ssh_log:
    line = log_item.rstrip()
    if 'idstring' in line:
        print('idstring =', line.split(':')[-1].strip())  # value after the last ':', leading space removed
    if 'kex algos' in line:
        print('kex_algos =', line[line.find('['):line.find(']') + 1])  # text within the first pair of square brackets
    if 'key:' in line:
        key = line.split('key:')[1]  # everything after 'key:'
        print('key_type =', key)

You can replace the print calls with variable assignments if that is what you need.
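A minimal sketch of that change, assuming `ssh_log` is a list of lines as in the question (the sample list here is hypothetical). It keeps the same string operations but stores the results instead of printing them, and strips the space that `split(':')` leaves in front of the ID string.

```python
# Hypothetical sample: assumes ssh_log is a list of log lines
ssh_log = [
    "2022-04-06 01:55:15,085 10.x Remote version/idstring: SSH-2.0-ConfD-4.3.11.4\n",
    "2022-04-06 01:55:15,085 20.x Connected (version 2.0, client ConfD-4.3.11.4)\n",
    "2022-04-06 01:55:15,161 10.x kex algos:['diffie-hellman-group14-sha1'] server key:['ssh-rsa']\n",
]

idstring = kex_algos = key_type = None  # stay None if a line is missing
for log_item in ssh_log:
    line = log_item.rstrip()
    if 'idstring' in line:
        # Text after the last ':', without the leading space
        idstring = line.split(':')[-1].strip()
    if 'kex algos' in line:
        # Text from the first '[' through the first ']'
        kex_algos = line[line.find('['):line.find(']') + 1]
    if 'key:' in line:
        # Everything after 'key:'
        key_type = line.split('key:')[1]

print(idstring)   # SSH-2.0-ConfD-4.3.11.4
print(kex_algos)  # ['diffie-hellman-group14-sha1']
print(key_type)   # ['ssh-rsa']
```

Note that `kex_algos` and `key_type` are still strings here; wrap them in `ast.literal_eval` if you need actual lists.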

You can also use a TTP (Template Text Parser) template to parse your data if it has a consistent structure.

from ttp import ttp
import json

with open("log.txt") as f:
    data_to_parse = f.read()

ttp_template = """
{{ignore}} {{ignore}} {{ignore}} {{ignore}} version/idstring: {{version_id_string}}
{{ignore}} {{ignore}} {{ignore}} {{ignore}} algos:{{key_algos}} server key:{{key_type}}
"""

parser = ttp(data=data_to_parse, template=ttp_template)
parser.parse()

# print result in JSON format
results = parser.result(format='json')[0]
# print(results)

result = json.loads(results)

# print(result)

for i in result:
    print(i["key_algos"])
    print(i["key_type"])
    print(i["version_id_string"])

The output is:

['diffie-hellman-group14-sha1']
['ssh-rsa']
SSH-2.0-ConfD-4.3.11.4

With the three lines of sample data from the original question saved in a file, one could take the following approach (the `match` statement requires Python 3.10 or later):

import re

with open('ssh.log') as sshlog:
    for line in map(str.strip, sshlog):
        _, _, _, kw, *rem = line.split()
        match kw:
            case 'Remote':
                print(f'ID string = {rem[-1]}')
            case 'kex':
                m = re.findall(r"(?<=\[').+?(?='\])", line)
                print(f'algos = {m[0]}')
                print(f'type = {m[1]}')
            case _:
                pass

The assumption here is that only lines with either of the keywords 'Remote' or 'kex' are of interest.
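On Python versions before 3.10, which lack `match`, the same keyword filter can be sketched with a plain dictionary of handler functions. The handler names, `parse_lines`, and the `results` dict are hypothetical; lines with fewer than five whitespace-separated fields are skipped, so blank lines do not break the tuple unpacking used above.

```python
import re

# Same lookaround pattern as above: text between [' and ']
BRACKETED = re.compile(r"(?<=\[').+?(?='\])")


def handle_remote(line, rem, results):
    # The last field of a 'Remote' line is the ID string
    results['idstring'] = rem[-1]


def handle_kex(line, rem, results):
    # First bracketed value is the kex algorithm, second is the key type
    found = BRACKETED.findall(line)
    if len(found) >= 2:
        results['algos'], results['type'] = found[0], found[1]


HANDLERS = {'Remote': handle_remote, 'kex': handle_kex}


def parse_lines(lines):
    results = {}
    for line in map(str.strip, lines):
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or too short to be a line of interest
        kw, rem = fields[3], fields[4:]
        handler = HANDLERS.get(kw)
        if handler:
            handler(line, rem, results)
    return results


# Hypothetical sample; an open file object works too, since it yields lines
sample = [
    "2022-04-06 01:55:15,085 10.x Remote version/idstring: SSH-2.0-ConfD-4.3.11.4",
    "",
    "2022-04-06 01:55:15,085 20.x Connected (version 2.0, client ConfD-4.3.11.4)",
    "2022-04-06 01:55:15,161 10.x kex algos:['diffie-hellman-group14-sha1'] server key:['ssh-rsa']",
]
print(parse_lines(sample))
# {'idstring': 'SSH-2.0-ConfD-4.3.11.4', 'algos': 'diffie-hellman-group14-sha1', 'type': 'ssh-rsa'}
```

Adding another keyword then only requires a new handler and one dictionary entry.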

Output:

ID string = SSH-2.0-ConfD-4.3.11.4
algos = diffie-hellman-group14-sha1
type = ssh-rsa
