
Fortigate device string parsing using shlex, and split

Consider a Fortigate config like

config firewall policy
    edit 168
        set name "policy 168"
        set uuid 14435052-3097-4d70-98c7-1dd2d60e229f
        set srcintf "jimmylin__1688"
        set dstintf "port1"
        set srcaddr "address__jimmylin__10.100.168.11/32"
        set dstaddr "all"
        set action accept
        set schedule "always"
        set service "ALL"
        set comments "\"customer\": \"Jimmy Lin\""
        set nat enable
        set ippool enable
        set poolname "ippool__jimmylin__168.100.168.11"
    next
end

(This is a .conf file, i.e. a plain text file.)

Say the file lives in /conf/ and is named firewall_policy.conf. We want to get it into python so then we can do all the data-processing things around it, so we do

>>> with open(file='/conf/firewall_policy.conf', mode='r', encoding='utf-8') as f:
...     lines = f.read().splitlines()
...

Now, let's look deeper into it.

>>> lines[2]
'        set name "policy 168"'
>>> print(lines[2])
        set name "policy 168"
>>> lines[10]
'        set service "ALL"'
>>> print(lines[10])
        set service "ALL"
>>> lines[11]
'        set comments "\\"customer\\": \\"Jimmy Lin\\""'
>>> print(lines[11])
        set comments "\"customer\": \"Jimmy Lin\""

We'll want something like

>>> def parse(string):
...     # some string-parsing things
...     return string
...
>>> parse(lines[2])
['set', 'name', '"policy 168"']
>>> parse(lines[10])
['set', 'service', '"ALL"']
>>> parse(lines[11])
['set', 'comments', '"\\"customer\\": \\"Jimmy Lin\\""']

If we simply use

>>> def parse(string):
...    return string.split()
...

Then we will get

>>> parse(lines[2])
['set', 'name', '"policy', '168"']

Obviously that is not what we want.

If we try shlex (Simple lexical analysis), a module in the Python standard library:

>>> from shlex import split as shlex_split
>>> def parse(string):
...    return shlex_split(string)
...

Then we will get

>>> parse(lines[2])
['set', 'name', 'policy 168']

That's great, and we can join them back together by

>>> from shlex import join as shlex_join
>>> shlex_join(parse(lines[2]))
"set name 'policy 168'"

That's almost the same as the original string, but the double quotes around policy 168 became single quotes. It's acceptable though.

But when it comes to lines[10]

>>> parse(lines[10])
['set', 'service', 'ALL']
>>> shlex_join(parse(lines[10]))
'set service ALL'

The double quotes just disappear, because ALL contains no spaces, so shlex sees no need to add quotes around it.
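
The quoting rule can be checked directly. The following is a minimal sketch using `shlex.quote`, which `shlex.join` applies to each token: it only adds quotes when a token contains characters a POSIX shell would treat specially, and it always uses single quotes.

```python
import shlex

# shlex.quote leaves "safe" tokens alone and single-quotes everything else.
print(shlex.quote('ALL'))         # ALL
print(shlex.quote('policy 168'))  # 'policy 168'

# shlex.join quotes each token this way and joins them with spaces,
# so the original double quotes around ALL cannot be recovered.
tokens = shlex.split('set service "ALL"')
print(tokens)              # ['set', 'service', 'ALL']
print(shlex.join(tokens))  # set service ALL
```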

I looked up the shlex documentation and found a parameter called posix that can be used here.

>>> from shlex import split as shlex_split
>>> def parse(string):
...    return shlex_split(string, posix=False)
...
>>> parse(lines[10])
['set', 'service', '"ALL"']

And we can simply use the built-in join to turn it back.

>>> ' '.join(parse(lines[10]))
'set service "ALL"'

But when it comes to lines[11], that is not the case.

>>> parse(lines[11])
['set', 'comments', '"\\"', 'customer\\":', '\\"Jimmy', 'Lin\\""']

I think this comes from a difference in how Forti devices and the shlex module parse a string. The Forti device seems to treat \" as an escaped quote, so it sees something like

'set comments \'"customer": "Jimmy Lin"\''

set comments ""customer": "Jimmy Lin""

while shlex (with posix=False) keeps the backslashes as ordinary characters and sees something like

'set comments "\\"customer\\": \\"Jimmy Lin\\""'
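
To make the difference concrete, here is a small sketch that runs the same raw line through `shlex.split` in both modes. In POSIX mode the outer quotes are removed and `\"` inside them becomes `"`; in non-POSIX mode the backslash is an ordinary character, so the first `\"` already closes the quoted token.

```python
import shlex

# The raw line as it appears in the .conf file (r'' keeps the backslashes).
line = r'set comments "\"customer\": \"Jimmy Lin\""'

# POSIX mode: escapes are interpreted, outer quotes are dropped.
posix_tokens = shlex.split(line, posix=True)
print(posix_tokens)
# ['set', 'comments', '"customer": "Jimmy Lin"']

# Non-POSIX mode: quotes are kept, but backslash escapes are not understood.
non_posix_tokens = shlex.split(line, posix=False)
print(non_posix_tokens)
# ['set', 'comments', '"\\"', 'customer\\":', '\\"Jimmy', 'Lin\\""']
```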

If we use .replace('\\"', '\''), it looks better:

>>> lines[11].replace('\\"', '\'')
'        set comments "\'customer\': \'Jimmy Lin\'"'
>>> parse(lines[11].replace('\\"', '\''))
['set', 'comments', '"\'customer\': \'Jimmy Lin\'"']

But this is a dirty way. If there are more escape characters or more complicated nesting, I think it may fail, and the join result is also not correct:

>>> ' '.join(parse(lines[11].replace('\\"', '\'')))
'set comments "\'customer\': \'Jimmy Lin\'"'
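
One way the replace trick can break, as a sketch with a made-up comment value (the line below is hypothetical, not from the real config): once the value itself contains an apostrophe, the substitution is no longer reversible, because a plain `'` and a replaced `\"` look the same afterwards.

```python
import shlex

# Hypothetical line whose comment contains both an apostrophe and escaped quotes.
original = r'''set comments "it's \"ok\""'''

# Apply the same trick: turn every \" into a single quote.
replaced = original.replace('\\"', "'")
print(replaced)  # set comments "it's 'ok'"

tokens = shlex.split(replaced, posix=False)
print(tokens)  # ['set', 'comments', '"it\'s \'ok\'"']

# Trying to undo the substitution also rewrites the genuine apostrophe.
restored = ' '.join(tokens).replace("'", '\\"')
print(restored == original)  # False
```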

Is there a solution that makes this parsing process correct and clean? Am I missing something?

If the goal is: "..to get it into python so then we can do all the data-processing things around it" and you do not mind using other Python libraries, you might want to have a look at TTP and this code sample:

from ttp import ttp
import pprint

data = """
config firewall policy
    edit 168
        set name "policy 168"
        set uuid 14435052-3097-4d70-98c7-1dd2d60e229f
        set srcintf "jimmylin__1688"
        set dstintf "port1"
        set srcaddr "address__jimmylin__10.100.168.11/32"
        set dstaddr "all"
        set action accept
        set schedule "always"
        set service "ALL"
        set comments "\"customer\": \"Jimmy Lin\""
        set nat enable
        set ippool enable
        set poolname "ippool__jimmylin__168.100.168.11"
    next
    edit 200
        set name "policy 200"
        set uuid 14435052-3097-4d70-98c7-1dd2d60e2200
        set srcintf "jimmylin__16"
        set dstintf "port2"
    next
end
"""

template = """
<group name="policies**">
config firewall policy {{ _start_ }}
    <group name="{{ name }}**">
    edit {{ id }}
        set name "{{ name | ORPHRASE }}"
        set uuid {{ uuid }}
        set srcintf "{{ srcintf }}"
        set dstintf "{{ dstintf }}"
        set srcaddr "{{ srcaddr }}"
        set dstaddr "{{ dstaddr }}"
        set action {{ action }}
        set schedule "{{ schedule}}"
        set service "{{ service}}"
        set comments {{ comments | ORPHRASE | replace('"', '') }}
        set nat enable {{ nat_enabled | set(True) }}
        set ippool enable {{ ippool_enabled | set(True) }}
        set poolname "{{ poolname }}"    
    next {{ _end_ }}
    </group>
end {{ _end_ }}
</group>
"""

parser = ttp(data, template)
parser.parse()
res = parser.result()
pprint.pprint(res, width=100)

# will print:
# [[{'policies': {'policy 168': {'action': 'accept',
#                                'comments': 'customer: Jimmy Lin',
#                                'dstaddr': 'all',
#                                'dstintf': 'port1',
#                                'id': '168',
#                                'ippool_enabled': True,
#                                'nat_enabled': True,
#                                'poolname': 'ippool__jimmylin__168.100.168.11',
#                                'schedule': 'always',
#                                'service': 'ALL',
#                                'srcaddr': 'address__jimmylin__10.100.168.11/32',
#                                'srcintf': 'jimmylin__1688',
#                                'uuid': '14435052-3097-4d70-98c7-1dd2d60e229f'},
#                 'policy 200': {'dstintf': 'port2',
#                                'id': '200',
#                                'srcintf': 'jimmylin__16',
#                                'uuid': '14435052-3097-4d70-98c7-1dd2d60e2200'}}}]]

I think this is what you want. Sorry, I am not writing a function for each line but parsing everything in one for loop. You can put that for loop into a function if you want. The code is commented for further information.

# coding: utf-8

import shlex

with open(file='firewall_policy.conf', mode='r', encoding='utf-8') as f:
    lines = f.read().splitlines()

for raw_line in lines:
    line = raw_line.strip()  # strip each line because of the leading spaces
    if not line.startswith('set'):  # only lines starting with 'set' are parsed
        continue
    if '\\' in line:
        lexer1 = shlex.shlex(line, posix=True)  # posix stays true
        print(list(lexer1))
    else:
        lexer2 = shlex.shlex(line)  # else another (non-POSIX) lexer
        lexer2.quotes = '"'  # only double quotes delimit strings; they are preserved
        lexer2.whitespace_split = True  # split on whitespace only
        print(list(lexer2))

Output:

['set', 'name', '"policy 168"']
['set', 'uuid', '14435052-3097-4d70-98c7-1dd2d60e229f']
['set', 'srcintf', '"jimmylin__1688"']
['set', 'dstintf', '"port1"']
['set', 'srcaddr', '"address__jimmylin__10.100.168.11/32"']
['set', 'dstaddr', '"all"']
['set', 'action', 'accept']
['set', 'schedule', '"always"']
['set', 'service', '"ALL"']
['set', 'comments', '"customer": "Jimmy Lin"']
['set', 'nat', 'enable']
['set', 'ippool', 'enable']
['set', 'poolname', '"ippool__jimmylin__168.100.168.11"']

The only problem here is that the backslashes (\) are gone, but everything else is good. This is as far as I can get using shlex. Apparently, the argument posix=True must be passed in Python 3.8.x.
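
If a per-line function is preferred, the loop above could be wrapped roughly like this. It is only a sketch of the same two-lexer approach; the name `parse_set_line` and the choice to return `None` for non-`set` lines are assumptions made for illustration.

```python
import shlex


def parse_set_line(line):
    """Tokenise one 'set ...' line using the two-lexer approach above."""
    line = line.strip()
    if not line.startswith('set'):
        return None  # assumed convention: lines other than 'set' are skipped
    if '\\' in line:
        # POSIX mode interprets \" inside double quotes (backslashes are lost).
        lexer = shlex.shlex(line, posix=True)
    else:
        # Non-POSIX mode keeps the surrounding double quotes on the token.
        lexer = shlex.shlex(line)
        lexer.quotes = '"'
        lexer.whitespace_split = True
    return list(lexer)


print(parse_set_line('        set name "policy 168"'))
# ['set', 'name', '"policy 168"']
print(parse_set_line(r'        set comments "\"customer\": \"Jimmy Lin\""'))
# ['set', 'comments', '"customer": "Jimmy Lin"']
print(parse_set_line('    next'))
# None
```

Note that the POSIX branch does not set whitespace_split, so an unquoted token containing punctuation on a line that also contains a backslash would be split further.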

Alternative working solution with Regex only.

import re


def quote_preserved_split(line):
    """Split on whitespace, preserving outer quotes and escape characters."""
    # Raw string avoids invalid-escape warnings; the pattern itself is unchanged.
    return re.findall(r'(?:".*?[^\\]"|\S)+', line)

with open(file='firewall_policy.conf', mode='r', encoding='utf-8') as f:
    lines = f.read().splitlines()

for raw_line in lines:
    line = raw_line.strip()
    if line.startswith('set'):
        print(quote_preserved_split(line))

Output:

['set', 'name', '"policy 168"']
['set', 'uuid', '14435052-3097-4d70-98c7-1dd2d60e229f']
['set', 'srcintf', '"jimmylin__1688"']
['set', 'dstintf', '"port1"']
['set', 'srcaddr', '"address__jimmylin__10.100.168.11/32"']
['set', 'dstaddr', '"all"']
['set', 'action', 'accept']
['set', 'schedule', '"always"']
['set', 'service', '"ALL"']
['set', 'comments', '"\\"customer\\": \\"Jimmy Lin\\""']
['set', 'nat', 'enable']
['set', 'ippool', 'enable']
['set', 'poolname', '"ippool__jimmylin__168.100.168.11"']

Joining the tokens of each line with ' '.join() you get:

set name "policy 168"
set uuid 14435052-3097-4d70-98c7-1dd2d60e229f
set srcintf "jimmylin__1688"
set dstintf "port1"
set srcaddr "address__jimmylin__10.100.168.11/32"
set dstaddr "all"
set action accept
set schedule "always"
set service "ALL"
set comments "\"customer\": \"Jimmy Lin\""
set nat enable
set ippool enable
set poolname "ippool__jimmylin__168.100.168.11"
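
If the data-processing step needs the actual value rather than the quoted token, one possible follow-up is to unescape on the way in and re-escape on the way out. This is a sketch, not documented FortiOS behaviour: the names `TOKEN_RE`, `tokenize`, `unquote` and `quote` are hypothetical, the regex is a variation of the one above, and the rule "a backslash escapes the next character inside double quotes" is an assumption.

```python
import re

# A double-quoted run (with backslash escapes) or a run of non-whitespace.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\S+')


def tokenize(line):
    """Split a line into tokens, keeping quoted tokens intact."""
    return TOKEN_RE.findall(line)


def unquote(token):
    """Strip outer double quotes and undo backslash escapes (assumed rule)."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return re.sub(r'\\(.)', r'\1', token[1:-1])
    return token


def quote(value):
    """Inverse of unquote for values that were originally quoted."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'


line = r'set comments "\"customer\": \"Jimmy Lin\""'
tokens = tokenize(line)
value = unquote(tokens[2])
print(value)                      # "customer": "Jimmy Lin"
print(quote(value) == tokens[2])  # True
```

Whether a token was quoted in the first place still has to be remembered separately (for example `accept` versus `"all"`), since `quote` always adds quotes.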


 