Parse a txt file and store data into a dictionary

I have a set of data that I would like to extract from a txt file and store in a specific format. The data is currently in a txt file like so:

set firewall family inet filter INBOUND term TEST from source-address 1.1.1.1/32
set firewall family inet filter INBOUND term TEST from destination-prefix-list test-list
set firewall family inet filter INBOUND term TEST from protocol udp
set firewall family inet filter INBOUND term TEST from destination-port 53
set firewall family inet filter INBOUND term TEST then accept
set firewall family inet filter PROD term LAN from source-address 4.4.4.4/32
set firewall family inet filter PROD term LAN from source-address 5.5.5.5/32
set firewall family inet filter PROD term LAN from protocol tcp
set firewall family inet filter PROD term LAN from destination-port 443
set firewall family inet filter PROD term LAN then deny

I would like the data structured so that each rule has its options placed into a dictionary, and each dictionary is appended to a list. For example:

Expected output

[{'Filter': 'INBOUND', 'Term': 'TEST', 'SourceIP': '1.1.1.1/32', 'DestinationList': 'test-list', 'Protocol': 'udp', 'DestinationPort': '53', 'Action': 'accept'},
{'Filter': 'PROD', 'Term': 'LAN', 'SourceIP': ['4.4.4.4/32','5.5.5.5/32'], 'Protocol': 'tcp', 'DestinationPort': '443', 'Action': 'deny'}]

As you can see, a given trait may not exist for every rule. I also need to be able to store multiple IP addresses as a single value. I am currently using regex to match the items in the txt file. My thought was to iterate through each line in the file, find any matches, and add them as key-value pairs to a dictionary.

Once I get an "accept" or "deny", that should signal the end of the rule: I append the dictionary to the list, clear the dictionary, and start the process again with the next rule. However, this does not seem to be working as intended. My regex seems fine, but I can't figure out the logic for processing each line, adding multiple values to a value list, and adding values to the dictionary. Here is my code:

import re

data_file = "sample_data.txt"

##### REGEX PATTERNS #####

filter_re = r'(?<=filter\s)(.*)(?=\sterm.)'
term_re = r'(?<=term\s)(.*)(?=\sfrom|\sthen)'
protocol_re = r'(?<=protocol\s)(.*)'
dest_port_re = r'(?<=destination-port\s)(.*)'
source_port_re = r'(?<=from\ssource-port\s)(.*)'
prefix_source_re = r'(?<=from\ssource-prefix-list\s)(.*)'
prefix_dest_re = r'(?<=from\sdestination-prefix-list\s)(.*)'
source_addr_re = r'(?<=source-address\s)(.*)'
dest_addr_re = r'(?<=destination-address\s)(.*)'
action_re = r'(?<=then\s)(deny|accept)'

pattern_list = [filter_re, term_re, source_addr_re, prefix_source_re, source_port_re, dest_addr_re, prefix_dest_re, dest_port_re, protocol_re, action_re]

pattern_headers = ["Filter", "Term", "Source_Address", "Source_Prefix_List", "Source_Port", "Destination_Address", "Destination_Prefix_List", "Destination_Port", "Protocol", "Action"]

final_list = []

def open_file(file):
    rule_dict = {}
    with open(file, 'r') as f:
        line = f.readline()
        while line:
            line = f.readline().strip()
            for header, pattern in zip(pattern_headers,pattern_list):
                match = re.findall(pattern, line)
                if len(match) != 0:
                    if header != 'accept' or header != 'deny':
                        rule_dict[header] = match[0]
                    else:
                        rule_dict[header] = match[0]
                        final.append(rule_dict)
                        rule_dict = {}
    print(rule_dict)
    print(final_list)

The final list is empty and rule_dict only contains the final rule from the text file, not both of the rules. Any guidance would be greatly appreciated.

There are a few small mistakes in your code:

  • In your while loop, f.readline() needs to be at the end; otherwise you start processing at line 2, because readline is called twice before anything is done with the first line.
  • final_list should be defined inside your function and referenced by that exact name (the code appends to "final", which is undefined).
  • if header != 'accept' or header != 'deny': needs an and here. With or, one of the two comparisons is always True, so the else branch never gets executed.
  • You need to check the match for accept|deny, not the header.
  • For a header such as Source_Address you want a list of all the IPs you find. The way you do it, the value is overwritten each time and only the last IP found ends up in your final_list.

def open_file(file):
    final_list = []
    rule_dict = {}
    with open(file) as f:
        line = f.readline()

        while line:
            line = line.strip()
            for header, pattern in zip(pattern_headers, pattern_list):
                match = re.findall(pattern, line)
                if match:
                    # a set collects every value for this header and drops duplicates
                    rule_dict.setdefault(header, set()).add(match[0])
                    if match[0] in ("accept", "deny"):
                        # end of rule: turn each set into a list (several values) or a
                        # single value before appending; sets are unordered, so the
                        # order of a multi-value list may differ from the file
                        final_list.append({k: (list(v) if len(v) > 1 else v.pop()) for k, v in rule_dict.items()})
                        rule_dict = {}
            line = f.readline()
        
    print(f"{rule_dict=}")
    print(f"{final_list=}")
    
open_file(data_file)

Output:

rule_dict={}
final_list=[
    {
        'Filter': 'INBOUND',
        'Term': 'TEST',
        'Source_Address': '1.1.1.1/32',
        'Destination_Prefix_List': 'test-list',
        'Protocol': 'udp',
        'Destination_Port': '53',
        'Action': 'accept'
    },
    {
        'Filter': 'PROD',
        'Term': 'LAN',
        'Source_Address': ['5.5.5.5/32', '4.4.4.4/32'],
        'Protocol': 'tcp',
        'Destination_Port': '443',
        'Action': 'deny'
    }
]
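
The third and fourth points in the list of mistakes can be checked in isolation. The sketch below is only an illustration; the helper names original_condition and corrected_condition are made up for this example and do not appear in the question's code.

```python
def original_condition(header):
    # The question's test: for any string, at least one of the two
    # comparisons is True, so the whole expression is always True.
    return header != 'accept' or header != 'deny'


def corrected_condition(value):
    # Test the matched value (not the header) and combine with `and`:
    # True only when the value is neither 'accept' nor 'deny'.
    return value != 'accept' and value != 'deny'


# Print both results side by side for a few headers and matched values.
for candidate in ('Filter', 'Action', 'accept', 'deny'):
    print(candidate, original_condition(candidate), corrected_condition(candidate))
```

Because original_condition returns True for every input, the else branch in the question's code could never run, which is why nothing was ever appended to the list.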

There are a few things that I have changed in your code:

  • When "accept" or "deny" is found as the action, append final_dict to final_list and empty final_dict.
  • Allow more than one SourceIP: when a second SourceIP is found, the value of SourceIP becomes a list.

import re
data_file = "/home/hiraltalsaniya/Documents/Hiral/test"

filter_re = r'(?<=filter\s)(.*)(?=\sterm.)'
term_re = r'(?<=term\s)(.*)(?=\sfrom|\sthen)'
protocol_re = r'(?<=protocol\s)(.*)'
dest_port_re = r'(?<=destination-port\s)(.*)'
source_port_re = r'(?<=from\ssource-port\s)(.*)'
prefix_source_re = r'(?<=from\ssource-prefix-list\s)(.*)'
prefix_dest_re = r'(?<=from\sdestination-prefix-list\s)(.*)'
source_addr_re = r'(?<=source-address\s)(.*)'
dest_addr_re = r'(?<=destination-address\s)(.*)'
action_re = r'(?<=then\s)(deny|accept)'

pattern_list = [filter_re, term_re, source_addr_re, prefix_source_re, source_port_re, dest_addr_re, prefix_dest_re,
                dest_port_re, protocol_re, action_re]

pattern_headers = ["Filter", "Term", "SourceIP", "Source_Prefix_List", "Source_Port", "Destination_Address",
                   "DestinationList", "Destination_Port", "Protocol", "Action"]

def open_file(file):
    final_dict: dict = dict()
    final_list: list = list()
    with open(file) as f:
        for line in f:
            for header, pattern in zip(pattern_headers, pattern_list):
                match = re.search(pattern, line)
                if match:
                    value = match.group()
                    if header == "Action":
                        # accept or deny marks the end of the rule: record the action,
                        # store the finished rule, then start a new dictionary
                        final_dict[header] = value
                        final_list.append(final_dict)
                        final_dict = dict()
                    elif header == "SourceIP" and header in final_dict:
                        # more than one SourceIP: convert the value to a list once,
                        # then keep appending to that list
                        if not isinstance(final_dict[header], list):
                            final_dict[header] = [final_dict[header]]
                        final_dict[header].append(value)
                    else:
                        final_dict[header] = value
    print("final_list=", final_list)
open_file(data_file)

Output:

final_list= [{'Filter': 'INBOUND', 
              'Term': 'TEST', 
              'SourceIP': '1.1.1.1/32', 
              'DestinationList': 'test-list', 
              'Protocol': 'udp', 
              'Destination_Port': '53',
              'Action': 'accept'
            }, 
            {'Filter': 'PROD', 
             'Term': 'LAN', 
             'SourceIP': ['4.4.4.4/32', '5.5.5.5/32'], 
             'Protocol': 'tcp', 
             'Destination_Port': '443',
             'Action': 'deny'
            }]
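
As an alternative to running ten separate patterns against every line, the same idea (collect options until the action line, keep repeated values in file order) can be sketched with a single pattern. This is not taken from either answer: the names parse_rules, LINE_RE and KEYWORD_TO_HEADER are hypothetical, and the sketch assumes every line has the form shown in the question and that each rule ends with exactly one "then" line.

```python
import re

# Hypothetical mapping from "from" keywords to output keys (names chosen for illustration).
KEYWORD_TO_HEADER = {
    "source-address": "SourceIP",
    "source-prefix-list": "Source_Prefix_List",
    "source-port": "Source_Port",
    "destination-address": "Destination_Address",
    "destination-prefix-list": "DestinationList",
    "destination-port": "Destination_Port",
    "protocol": "Protocol",
}

# Assumes the exact line layout from the question's sample data.
LINE_RE = re.compile(
    r"^set firewall family inet filter (?P<filter>\S+) term (?P<term>\S+) "
    r"(?P<kind>from|then) (?P<rest>.+)$"
)


def parse_rules(lines):
    """Return one dictionary per rule; repeated options become lists in file order."""
    rules = []
    rule = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrelated lines
        rule["Filter"] = m["filter"]
        rule["Term"] = m["term"]
        if m["kind"] == "then":
            # The action line closes the rule (assumption: one "then" line per rule).
            rule["Action"] = m["rest"]
            rules.append(rule)
            rule = {}
            continue
        keyword, _, value = m["rest"].partition(" ")
        header = KEYWORD_TO_HEADER.get(keyword)
        if header is None:
            continue  # keyword not covered by this sketch
        if header not in rule:
            rule[header] = value                  # first value stays a plain string
        elif isinstance(rule[header], list):
            rule[header].append(value)            # third and later values
        else:
            rule[header] = [rule[header], value]  # second value: switch to a list
    return rules


sample = """\
set firewall family inet filter PROD term LAN from source-address 4.4.4.4/32
set firewall family inet filter PROD term LAN from source-address 5.5.5.5/32
set firewall family inet filter PROD term LAN then deny
""".splitlines()

for parsed in parse_rules(sample):
    print(parsed)
```

Returning the list instead of printing it inside the function keeps the parser reusable, and accepting any iterable of lines means an open file object can be passed in directly.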

