
Converting multi-line script output to dictionary using regex

I get the following script output:

***************************************************
[g4u2680c]: searching for domains
---------------------------------------------------
host =   g4u2680c.houston.example.com
         ipaddr = [16.208.16.72]
         VLAN   = [352]
         Gateway= [16.208.16.1]
         Subnet = [255.255.248.0]
         Subnet = [255.255.248.0]
         Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]

host =   g4u2680c.houston.example.com
         ipaddr = [16.208.16.72]
         VLAN   = [352]
         Gateway= [16.208.16.1]
         Subnet = [255.255.248.0]
         Subnet = [255.255.248.0]
         Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]

* script completed Mon Jun 15 06:13:14 UTC 2015 **
* sleeping 30 to avoid DOS on dns via a loop **

I need to extract the two host listings into dictionaries, without the square brackets.

Here is my code:

#!/usr/bin/env python

import re

text="""*************************************************** 
[g4u2680c]: searching for domains
---------------------------------------------------
host =   g4u2680c.houston.example.com
         ipaddr = [16.208.16.72]
         VLAN   = [352]
         Gateway= [16.208.16.1]
         Subnet = [255.255.248.0]
         Subnet = [255.255.248.0]
         Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]

host =   g4u2680c.houston.example.com
         ipaddr = [16.208.16.72]
         VLAN   = [352]
         Gateway= [16.208.16.1]
         Subnet = [255.255.248.0]
         Subnet = [255.255.248.0]
         Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]

* script completed Mon Jun 15 06:13:14 UTC 2015 **
* sleeping 30 to avoid DOS on dns via a loop **
***************************************************
"""

seq = re.compile(r"host.+?\n\n",re.DOTALL)

a=seq.findall(text)

matches = re.findall(r'\w.+=.+', a[0])

matches = [m.split('=', 1) for m in matches]

matches = [ [m[0].strip().lower(), m[1].strip().lower()] for m in matches]

#should have function with regular expression to remove bracket here

d = dict(matches)

print(d)

So far, for the first host I get:

{'subnet': '[255.255.248.0]', 'vlan': '[352]', 'ipaddr': '[16.208.16.72]', 'cluster': '[g4u2679c g4u2680c g9u1484c g9u1485c]', 'host': 'g4u2680c.houston.example.com', 'gateway': '[16.208.16.1]'}

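I need help finding a regular expression to remove the brackets, because the values in the dictionary include data both with and without brackets.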

Alternatively, is there a better, simpler way to convert the raw script output into a dictionary?
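For the regex the question asks for, a minimal sketch using `re.sub`, assuming only a single leading `[` and a single trailing `]` need to go. `remove_brackets` is a hypothetical helper name, and the dictionary is a cut-down copy of the output shown above:

```python
import re

# Hypothetical helper for the placeholder comment in the question's code:
# remove one "[" at the start and one "]" at the end of a value, if present.
BRACKETS = re.compile(r'^\[|\]$')

def remove_brackets(value):
    return BRACKETS.sub('', value)

# Values with and without brackets, as in the output above.
d = {'vlan': '[352]',
     'host': 'g4u2680c.houston.example.com',
     'cluster': '[g4u2679c g4u2680c g9u1484c g9u1485c]'}
d = {k: remove_brackets(v) for k, v in d.items()}
print(d)
```

Values without brackets, such as `host`, pass through unchanged, so the helper can be applied to every value.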

You can simply use re.findall together with dict:

>>> dict([(i,j.strip('[]')) for i,j in re.findall(r'(\w+)\s*=\s*(.+)',text)])
{'Subnet': '255.255.248.0', 'VLAN': '352', 'ipaddr': '16.208.16.72', 'Cluster': 'g4u2679c g4u2680c g9u1484c g9u1485c', 'host': 'g4u2680c.houston.example.com', 'Gateway': '16.208.16.1'}
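Applied to the whole text, the `dict(...)` call above returns a single dictionary. Both host blocks use the same keys, so the second block's values overwrite the first (the two blocks in the sample are identical, so the output does not show this). A minimal sketch that keeps one dictionary per host, assuming every block starts with a `host = ...` line; `parse_hosts` is a hypothetical name:

```python
import re

# Key/value pattern from the answer above: a word, optional spaces, "=", then the rest of the line.
PAIR = re.compile(r'(\w+)\s*=\s*(.+)')

def parse_hosts(text):
    """Return a list with one dict per host block.

    Assumes each block starts with a "host = ..." line; every later
    key=value line is added to the most recent host's dict.
    """
    hosts = []
    for key, value in PAIR.findall(text):
        if key == 'host':
            hosts.append({})          # a "host" key opens a new record
        if hosts:                     # skip key=value lines before the first host
            hosts[-1][key] = value.strip().strip('[]')
    return hosts
```

With the question's `text`, `parse_hosts(text)` should return a list of two dictionaries, one per `host =` line.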

As for the brackets, you can remove them with the str.strip method.
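A quick illustration of how `str.strip('[]')` treats its argument: it is a set of characters removed from both ends, not a literal substring. The `'[a] [b]'` string is made up to show the edge case:

```python
# str.strip('[]') removes any '[' or ']' characters from both ends of the string.
ip = '[16.208.16.72]'.strip('[]')
host = 'g4u2680c.houston.example.com'.strip('[]')
inner = '[a] [b]'.strip('[]')

print(ip)     # 16.208.16.72
print(host)   # g4u2680c.houston.example.com (nothing to strip)
print(inner)  # a] [b  (brackets inside the string are kept)
```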

You can use: (\w+)\s*=\s*\[?([^\n\]]+)\]?

Demo

import re
p = re.compile(r'(\w+)\s*=\s*\[?([^\n\]]+)\]?', re.MULTILINE)
test_str = u"host =   g4u2680c.houston.example.com\n         ipaddr = [16.208.16.72]\n         VLAN   = [352]\n         Gateway= [16.208.16.1]\n         Subnet = [255.255.248.0]\n         Subnet = [255.255.248.0]\n         Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]\n\nhost =   g4u2680c.houston.example.com\n         ipaddr = [16.208.16.72]\n         VLAN   = [352]\n         Gateway= [16.208.16.1]\n         Subnet = [255.255.248.0]\n         Subnet = [255.255.248.0]\n         Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]\n"

re.findall(p, test_str)
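The pattern from the demo, rewritten with `re.VERBOSE` so each piece can carry a comment. The behaviour is intended to be the same; only the layout differs:

```python
import re

# The answer's pattern, spelled out with re.VERBOSE comments.
PAIR = re.compile(r"""
    (\w+)        # group 1, the key: a run of word characters such as ipaddr or VLAN
    \s*=\s*      # the equals sign, with optional whitespace on either side
    \[?          # an optional opening bracket, matched but not captured
    ([^\n\]]+)   # group 2, the value: everything up to a newline or a closing bracket
    \]?          # an optional closing bracket, matched but not captured
""", re.VERBOSE)

sample = "host =   g4u2680c.houston.example.com\n         VLAN   = [352]\n"
print(PAIR.findall(sample))
# [('host', 'g4u2680c.houston.example.com'), ('VLAN', '352')]
```

Because group 2 stops at the first `]`, a value that itself contained a closing bracket would be cut short; that is not an issue for the output in the question.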

You can try this:

matches = [m.replace('[','').replace(']','').split('=', 1) for m in matches]
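In the question's own code, this `replace` step goes where the list comprehension with `split` was. The sketch below also loops over every block matched by `host.+?\n\n` instead of only `a[0]`, so both hosts end up in a list of dictionaries. `parse_blocks` is a hypothetical name, and like the original regex it assumes each host block is followed by a blank line:

```python
import re

def parse_blocks(text):
    # Same block regex as the question: from "host" up to the next blank line.
    blocks = re.findall(r"host.+?\n\n", text, re.DOTALL)
    hosts = []
    for block in blocks:
        lines = re.findall(r'\w.+=.+', block)
        # Remove every bracket before splitting each line on the first '='.
        pairs = [m.replace('[', '').replace(']', '').split('=', 1) for m in lines]
        hosts.append({k.strip().lower(): v.strip().lower() for k, v in pairs})
    return hosts
```

Unlike `strip('[]')`, `replace` removes brackets anywhere in the line, not just at the ends.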


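Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.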

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM