
Pythonic way to parse the output of `zpool status` into a dictionary - inconsistent output causing messy code

I'm trying to parse the output of the ZFS command `zpool status`, which gives me output like this:

  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

My goal is to convert this output to a dictionary, like so:

{
    'pool': 'tank',
    'state': 'ONLINE',
    'scan': 'resilvered 35.6G in...',
    'config': 'NAME        STATE     READ WRITE CKSUM...',
    'errors': 'No known data errors'
}

I'm running into two problems that are causing me to write messy code:

  1. Not every line is displayed every time the command is run (the scan line, for example, can be absent), and additional lines that are not shown above are possible.
  2. The config line is followed by a few newlines before its content, which makes splitting difficult.

I've tried a few different ways of doing this, but my code gets bogged down in a bunch of conditionals, and this being Python, I figured there must be a cleaner way.

This is the "cleanest" method I've found, but it's not very readable and it doesn't work with the config line:

# output = `zpool status` output
d = {}

for entry in map(lambda x: x.strip(), output.split('\n')):
    if 'state' in entry:
        pool_state = entry.split(' ')
        key = pool_state[0]
        val = pool_state[1]
        d[key] = val
    if 'status' in entry:
        ...
    if 'config' in entry:
        # entry does not contain output of the config: line
        ...

Here is an example.

s = """  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors"""

res = {}
for line in s.splitlines():
    # Stop at the first blank line; the config table and the errors line are skipped
    if line == "":
        break
    k, v = line.lstrip(" ").split(":", 1)
    if v:
        res[k] = v.lstrip(" ")

Result:

{'pool': 'tank', 'state': 'ONLINE', 'scan': 'resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022'}
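The loop above stops at the first blank line, so the config table and the errors line never reach the dictionary. A minimal sketch of one way to keep them is shown below. It assumes that a field starts on a line shaped like `key:` (optionally followed by a value) and that any other non-blank line continues the previous field. The names `KEY_LINE`, `parse_status`, and `sample` are hypothetical, and real `zpool status` output may contain lines that break this assumption.

```python
import re

# Assumed shape of a field header line: optional indent, a word, a colon, optional value
KEY_LINE = re.compile(r"^\s*(\w+):\s*(.*)$")


def parse_status(text):
    """Group the text into {key: value}, folding continuation lines into the previous key."""
    collected = {}
    key = None
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            key, value = match.groups()
            collected[key] = [value] if value else []
        elif key is not None and line.strip():
            # Not a header line: treat it as more content for the current key
            collected[key].append(line.strip())
    return {k: "\n".join(v) for k, v in collected.items()}


sample = """  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0

errors: No known data errors"""

parsed = parse_status(sample)
print(parsed["config"])
```

Fields that are missing from the output (such as scan here) are simply absent from the result, so no per-field conditionals are needed. Note that stripping each continuation line discards the indentation that encodes the vdev hierarchy.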

I recommend using `re.split` to split on the keys (state, scan, and so on) that appear at the beginning of a line followed by `:`, and then converting the resulting parts to a dictionary using `zip`.
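To illustrate why this works: when the pattern contains a capturing group, `re.split` keeps the captured text in the result, so keys and values alternate in the returned list. The snippet below is a small sketch on a made-up two-field input (`text`, `parts`, and `pairs` are illustrative names only).

```python
import re

# Tiny two-field input, made up for illustration
text = "pool: tank\n state: ONLINE"

# The capturing group (\w*) makes re.split keep each key in the result list
parts = re.split(r"(?:\n|^)\s*(\w*):\s*", text)

# parts[0] is the (empty) text before the first key, so keys start at index 1
pairs = dict(zip(parts[1::2], parts[2::2]))
print(parts)
print(pairs)
```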

You can also parse config into a list of dictionaries.

import re
from pprint import pprint

s = """  pool: tank
 state: ONLINE
  scan: resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 2022
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors"""

def parse_data(data):
    # Pass flags by keyword: the third positional argument of re.split is maxsplit, not flags
    parts = re.split(r'(?:\n|^)\s*(\w*):\s*', data.strip(), flags=re.MULTILINE)[1:]
    parsed = dict(zip(parts[::2], parts[1::2]))
    return {
        **parsed,
        'config': parse_config(parsed.get('config', ''))
    }


def parse_config(data):
    lines = [v.strip().split() for v in data.splitlines() if v.strip()]
    if lines:
        return [
            dict(zip(lines[0], v))
            for v in lines[1:]
        ]
    return []
    

pprint(parse_data(s))

The output should be:

{'config': [{'CKSUM': '0',
             'NAME': 'tank',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'raidz2-0',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sda',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdc',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdb',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdd',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'},
            {'CKSUM': '0',
             'NAME': 'sdf',
             'READ': '0',
             'STATE': 'ONLINE',
             'WRITE': '0'}],
 'errors': 'No known data errors',
 'pool': 'tank',
 'scan': 'resilvered 35.6G in 00:08:47 with 0 errors on Sat Sep 10 01:20:26 '
         '2022',
 'state': 'ONLINE'}
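The flat list above no longer shows which devices belong to which vdev, because that relationship is only expressed through indentation. The following is a minimal sketch that records a nesting depth for each row. It assumes each level is indented by two additional characters, as in the sample output; `parse_config_with_depth` and `config_text` are hypothetical names, and the counters are left as strings because their format in real output may not always be a plain integer.

```python
def parse_config_with_depth(config_text):
    """Parse the config table into row dicts, adding a 'depth' key derived from indentation."""
    lines = [ln for ln in config_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    base_indent = None
    for ln in lines[1:]:
        indent = len(ln) - len(ln.lstrip())
        if base_indent is None:
            base_indent = indent  # indentation of the first data row (the pool itself)
        row = dict(zip(header, ln.split()))
        row["depth"] = (indent - base_indent) // 2  # assumes 2 characters per level
        rows.append(row)
    return rows


config_text = """NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0"""

rows = parse_config_with_depth(config_text)
for row in rows:
    print(row["depth"], row["NAME"], row["STATE"])
```

With the depth available, a caller can rebuild the pool, vdev, and device tree by attaching each row to the most recent row with a smaller depth.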
