Python parse complex command output

I need to parse the output of a command in Python. The command returns something like this:

A:
        2 bs found
        3 cs found
B:
        1 a found
        3 bs found
C:
        1 c found
        D:
                2 es found
                3 fs found

I need to be able to do the following with the output:

Access a.bs found, b.a found, c.d.es found, and so on.

How do I do this in Python? What data structure is best suited to this?

The goal of this exercise is to run the command every 10 seconds and identify a diff of what has changed.

An alternative solution is to translate the input string directly into something that a pre-existing library can read. This particular data looks like a good fit for YAML.

In this case you would call re.sub(r'( +)([0-9]+) ([a-z]).+', r'\1\3 : \2', allcontent), which rewrites lines of the form '2 cs found' into a key: value mapping that PyYAML understands. To be precise, '2 cs found' becomes 'c : 2'.

The result:

A:
        b : 2
        c : 3
B:
        a : 1
        b : 3
C:
        c : 1
        D:
                e : 2
                f : 3
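As a minimal sketch of the rewriting step, assuming the command output has already been captured into a string (the name raw and the sample text are illustrative, and the indentation is assumed to use spaces, which YAML requires):

```python
import re

# Sample command output, assumed to be indented with spaces.
raw = """\
A:
        2 bs found
        3 cs found
B:
        1 a found
        3 bs found
C:
        1 c found
        D:
                2 es found
                3 fs found
"""

# Rewrite "<indent><count> <letter>... found" as "<indent><letter> : <count>".
# Header lines such as "A:" contain no digits and are left untouched.
newcontent = re.sub(r'( +)([0-9]+) ([a-z]).+', r'\1\3 : \2', raw)
print(newcontent)
```

The raw-string replacement r'\1\3 : \2' keeps the original indentation (group 1), so the nesting survives the rewrite.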

Calling yaml.safe_load(newcontent) returns the following Python data structure:

{'A': {'b': 2, 'c': 3},
 'B': {'a': 1, 'b': 3},
 'C': {'D': {'e': 2, 'f': 3}, 'c': 1}}

This matches the suggestion in my earlier comment. If you prefer JSON (Python ships with a json module), it is fairly simple to adapt this strategy to produce JSON instead.
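Once the output is a nested dict, the periodic diff mentioned in the question could be computed along these lines. This is only a sketch: the helper names flatten and diff are made up for illustration, and the two snapshots are hand-written rather than real command output. Flattening to dotted paths also gives the a.b / c.d.e style of access the question asks for.

```python
def flatten(tree, prefix=""):
    """Map a nested dict to dotted paths, e.g. {'A.b': 2, 'C.D.e': 2}."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff(old, new):
    """Return {path: (old_value, new_value)} for every path that differs.

    A path missing from one snapshot is reported with None on that side.
    """
    old_flat, new_flat = flatten(old), flatten(new)
    changed = {}
    for path in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(path), new_flat.get(path)
        if before != after:
            changed[path] = (before, after)
    return changed


# Two hypothetical snapshots taken 10 seconds apart.
previous = {'A': {'b': 2, 'c': 3}, 'B': {'a': 1, 'b': 3},
            'C': {'D': {'e': 2, 'f': 3}, 'c': 1}}
current = {'A': {'b': 2, 'c': 4}, 'B': {'a': 1, 'b': 3},
           'C': {'D': {'e': 2}, 'c': 1}}

print(flatten(previous)['C.D.e'])  # 2
print(diff(previous, current))     # {'A.c': (3, 4), 'C.D.f': (3, None)}
```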

This should have a 'parsing' tag, as it's a general parsing problem.

The usual solution in this kind of situation is to track, as you read in lines, (a) the indentation and (b) the list of structures that are currently being parsed. The list in (b) begins as a list containing a single empty dict, i.e. curparsing = [{}]

Loop over all input lines. For example:

with open('inputfilename','r') as f:
    for line in f:
        # code implementing the rules below

  • If a line is blank ( if not line.strip(): ), ignore it and go on to the next one ( continue ).

  • If the indentation level has decreased, remove the top item from the currently-parsing list (i.e. curparsing.pop() ). If the indentation has dropped by several levels, remove that many items from the top.

  • Strip off any leading indentation with line = line.lstrip()

  • If ':' is in the line, then we have found a sub-dictionary. Read the key (the part to the left of ':'), increase the indent level, create a new dictionary, and insert it into the dictionary at the current top of the list. Then append the newly created dictionary to the list.

  • If line[0] in '123456789', then we have found a report of the form '[count] [character]s found'. We can use a regular expression to find the count and the character: m = re.match(r'([0-9]+) ([a-z])', line); count, character = m.groups(); count = int(count). We then store this in the dictionary at the current top of the list: curparsing[-1][character] = count

That's pretty much it. You just loop over the lines and apply these rules to each one, and at the end, curparsing[0] contains the parsed document.
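The rules above can be sketched as a single function. This is one possible reading rather than a definitive implementation: the function name parse and the sample text are illustrative, and instead of a separate indent counter each stack entry stores the indentation of the header line that opened it, which handles multi-level decreases in one loop. It assumes indentation is consistent (all spaces or all tabs).

```python
import re


def parse(lines):
    """Parse indented '<key>:' / '<count> <letter>s found' lines into nested dicts."""
    root = {}
    # Stack of (header_indent, dict); the root acts as a header at indent -1.
    curparsing = [(-1, root)]
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())
        line = line.strip()
        # Close every dict whose header is indented at least as far as this line.
        while indent <= curparsing[-1][0]:
            curparsing.pop()
        current = curparsing[-1][1]
        if ':' in line:
            # Sub-dictionary: the key is the part to the left of ':'.
            key = line.partition(':')[0].strip()
            child = {}
            current[key] = child
            curparsing.append((indent, child))
        else:
            m = re.match(r'([0-9]+) ([a-z])', line)
            if m:  # lines in any other format are skipped
                count, character = m.groups()
                current[character] = int(count)
    return root


sample = """\
A:
        2 bs found
        3 cs found
B:
        1 a found
        3 bs found
C:
        1 c found
        D:
                2 es found
                3 fs found
"""

parsed = parse(sample.splitlines())
print(parsed['C']['D']['e'])  # 2
```

To read from a file instead, the open file object can be passed directly, since parse only needs an iterable of lines.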
