Parsing command line output and storing result in a dictionary using Python

I am using subprocess to execute a command and then attempting to parse its output. The output looks like this:

obj 6
endobj 6
Page 12
...
...

This output will be generated for each of a number of files.

The result should be something like:

[obj; 6, 8, 3, ... for all files]
[endobj; 6, 4, 5, ... for all files]
...
...

I managed to create the following program:

import subprocess
import os
import re
from collections import defaultdict

def run_pdfid(filename, d):
    try:
        p = subprocess.Popen(['python',
                            '/Users/as/Desktop/tools/pdfid_v0_2_1/pdfid.py',filename],stdout=subprocess.PIPE)

        for line in p.stdout:
            if '%PDF' in line or line.startswith('PDFiD'):
                continue
            pattern1 = "^\s*(\S+)\s+(\d+)"
            m = re.search(pattern1, line)
            key = m.group(1)
            if key in d:
                d[key].append(m.group(2))
            else:
                d[key] = m.group(2)
    except Exception:
        match = None



if __name__ == '__main__':
    os.chdir('/Users/as/Desktop/shared/clean')
    d = dict()
    for root, dirs, file_names in os.walk(os.getcwd()):
        for file in file_names:
            #print file
            run_pdfid(file, d)

    for key, value in d.iteritems():
        print (key, value)

Everything seems to be working fine except the dictionary creation. Can you please help me spot the issue?

Edit: As suggested, I moved the dictionary creation out of the loop, and that helped partially. The current output records only one value per key. I was hoping it would contain the values for all the files. Current output looks like:

('obj', '8')
('/JS', '2')
('stream', '1')
('endobj', '8')

It should have been: ('obj', '8', '6', '5', ...)
...
...

You keep recreating the dictionary, but then you only use the final dictionary. You should either indent the last two lines more, or move the dictionary creation out of the loop, depending on whether you want a separate report for each file or a single dictionary covering all the files.

If, as you say, you want a single report for all the files, you need to move the dictionary creation ( d = dict() ) up, before the loop.
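The difference between the two placements can be sketched as follows. This is only an illustration: `parse_counts` is a hypothetical helper, and the hard-coded `SAMPLE_OUTPUTS` strings are made-up stand-ins for the output of two pdfid runs. It also uses `dict.setdefault` to build the lists, which is a substitution for the if/else logic in the question.

```python
import re

# Made-up stand-ins for the output of two separate pdfid runs.
SAMPLE_OUTPUTS = [
    "obj 6\nendobj 6\nPage 12\n",
    "obj 8\nendobj 4\nPage 3\n",
]

LINE_RE = re.compile(r"^\s*(\S+)\s+(\d+)")


def parse_counts(text, d):
    """Append the count from every 'name count' line in text to d[name]."""
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that do not look like 'name count'
        d.setdefault(m.group(1), []).append(m.group(2))


# Dictionary created once, before the loop: results from all files accumulate.
shared = {}
for text in SAMPLE_OUTPUTS:
    parse_counts(text, shared)

# Dictionary created inside the loop: each iteration discards the previous one,
# so only the last file's counts survive after the loop ends.
for text in SAMPLE_OUTPUTS:
    per_file = {}
    parse_counts(text, per_file)

print(shared)
print(per_file)
```

After the loops, `shared` holds one list per key with a value from each sample, while `per_file` holds only the values from the last sample.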

Edited to add:

Regarding your comment: you are probably adding the key once and then hitting an exception when you try to append to it, because the else branch stores a plain string rather than a list, and the except Exception block silently swallows the error. You could change d[key] = m.group(2) to d[key] = [m.group(2)] , but the entire purpose of a defaultdict is to avoid that if/else logic, so I would simply replace:

if key in d:
   d[key].append(m.group(2))
else:
   d[key] = m.group(2)

with:

d[key].append(m.group(2))

With a defaultdict, there is no reason to check whether the key already exists. Note that this requires d to be created as defaultdict(list) rather than dict().
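A minimal sketch of both points, under stated assumptions: `collect` is a hypothetical stand-in for the parsing loop in `run_pdfid`, and the input lines are invented samples rather than real pdfid output. The first part reproduces the failure that the except block was hiding; the second shows the defaultdict(list) version.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(r"^\s*(\S+)\s+(\d+)")

# Part 1: why the original if/else fails on the second file.
plain = {}
plain['obj'] = '6'            # else branch: stores a str, not a list
try:
    plain['obj'].append('8')  # if branch on the next file: str has no append
    failed = False
except AttributeError:
    failed = True             # the original 'except Exception' hid this error


# Part 2: the same loop with a defaultdict(list) and no if/else.
def collect(lines, d):
    """Append each parsed count to d[name]; d is expected to be a defaultdict(list)."""
    for line in lines:
        if '%PDF' in line or line.startswith('PDFiD'):
            continue  # skip header lines, as in the question's code
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that do not match 'name count'
        d[m.group(1)].append(m.group(2))


d = defaultdict(list)  # a missing key starts out as an empty list
collect(["PDFiD header (invented)", " obj 6", " endobj 6"], d)
collect(["PDFiD header (invented)", " obj 8", " endobj 4"], d)

for key, value in d.items():
    print(key, value)
```

Checking `m is None` before calling `m.group` is an addition here; it avoids relying on a broad except to skip lines that do not match the pattern.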
