Parsing command line output and storing result in a dictionary using Python

Question

I am using subprocess to run a command and then parse its output. The output looks like this:

obj 6
endobj 6
Page 12
...
...

This output is generated for each of a number of files.

The result should look something like this:

[obj; 6, 8,3,....for all files]
[endobj; 6,4,5,.....for all files]
...
...

I managed to create the following program:

import subprocess
import os
import re
from collections import defaultdict

def run_pdfid(filename, d):
    try:
        p = subprocess.Popen(['python',
                            '/Users/as/Desktop/tools/pdfid_v0_2_1/pdfid.py',filename],stdout=subprocess.PIPE)

        for line in p.stdout:
            if '%PDF' in line or line.startswith('PDFiD'):
                continue
            pattern1 = "^\s*(\S+)\s+(\d+)"
            m = re.search(pattern1, line)
            key = m.group(1)
            if key in d:
                d[key].append(m.group(2))
            else:
                d[key] = m.group(2)
    except Exception:
        match = None



if __name__ == '__main__':
    os.chdir('/Users/as/Desktop/shared/clean')
    d = dict()
    for root, dirs, file_names in os.walk(os.getcwd()):
        for file in file_names:
            #print file
            run_pdfid(file, d)

    for key, value in d.iteritems():
        print (key, value)

Everything seems to work except the dictionary creation. Can you please help me spot the issue?

Edit: As suggested, I moved the dictionary creation out of the loop, which partially helps. However, the output now records only one value per key, whereas I expected it to contain the values from all the files. The current output looks like:

('obj', '8')
('/JS', '2')
('stream', '1')
('endobj', '8')

It should have been: ('obj', '8', '6','5',.....)
...
...

Answer 1

You keep recreating the dictionary for every file, but only the final dictionary is ever used. Depending on whether you want a separate report for each file or a single report on all the files, either indent the last two lines (the print loop) so that they run inside the file loop, or move the dictionary creation out of the loop.

If, as you say, you want a single report for all the files, move the dictionary creation (d = dict()) out of the loop so that it runs once, before the loop starts.
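The effect of where the dictionary is created can be sketched without running pdfid at all. In this minimal example, `outputs` is hypothetical stand-in text for three files, and `add_counts`, `one_report` and `last_file_only` are illustrative names, not part of the question's program. Creating the dictionary once before the loop accumulates the values from every file, while recreating it inside the loop leaves only the last file's values.

```python
from collections import defaultdict

# Hypothetical pdfid-style output for three files (stand-in data, not real results).
outputs = ["obj 6\nendobj 6", "obj 8\nendobj 4", "obj 3\nendobj 5"]


def add_counts(text, d):
    """Append the count on each "<keyword> <count>" line of text to d[keyword]."""
    for line in text.splitlines():
        key, value = line.split()
        d[key].append(value)


def one_report(outputs):
    d = defaultdict(list)  # created once, before the loop: values from all files accumulate
    for text in outputs:
        add_counts(text, d)
    return d


def last_file_only(outputs):
    for text in outputs:
        d = defaultdict(list)  # recreated for every file: earlier values are thrown away
        add_counts(text, d)
    return d  # only the dictionary built for the final file survives


print(dict(one_report(outputs)))      # {'obj': ['6', '8', '3'], 'endobj': ['6', '4', '5']}
print(dict(last_file_only(outputs)))  # {'obj': ['3'], 'endobj': ['5']}
```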

Edited to add:

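Re your comment: you're probably adding the key once and then hitting an exception when you try to append to it. The first value is stored as a plain string, which has no append method, and the except Exception block silently swallows the resulting error. You could change d[key] = m.group(2) to d[key] = [m.group(2)], but the whole purpose of a defaultdict is to avoid that if/else logic, so I would create the dictionary with d = defaultdict(list) and simply replace: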

if key in d:
   d[key].append(m.group(2))
else:
   d[key] = m.group(2)

with:

d[key].append(m.group(2))

With a defaultdict, there is no need to check whether the key already exists.
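Putting these suggestions together, here is a hedged Python 3 sketch of `run_pdfid` built on `defaultdict(list)`. It is an assumption-laden rewrite, not the asker's final code. The command is passed in as a list, so the pdfid path from the question would go there. It uses `subprocess.run` with `text=True` (Python 3.7+) so that lines are `str` rather than `bytes`, skips lines that do not match the pattern, and lets errors propagate instead of silently swallowing them, since swallowing them is what hid the failing `append`. The `fake` command in the `__main__` block is a hypothetical stand-in for pdfid.

```python
import re
import subprocess
import sys
from collections import defaultdict

# Matches "<keyword> <count>" lines such as "obj 6" or "/JS 2".
LINE_RE = re.compile(r"^\s*(\S+)\s+(\d+)")


def run_pdfid(cmd, d):
    """Run cmd and append each "<keyword> <count>" from its stdout to d[keyword]."""
    # check=True raises CalledProcessError on a non-zero exit instead of hiding it.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    for line in result.stdout.splitlines():
        # Skip the header lines that precede the counts.
        if "%PDF" in line or line.startswith("PDFiD"):
            continue
        m = LINE_RE.match(line)
        if m is None:
            continue  # blank or otherwise unrecognised line
        d[m.group(1)].append(m.group(2))  # defaultdict(list) creates the list on first use


if __name__ == "__main__":
    d = defaultdict(list)  # one dictionary shared by all files
    # Hypothetical stand-in for ['python', '/path/to/pdfid.py', filename]:
    fake = [sys.executable, "-c", "print('obj 6'); print('endobj 6')"]
    for _ in range(2):
        run_pdfid(fake, d)
    for key, value in d.items():
        print(key, value)
```

When walking a directory tree with os.walk, pass os.path.join(root, file) rather than the bare file name, because the names in file_names are relative to root. The original failure is also easy to reproduce on its own: after d['obj'] = '8', the call d['obj'].append('6') raises AttributeError: 'str' object has no attribute 'append'.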

solution1
0 2017-04-19 04:34:22
