
Why do printed characters disappear in this python code?

(This question arose from an attempt to get around this problem.)

I'm trying to print a list of dictionaries in Python. Since I can't find a built-in function that converts a Python object to a string (no, json.dumps doesn't work), I decided to write a simple printing script.

Unfortunately, characters at the beginning of the line simply disappear. I'm no Python expert, but this behavior makes no sense to me.

# The out object is returned by a library (rekall) 
# and it is a list of dictionaries.
import rekall
out = rekall.a_modified_module.calculate()

print '[',
for ps in out:
    first = True
    print '{',
    for info in ps:
        if first:
            first = False
        else:
            print '\'%s\':\'%s\',' % (info, ps[info]),
    print '}',
print ']'

I would expect the output to be:

[{'pid':'2040', 'name':'leon.exe', 'offset':'2234185984',}]

Instead I get this:

'pid':'2040', 'name':'leon.exe', 'offset':'2234185984',}]

Can you please explain to me what's happening here? (I'm skipping the first entry in the loop because it contains another dictionary and the output gets even crazier, with mixed parts of the output.)

PS: if you know a good way to print a generic Python object (something comparable to JSON.stringify in JavaScript, but without having to deal with JSON objects), please tell me.

EDIT: My question aims at explaining this strange (to me) behavior, where the output depends on what is printed after the brackets. In fact, if I remove the inner for loop ("for info in ps"), the initial brackets are printed correctly. Also, if I create a pipe to send the output to another program, that program will receive the output correctly, starting from the brackets.

EDIT: To help clarify the nature of the problem and the type of the 'out' object, this is the output of the 'pprint' module:

[{'name':  [String:ImageFileName]: 'leon.exe\x00',
  'offset': 2236079360,
  'pid':  [unsigned int:UniqueProcessId]: 0x000007FC,
  'psscan': {'CSRSS': False,
             'Handles': False,
             'PsActiveProcessHead': True,
             'PspCidTable': True,
             'Sessions': True}}]

Python objects have two methods for getting a quick human-readable representation of their data: str, which gives a nicely printable representation of an object, and repr, which attempts to give a string that could be used to rebuild the object: "For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval()." Heavy emphasis on "attempts". Classes are free to override the default implementation with their own __str__ and __repr__ methods.
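As a minimal sketch of that difference, here is a made-up class (Pid is hypothetical and is not a rekall type) that overrides both methods. It avoids print, so it should run the same way on Python 2 and Python 3.

```python
class Pid(object):
    """Hypothetical wrapper type, loosely imitating a library-defined value."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        # Short form meant for end users.
        return str(self.value)

    def __repr__(self):
        # Descriptive form meant for debugging; note it is not valid Python source.
        return '[unsigned int:Pid]: 0x%08X' % self.value


pid = Pid(2040)
str_form = str(pid)        # plain decimal digits
repr_form = repr(pid)      # the typed, hexadecimal description
in_container = str([pid])  # lists and dicts use repr() for their items
```

This is why printing a whole list or dict (or using pprint) shows the verbose repr form of each element even though str on a single element looks clean.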

Your example output:

'name':  [String:ImageFileName]: 'leon.exe\x00'

is interesting. It shows that the rekall module is overriding __repr__ to give a more complex view of its data types ( [String:ImageFileName]: ). But that's not valid Python - the implementors were just giving a more typeful description. It also shows that its strings ('leon.exe\x00') have non-printable characters in them. This means that, in this instance, a NUL (\x00) is emitted when printing the string value of the data. I would call this a bug - but it could be that the module is supposed to emit raw binary data.

Non-printable characters may be interpreted as formatting commands by your console. For instance, \r (carriage return) tells the console to return to the start of the line, so the following characters overwrite what was already printed there:

>>> print 'foo\rbar'
bar

On my console, this escape sequence

>>> print '\x1b[0;31;40m hello'
hello

makes "hello" print in red.

If rekall is putting out raw binary data, the strings you are trying to print contain non-printable characters that mess up your console display. To complicate things further, the rekall module may be checking whether its stdout is a terminal and, if so, adding fancy terminal-oriented formatting to its strings.

Assuming that rekall is putting raw binary data in strings, you could call str to get rid of the rekall metadata and then repr to escape the troublesome characters:
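The terminal check mentioned above is only a guess about rekall, but it would explain why piping the output to another program looks fine. A minimal sketch of how any program could do this follows; decorate is a hypothetical helper, not rekall code, and io.StringIO stands in for a pipe or file.

```python
import io


def decorate(text, stream):
    """Wrap text in red ANSI codes only when stream is an interactive terminal."""
    # Objects without an isatty() method are treated as non-terminals.
    is_tty = getattr(stream, 'isatty', lambda: False)()
    if is_tty:
        return '\x1b[0;31;40m' + text + '\x1b[0m'
    return text


captured = io.StringIO()                  # behaves like a pipe: isatty() is False
plain = decorate(u'leon.exe', captured)   # no escape codes are added
```

Passing sys.stdout as the stream would give the colored form when run interactively and the plain form when the output is redirected.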

def mystr(s):
    return repr(str(s))

for ps in out:
    first = True
    for info in ps:
        if first:
            first = False
        else:
            # repr() already adds the surrounding quotes
            print '%s:%s' % (mystr(info), mystr(ps[info]))
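To see what the str-then-repr step does to the troublesome characters, here is a small sketch on made-up plain strings (not real rekall objects). The helper name escaped is hypothetical.

```python
def escaped(value):
    # str() takes the plain string form; repr() then turns control
    # characters into visible backslash escapes and adds quotes.
    return repr(str(value))


nul_example = escaped('leon.exe\x00')  # the NUL becomes the four characters \x00
cr_example = escaped('foo\rbar')       # the carriage return becomes \r
```

Because the control characters are now ordinary visible text, the console has nothing left to interpret as a cursor movement.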

Or write your own function to filter out the characters you don't want. This is difficult to do for Unicode in general, but for ASCII text we can keep a subset of the characters found in string.printable:

printable = set(
    '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$'
    '%&\\\'()*+,-./:;<=>?@[\\]^_`{|}~ \t')

def mystr(s):
    return ''.join(filter(printable.__contains__, str(s)))

for ps in out:
    first = True
    for info in ps:
        if first:
            first = False
        else:
            print '\'%s\':\'%s\'' % (mystr(info), mystr(ps[info]))
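The same filtering idea can be sketched by deriving the allowed set from string.printable instead of typing it out. The sample dictionary below is made-up data that only imitates the shape of the rekall output, and strip_unsafe is a hypothetical name.

```python
import string

# Printable ASCII minus the whitespace controls that move the cursor
# (carriage return, newline, vertical tab, form feed). Space and tab stay.
SAFE_CHARS = set(string.printable) - set('\r\n\x0b\x0c')


def strip_unsafe(value):
    """Return str(value) with every character outside SAFE_CHARS dropped."""
    return ''.join(ch for ch in str(value) if ch in SAFE_CHARS)


sample = {'name': 'leon.exe\x00', 'pid': 2040}  # made-up data
cleaned = dict((key, strip_unsafe(val)) for key, val in sample.items())
```

Unlike the repr approach, this silently discards the unwanted characters, so use it only when you don't need to know they were there.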
