
gdb pretty-printer routine StdStringPrinter crashes when dealing with std::basic_string<wchar_t(,.*)?>$

I was analyzing a crash dump when I noticed that the Python pretty-printer plugin ("/usr/share/gdb/python/libstdcxx/v6/printers.py") crashes on the following line

return self.val['_M_dataplus']['_M_p'].string (encoding, length = len)
LookupError: unknown encoding: UCS-4

as shown below

#22 0x00002b25639bb01b in Function(PTR *, const ._210::wstring &, const ._210::wstring &, const ._210::wstring &, bool) (
    pPjmDefn=0x2aaab7409e70, pszRepositoryName=
    Traceback (most recent call last):
  File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 469, in to_string
    return self.val['_M_dataplus']['_M_p'].string (encoding, length = len)
LookupError: unknown encoding: UCS-4

I then looked at the code

class StdStringPrinter:
    "Print a std::basic_string of some kind"

    def __init__(self, encoding, val):
        self.encoding = encoding
        self.val = val

    def to_string(self):
        # Look up the target encoding as late as possible.
        encoding = self.encoding
        if encoding == 0:
            encoding = gdb.parameter('target-charset')
        elif encoding == 1:
            encoding = gdb.parameter('target-wide-charset')

        # Make sure &string works, too.
        type = self.val.type
        if type.code == gdb.TYPE_CODE_REF:
            type = type.target ()

        # Calculate the length of the string so that to_string returns
        # the string according to length, not according to first null
        # encountered.
        ptr = self.val ['_M_dataplus']['_M_p']
        realtype = type.unqualified ().strip_typedefs ()
        reptype = gdb.lookup_type (str (realtype) + '::_Rep').pointer ()
        header = ptr.cast(reptype) - 1
        len = header.dereference ()['_M_length']
        return self.val['_M_dataplus']['_M_p'].string (encoding, length = len)
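
The length lookup at the end of to_string can be illustrated outside gdb. The following is a minimal sketch, not the libstdc++ code: it assumes a simplified 24-byte header (length, capacity, reference count) stored directly before the character data, a 4-byte little-endian wchar_t, and hypothetical helper names. It shows why stepping back one header from the data pointer yields the whole string even when it contains an embedded null.

```python
import struct

# Assumed, simplified stand-in for the string's _Rep header:
# two 8-byte sizes (length, capacity) and a 4-byte refcount padded to 8 bytes.
REP_HEADER = struct.Struct("<QQi4x")
WCHAR_SIZE = 4  # assumption: 4-byte wchar_t, little-endian


def make_wstring_block(text):
    """Build a fake memory block: header, then characters, then a null terminator.

    Returns the block and the offset that plays the role of _M_p.
    """
    data = text.encode("utf-32-le")
    length = len(data) // WCHAR_SIZE
    header = REP_HEADER.pack(length, length, 0)
    return header + data + b"\x00" * WCHAR_SIZE, REP_HEADER.size


def read_wstring(block, data_offset):
    """Mimic `ptr.cast(reptype) - 1`: read the header just before the data."""
    length, _capacity, _refcount = REP_HEADER.unpack_from(
        block, data_offset - REP_HEADER.size
    )
    raw = block[data_offset:data_offset + length * WCHAR_SIZE]
    return raw.decode("utf-32-le")


block, offset = make_wstring_block("ab\x00cd")
print(repr(read_wstring(block, offset)))
```

Because the length comes from the header rather than from scanning for a terminator, the embedded null does not cut the string short.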

and noticed that it calls gdb.parameter with 'target-charset' or 'target-wide-charset', which return the following in my session

(gdb) python print gdb.parameter('target-wide-charset')
UCS-4
(gdb) python print gdb.parameter('target-charset')
ANSI_X3.4-1968

The encoding is then passed to self.val['_M_dataplus']['_M_p'].string (encoding, length = len). My best guess is that this ends up calling str.encode or unicode.encode, but neither of them seems to support UCS-4.

>>> u'data'.encode('UCS-4')

Traceback (most recent call last):
  File "<pyshell#529>", line 1, in <module>
    u'data'.encode('UCS-4')
LookupError: unknown encoding: UCS-4
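
To check which charset names Python's codec registry can resolve, a small sketch using codecs.lookup may help. The helper name python_codec_name is hypothetical, and the list of names is only the two values reported by gdb above plus UTF-32, which is a name Python does accept for 4-byte code units.

```python
import codecs


def python_codec_name(charset):
    """Return Python's canonical codec name for `charset`, or None if unknown."""
    try:
        return codecs.lookup(charset).name
    except LookupError:
        return None


# Compare the names gdb reported with one that Python is known to accept.
for name in ("ANSI_X3.4-1968", "UCS-4", "UTF-32"):
    print(name, "->", python_codec_name(name))
```

A result of None for a name means any encode or decode call using that name fails with the same LookupError shown above.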

I strongly suspect this is a bug. Any clues or ideas?

It depends on how your Python was built. You can run this from gdb to find out:

python import sys
python print(sys.maxunicode)
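
As a rough guide to reading that value, here is a minimal sketch with a hypothetical helper name. It relies on the documented meaning of sys.maxunicode: 65535 on narrow Python 2 builds, 1114111 on wide builds and on every Python from 3.3 onward.

```python
import sys


def describe_unicode_build(maxunicode=None):
    """Classify an interpreter by the largest code point it can represent."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    if maxunicode == 0xFFFF:
        return "narrow build (2-byte code units)"
    if maxunicode == 0x10FFFF:
        return "wide build (full Unicode range)"
    return "unexpected value: %d" % maxunicode


print(describe_unicode_build())
```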

I haven't seen this one before; I would guess most distros build with UCS-4 support.

It's also worth considering what wchar_t is on your system. Perhaps UCS-4 is wrong there too. You can use "set target-wide-charset" to change this in gdb. IIRC it's not normally possible for gdb to guess the correct value.
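
One quick way to see how wide wchar_t is uses ctypes, as in the sketch below. Note the assumptions: it reports the machine running Python, which matches the debugged program only if both share the same platform ABI, and the mapping from size to codec name is a guess for illustration, not something gdb provides.

```python
import ctypes


def wchar_codec_guess():
    """Return (sizeof(wchar_t), a plausible Python codec name for that width)."""
    size = ctypes.sizeof(ctypes.c_wchar)
    # Assumption: 2-byte wchar_t holds UTF-16, 4-byte wchar_t holds UTF-32.
    return size, {2: "utf-16", 4: "utf-32"}.get(size)


size, codec = wchar_codec_guess()
print("wchar_t is %d bytes; a plausible codec is %s" % (size, codec))
```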
