While analyzing a crash dump, I noticed that the libstdc++ Python pretty-printer
("/usr/share/gdb/python/libstdcxx/v6/printers.py") fails on the following line:
return self.val['_M_dataplus']['_M_p'].string (encoding, length = len)
LookupError: unknown encoding: UCS-4
This is how it shows up in the backtrace:
#22 0x00002b25639bb01b in Function(PTR *, const ._210::wstring &, const ._210::wstring &, const ._210::wstring &, bool) (
pPjmDefn=0x2aaab7409e70, pszRepositoryName=
Traceback (most recent call last):
File "/usr/share/gdb/python/libstdcxx/v6/printers.py", line 469, in to_string
return self.val['_M_dataplus']['_M_p'].string (encoding, length = len)
LookupError: unknown encoding: UCS-4
I started reading the code:
class StdStringPrinter:
    "Print a std::basic_string of some kind"

    def __init__(self, encoding, val):
        self.encoding = encoding
        self.val = val

    def to_string(self):
        # Look up the target encoding as late as possible.
        encoding = self.encoding
        if encoding == 0:
            encoding = gdb.parameter('target-charset')
        elif encoding == 1:
            encoding = gdb.parameter('target-wide-charset')
        # Make sure &string works, too.
        type = self.val.type
        if type.code == gdb.TYPE_CODE_REF:
            type = type.target ()
        # Calculate the length of the string so that to_string returns
        # the string according to length, not according to first null
        # encountered.
        ptr = self.val ['_M_dataplus']['_M_p']
        realtype = type.unqualified ().strip_typedefs ()
        reptype = gdb.lookup_type (str (realtype) + '::_Rep').pointer ()
        header = ptr.cast(reptype) - 1
        len = header.dereference ()['_M_length']
        return self.val['_M_dataplus']['_M_p'].string (encoding, length = len)
and found that it calls gdb.parameter with the parameters 'target-charset' and 'target-wide-charset', which return:
(gdb) python print gdb.parameter('target-wide-charset')
UCS-4
(gdb) python print gdb.parameter('target-charset')
ANSI_X3.4-1968
The encoding is then passed to self.val['_M_dataplus']['_M_p'].string (encoding, length = len).
My best guess is that this ends up calling str.encode or unicode.encode, but neither of them seems to support UCS-4:
>>> u'data'.encode('UCS-4')
Traceback (most recent call last):
File "<pyshell#529>", line 1, in <module>
u'data'.encode('UCS-4')
LookupError: unknown encoding: UCS-4
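A quick way to check which charset names Python's codec registry accepts is to ask `codecs.lookup` directly. This is only a minimal sketch; the helper name `is_known_encoding` is made up for illustration, and the result for each name depends on the Python build that gdb embeds.

```python
import codecs

def is_known_encoding(name):
    """Return True if Python's codec registry recognizes `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

# Names taken from the gdb session above, plus UTF-32 for comparison.
for candidate in ("UCS-4", "UTF-32", "ANSI_X3.4-1968"):
    print("%s: %s" % (candidate, is_known_encoding(candidate)))
```

Running the same loop from inside gdb (via the `python` command) shows what the embedded interpreter accepts, which may differ from the system Python.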
I strongly suspect this is a bug. Any clues or ideas?
It depends on how your Python was built. You can do this from gdb to find out:
python import sys
python print sys.maxunicode
I haven't seen this one before; I would guess most distros build with UCS-4 support.
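For reading the value that `sys.maxunicode` prints: on Python 2, 65535 (0xFFFF) indicates a narrow (UCS-2) build and 1114111 (0x10FFFF) a wide (UCS-4) build; Python 3.3 and later always report 1114111. A small sketch of that check, where the function name `describe_unicode_build` is only illustrative:

```python
import sys

def describe_unicode_build(maxunicode):
    """Classify a sys.maxunicode value as a narrow or wide Unicode build."""
    if maxunicode == 0x10FFFF:
        return "wide (UCS-4)"
    if maxunicode == 0xFFFF:
        return "narrow (UCS-2)"
    return "unexpected"

print(describe_unicode_build(sys.maxunicode))
```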
It's also worth considering what wchar_t is on your system. Perhaps UCS-4 is wrong there too. You can use "set target-wide-charset" to change this in gdb. IIRC it's not normally possible for gdb to guess the correct value.
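If changing target-wide-charset is not an option, another possible workaround is to translate the charset name into one Python knows before decoding. The sketch below is an assumption, not something printers.py does: `CHARSET_ALIASES` and `python_codec_for` are hypothetical names, and mapping UCS-4 to Python's "utf-32" codec assumes the target's wchar_t really holds 32-bit code points in a byte order that codec can handle.

```python
import codecs

# Hypothetical alias table: gdb charset name -> Python codec name.
# Assumes UCS-4 data can be decoded with Python's utf-32 codec.
CHARSET_ALIASES = {"UCS-4": "utf-32"}

def python_codec_for(charset):
    """Return a codec name Python accepts for the given charset name."""
    try:
        return codecs.lookup(charset).name
    except LookupError:
        alias = CHARSET_ALIASES.get(charset.upper())
        if alias is None:
            raise
        return codecs.lookup(alias).name

# Round trip with sample data standing in for a wide string read from memory.
raw = u"data".encode("utf-32")
print(raw.decode(python_codec_for("UCS-4")))
```

In a pretty-printer, the returned name would be what gets passed as the encoding argument to `.string(...)`.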