
convert string representation of unicode to unicode

This piece of Python 2.7 code first correctly prints "1", but then throws "ValueError: invalid literal for int() with base 10: ''".

num = '\x001\x00'
print num
print int(num)

I guess the problem is that type(num) == <type 'str'>, so I in fact do not have a unicode string for "1", but an ASCII string which contains a unicode representation of the string "1". Did I get that right?

Anyway, how do I convert num to a format which int() will recognise?

The \x00 bytes are the problem here, not unicode vs. string values. You can strip those off:

int(num.strip('\x00'))

int() only accepts strings containing digits, optionally with a sign (+ or -) and surrounding whitespace. (A decimal point is not accepted; int('1.5') also raises ValueError.) NULL bytes are not whitespace, even if your terminal ignores them when printing.
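A short demonstration of these rules, including the fix applied to the original value (standard CPython behaviour, same in Python 2 and 3):

num = '\x001\x00'
print(int(num.strip('\x00')))  # 1  -- parses once the NUL bytes are gone
print(int('  42  '))           # 42 -- surrounding whitespace is accepted
print(int('-7'))               # -7 -- a sign is accepted
try:
    int('\x0042\x00')          # NUL bytes are not whitespace
except ValueError as e:
    print(e)                   # invalid literal for int() with base 10: ...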

The code appears to print 1 correctly because your terminal ignores the binary zeros you are printing before and after the 1.

To convert the string to a number correctly, you first need to know the format of the string. For example, if the format is such that the textual representation of the number is surrounded with binary zeros, then you can convert it with the code from Martijn's answer. Otherwise, the struct module is a useful general tool for such conversions.
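For instance, if the bytes were not ASCII text at all but a fixed-width binary integer, struct.unpack could decode them directly. The format string and the interpretation of the bytes below are assumptions chosen purely for illustration, not the format of the asker's data:

import struct

# Hypothetical: suppose the two bytes '\x00\x31' encoded a big-endian
# unsigned 16-bit integer instead of NUL-padded ASCII text.
data = '\x001'                      # the bytes 0x00 0x31
value, = struct.unpack('>H', data)  # '>H' = big-endian unsigned short
print(value)                        # 49, i.e. 0x0031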
