
convert string representation of unicode to unicode

This piece of Python 2.7 code first correctly prints "1", but then throws "ValueError: invalid literal for int() with base 10: ''".

num = '\x001\x00'
print num
print int(num)

I guess the problem is that type(num) == <type 'str'>, so I in fact do not have a unicode string for "1", but an ASCII string which contains a unicode representation of the string "1". Did I get that right?

Anyway, how do I convert num to a format which int() will recognise?

The \x00 bytes are the problem here, not unicode vs. string values. You can strip those off:

int(num.strip('\x00'))

int() only accepts strings containing digits, optionally with a sign (+ or -) and surrounding whitespace. (A decimal point is not accepted; int('1.5') also raises ValueError.) NULL bytes are not whitespace, even if your terminal ignores them when printing.
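A short demonstration of these rules, including the fix applied to the original value (standard CPython behaviour, same in Python 2 and 3):

num = '\x001\x00'
print(int(num.strip('\x00')))  # 1  -- parses once the NUL bytes are gone
print(int('  42  '))           # 42 -- surrounding whitespace is accepted
print(int('-7'))               # -7 -- a sign is accepted
try:
    int('\x0042\x00')          # NUL bytes are not whitespace
except ValueError as e:
    print(e)                   # invalid literal for int() with base 10: ...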

The code appears to print 1 correctly because your terminal ignores the binary zeros you are printing before and after the 1.

To convert the string to a number correctly, you first need to know the format of the string. For example, if the format is such that the textual representation of the number is surrounded with binary zeros, then you can convert it with the code from Martijn's answer. Otherwise, the struct module is a useful general tool for such conversions.
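For instance, if the bytes were not ASCII text at all but a fixed-width binary integer, struct.unpack could decode them directly. The format string and the interpretation of the bytes below are assumptions chosen purely for illustration, not the format of the asker's data:

import struct

# Hypothetical: suppose the two bytes '\x00\x31' encoded a big-endian
# unsigned 16-bit integer instead of NUL-padded ASCII text.
data = '\x001'                      # the bytes 0x00 0x31
value, = struct.unpack('>H', data)  # '>H' = big-endian unsigned short
print(value)                        # 49, i.e. 0x0031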
