UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 434852: invalid continuation byte

Question

I am using hfcca to calculate the cyclomatic complexity of some C++ code. hfcca is a simple Python script ( https://code.google.com/p/headerfile-free-cyclomatic-complexity-analyzer/ ). When I run the script to generate the output as an XML file, I get the following error:

Traceback (most recent call last):
  File "./hfcca.py", line 802, in <module>
    main(sys.argv[1:])
  File "./hfcca.py", line 798, in main
    print(xml_output([f for f in r], options))
  File "./hfcca.py", line 798, in <listcomp>
    print(xml_output([f for f in r], options))
  File "/x/home06/smanchukonda/PREFIX/lib/python3.3/multiprocessing/pool.py", line 652, in next
    raise value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 434852: invalid continuation byte

Please help me with this.

Answer 1

It looks like the file contains characters encoded as latin1 whose bytes are not valid utf8. The file utility can be useful for figuring out what encoding a file should be treated as, e.g.:

monk@monk-VirtualBox:~$ file foo.txt 
foo.txt: UTF-8 Unicode text
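
If the file utility is not available, the offending byte can also be located from Python. The following is a minimal sketch, not part of hfcca; the helper names first_invalid_utf8 and check_file are made up for illustration. It reads raw bytes and reports where UTF-8 decoding would fail, which corresponds to the "position" in the error message.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte
        return exc.start, data[exc.start]
    return None


def check_file(path):
    """Hypothetical helper: read a file as raw bytes and report where decoding fails."""
    with open(path, "rb") as fh:
        return first_invalid_utf8(fh.read())


# "caf\u00e9" encoded as Latin-1 ends in a lone 0xE9, which is not valid UTF-8
sample = "caf\u00e9".encode("latin-1")
print(first_invalid_utf8(sample))                        # (3, 233)
print(first_invalid_utf8("caf\u00e9".encode("utf-8")))   # None
```

Running check_file on each source file passed to hfcca should show which file, if any, is not valid UTF-8.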

Here's what the byte means in latin1:

>>> b'\xe2'.decode('latin1')
'â'
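
As for why the decoder complains about an "invalid continuation byte": in UTF-8, 0xe2 is the lead byte of a three-byte sequence, so it must be followed by two bytes in the range 0x80-0xBF. In latin1 text it is usually followed by an ordinary character instead. A small illustration (the sample byte strings are made up):

```python
# 0xE2 followed by two continuation bytes is valid UTF-8:
# this is the encoding of U+2019 (right single quotation mark).
valid = b"\xe2\x80\x99"
print(valid.decode("utf-8"))

# 0xE2 followed by a space, as in Latin-1 text containing "\u00e2 ",
# is not valid UTF-8 because 0x20 is not a continuation byte.
broken = b"\xe2 x"
try:
    broken.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason
    print(reason)   # invalid continuation byte
```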

The easiest fix is probably to convert the files to utf8.
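
A minimal sketch of such a conversion in Python, assuming the source files really are latin1 (that is a guess based on the error, so check a converted file by eye before overwriting anything). The function name convert_to_utf8 and the file names are hypothetical.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding="latin-1"):
    """Re-encode a text file to UTF-8, assuming it is in source_encoding."""
    # newline="" keeps the original line endings untouched
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demonstration on a throwaway file containing a Latin-1 encoded "e acute"
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "legacy.cpp")
    dst = os.path.join(tmp, "legacy_utf8.cpp")
    with open(src, "wb") as fh:
        fh.write(b"// caf\xe9\n")
    convert_to_utf8(src, dst)
    with open(dst, "rb") as fh:
        converted = fh.read()

print(converted)   # b'// caf\xc3\xa9\n'
```

Writing to a separate destination file keeps the original intact in case the assumed source encoding turns out to be wrong.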

Answer 2

I had the same problem when rendering Markup("""yyyyyy"""), and I solved it with an online tool that removed the 'bad' characters: https://pteo.paranoiaworks.mobi/diacriticsremover/

It is a nice tool and works even offline.
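
Something similar can be done locally in Python with the standard unicodedata module. This is only a sketch of the general idea, not how the linked tool works, and strip_diacritics is a made-up name. Note that it operates on text that has already been decoded correctly, so it removes accents but does not by itself fix a file in the wrong encoding.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (accents) after Unicode decomposition."""
    # NFKD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("caf\u00e9"))   # cafe
```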


solution1
11 2013-04-22 13:41:44

solution2
1 2018-04-02 10:24:37
