Python decoding of back quotations

Original 2014-10-10 00:02:10 5 1 python/ database/ codec

I am receiving this error:
"UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201d'"

I'm quite new to working with databases as a whole. Previously, I had been using SQLite3; however, now that I'm migrating to MySQL, I noticed that u'\u201d' and u'\u201c' characters appear in some of my text data.

I'm currently writing a Python script to handle the migration; however, I'm stuck on this codec issue, which I didn't foresee.

So my question is, how do I replace/decode these values so that I can actually store them in MySQL DB?

1 answer

You don't have a problem decoding these characters; wherever they're coming from, if they're showing up as \u201d ( ” ) and \u201c ( “ ), they're already being properly decoded.

The problem is encoding these characters. If you want to store your strings in Latin-1 columns, they can only contain the 256 characters that exist in Latin-1, and these two are not among them.
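As a rough illustration of that point, here is a minimal Python 3 sketch (the question itself uses Python 2 `u''` literals, so treat this as an adaptation). The sample string `s` is a made-up example; the sketch only shows that the two curly quotes sit outside the Latin-1 range and that a strict encode therefore fails.

```python
import unicodedata

# Hypothetical sample text containing the two curly quotes from the question.
s = "hey \u201chey\u201d hey"

# Latin-1 covers code points U+0000..U+00FF only; list anything above that.
names = [unicodedata.name(ch) for ch in s if ord(ch) > 0xFF]
print(names)

# A strict encode (the default error handler) raises UnicodeEncodeError.
try:
    s.encode("latin-1")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)
```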

So my question is, how do I replace/decode these values so that I can actually store them in MySQL DB?

The obvious solution is to use UTF-8 columns instead of Latin-1 in MySQL. Then this problem wouldn't even exist; any Unicode string can be encoded as UTF-8.
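To illustrate the claim that any Unicode string can be encoded as UTF-8, here is a small Python 3 sketch with a made-up sample string. It covers only the Python side of the round trip; how the MySQL columns and the connection character set are configured is outside the sketch.

```python
# Hypothetical sample text containing curly quotes.
s = "hey \u201chey\u201d hey"

# UTF-8 can represent every Unicode code point, so no error handler is needed.
data = s.encode("utf-8")
restored = data.decode("utf-8")

# Each curly quote takes three bytes in UTF-8, so the byte string is longer
# than the character string.
print(len(s), len(data))
print(restored == s)
```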

But assuming you can't do that for some reason…

Python comes with built-in support for different error handlers that can help you do something with these characters while encoding them. You just have to decide what "something" that is.

Let's say your string looks like hey “hey” hey. Here's what each error handler would do with it:

s.encode('latin-1', 'ignore'): hey hey hey
s.encode('latin-1', 'replace'): hey ?hey? hey
s.encode('latin-1', 'xmlcharrefreplace'): hey &#8220;hey&#8221; hey
s.encode('latin-1', 'backslashreplace'): hey \u201chey\u201d hey
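The four handlers listed above can be tried directly. This is a Python 3 sketch, so `encode` returns `bytes` and the results print with a `b'...'` prefix; the sample string is the one assumed in the answer.

```python
# Sample string from the answer: hey “hey” hey
s = "hey \u201chey\u201d hey"

handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")

# Encode the same string to Latin-1 once per error handler.
results = {name: s.encode("latin-1", name) for name in handlers}

for name, encoded in results.items():
    print(name, encoded)
```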

The first two have the advantage of being somewhat readable, but the disadvantage that you can never recover the original string. If you can accept that loss but want something even more readable, you may want to consider a third-party library like unidecode:

unidecode('hey “hey” hey').encode('latin-1'): hey "hey" hey
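If installing a third-party package is not an option, a hand-written mapping gives a similar result for a known set of characters. The sketch below is not how unidecode works internally; it is a standard-library alternative, and `QUOTE_MAP` and `straighten_quotes` are hypothetical names chosen for illustration. Any character missing from the map would still raise `UnicodeEncodeError` on encoding.

```python
# Hypothetical mapping: curly quotes to their plain ASCII counterparts.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(text):
    """Replace the curly quotes listed in QUOTE_MAP with ASCII quotes."""
    return text.translate(QUOTE_MAP)


s = "hey \u201chey\u201d hey"
encoded = straighten_quotes(s).encode("latin-1")
print(encoded)
```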

The last two are lossless, but kind of ugly. In some contexts, though, they'll look pretty nice: e.g., if you're building an XML document, xmlcharrefreplace (maybe even with 'ascii' instead of 'latin-1') will give you exactly what you want in an XML viewer. There are special-purpose translators for various other use cases (like HTML references, or XML named entities instead of numbered, etc.) if you know what you want.
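To make "lossless" concrete, here is a Python 3 sketch of reading both forms back. It assumes the original text contains no literal backslash escapes or `&...;` sequences of its own; if it does, those would be converted too, and the round trip would no longer be exact.

```python
import html

s = "hey \u201chey\u201d hey"

# backslashreplace output can be read back with the unicode_escape codec.
escaped = s.encode("latin-1", "backslashreplace")
recovered_from_escapes = escaped.decode("unicode_escape")

# xmlcharrefreplace output can be read back by decoding the bytes and then
# resolving the numeric character references.
xml_bytes = s.encode("latin-1", "xmlcharrefreplace")
recovered_from_xml = html.unescape(xml_bytes.decode("latin-1"))

print(recovered_from_escapes == s, recovered_from_xml == s)
```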

But in general, you have to make the choice between throwing away information, or "hiding" it in some ugly but recoverable form.


The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 ACCEPTED 2014-10-10 00:20:55
