
Python 2.6 and unicode

So I am working on a web-browser type of application for my client and I just implemented bookmarking functionality, but it doesn't work as expected. When the user clicks "Bookmark page", a little form pops up, which takes the title of the webpage and puts it in a line edit. The problem is that if the website has foreign or unusual symbols in its title, Python throws an error saying it can't encode the string. How can I get Python to handle all possible strings, whether they contain hieroglyphs or other unusual symbols?

Library used for the GUI and embedded browser: PyQt

If you're using QWebView.title to get the title of the current web page, then it will return either a QString or a Python unicode string. Which one you get depends on the PyQt API version in use. For version 1 (which is the default for Python 2), it will be a QString; for version 2 (which is the default for Python 3), it will be a Python unicode string. Whichever it is, in order to display it correctly in the line edit, just set it directly:

lineEdit.setText(webview.title())

Since you appear to be using Python 2, I'll assume that webview.title() is returning a QString. If you want to convert this to a Python unicode string (e.g. in order to use it with sqlite), then you can do the following:

title = unicode(webview.title())
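As a minimal sketch of the sqlite use case mentioned above: once the title is a unicode string, it can be passed to the standard sqlite3 module as a query parameter without any manual encoding. The bookmarks table, the save_bookmark helper, and the literal title are hypothetical stand-ins for illustration; in the real application the title would come from unicode(webview.title()).

```python
import sqlite3


def save_bookmark(conn, title, url):
    """Store one bookmark; title and url are expected to be unicode strings."""
    # Parameter binding lets sqlite3 handle the text encoding itself.
    conn.execute("INSERT INTO bookmarks (title, url) VALUES (?, ?)", (title, url))
    conn.commit()


# Hypothetical in-memory database and table, used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (title TEXT, url TEXT)")

# Stand-in for unicode(webview.title()): CJK characters plus an accented letter.
title = u"\u65e5\u672c\u8a9e caf\u00e9"
save_bookmark(conn, title, u"http://example.com/")

# Read the title back; it comes out as a unicode string again.
stored = conn.execute("SELECT title FROM bookmarks").fetchone()[0]
```

The point of the sketch is that the string stays unicode all the way through; no encode or decode call is needed between the GUI and the database.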

Note that you should not pass an encoding (such as "utf-8") as the second argument to unicode, as that argument is only used for decoding byte strings to unicode strings.

If you do need to get a "utf-8" encoded byte string from a QString, then you can do either:

data = unicode(webview.title()).encode('utf-8')

or:

data = webview.title().toUtf8().data()
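To make the difference between encoding and decoding concrete, here is a small sketch in plain Python without PyQt. The literal title is an assumed stand-in for unicode(webview.title()). It shows that encoding to UTF-8 always succeeds, that decoding the bytes gives the original text back, and that an ASCII encode fails in the way described in the question.

```python
# Stand-in for unicode(webview.title()): an accented letter and a snowman symbol.
title = u"caf\u00e9 \u2603"

# unicode -> bytes: UTF-8 can represent every Unicode character.
data = title.encode("utf-8")

# bytes -> unicode: decoding with the same codec restores the original text.
restored = data.decode("utf-8")

# ASCII cannot represent these characters, so encoding to it raises an error.
try:
    title.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

An "can't encode" error usually means that a unicode string was converted to bytes with a codec too narrow for its contents, so either keep the text as unicode or encode it explicitly as UTF-8.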

What are you using to parse the websites? I would recommend Beautiful Soup. It will try to determine the encoding of the web page and give you back unicode; see the "Parsing HTML" section of the Beautiful Soup documentation. Edit: Also take a look at the "Beautiful Soup Gives You Unicode, Dammit" section.


 