Trouble converting a string from Unicode in Python 2.7?
I'm extremely confused about Unicode in Python 2.x.
I'm using BeautifulSoup to scrape a webpage, and I'm trying to insert the things I find into a dictionary with the name as the key and the URL as the value.
I'm using BeautifulSoup's find function to get the info I need. My code started out as follows:
name = i.find('a').string
url = i.find('a').get('href')
This works, except that the thing returned from find is an object, not a string.
Here's where things start confusing me.
If I try to convert it to type str before I assign it to the variable, it sometimes throws a UnicodeEncodeError:
'ascii' codec can't encode character u'\xa0' in position 5: ordinal not in range(128)
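A possible explanation, sketched under assumptions: in Python 2, calling str() on a unicode object encodes it with the default ascii codec, which cannot represent u'\xa0' (a non-breaking space). The sample text below is made up for illustration, and the explicit .encode('ascii') call stands in for that implicit step so the snippet behaves the same on Python 2.7 and Python 3.

```python
# Hypothetical sample text with a non-breaking space (U+00A0) at index 5.
text = u'Hello\xa0World'

try:
    # Explicit version of the conversion that str(text) attempts in Python 2.
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The exception records which codec failed and at which position.
print('%s codec failed at position %d' % (error.encoding, error.start))
```

The position reported here lines up with the "position 5" in the message above only because the sample text was chosen that way.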
I Googled around and found that I should be encoding to ascii.
I tried adding:
print str(i.find('a').string).encode('ascii', 'ignore')
No luck, it still gives a Unicode error.
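One likely reason this still fails in Python 2: str(...) is evaluated before .encode(...), so the implicit ascii conversion raises before the 'ignore' handler is ever reached. A minimal sketch, using a made-up sample value, that calls .encode() on the text object directly instead:

```python
# Hypothetical sample text; u'\xa0' is a non-breaking space.
text = u'Hello\xa0World'

# Call .encode() on the text itself, with no str() wrapper in between.
dropped = text.encode('ascii', 'ignore')    # drops the unencodable character
replaced = text.encode('ascii', 'replace')  # substitutes '?' for it

print(repr(dropped))
print(repr(replaced))
```

Both handlers lose information, so they are probably better suited to display than to storing keys.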
From there, I tried using repr:
print repr(i.find('a').string)
And that works... almost!
I ran into a new problem here.
Once everything is said and done, and the dictionary is built, I can't bloody access anything! It keeps giving me a KeyError.
I can loop over the dict:
for i in sorted(data.iterkeys()):
    print i
>>> u'Key1'
>>> u'Key2'
>>> u'Key3'
>>> u'Key4'
but if I try to access an item of the dict like this:
print data['key1']
OR
print data[u'key1']
OR
test = unicode('key1')
print data[test]
They all return KeyErrors, which is 100% confusing to me. I assume it's got something to do with them being Unicode objects.
I've tried just about everything I can come up with, but I can't figure out what's going on.
Oh! Adding to the oddity is this code:
name = repr(i.find('a').string)
print type(name)
returns
>>> type(str)
but if I just print the thing
print name
it shows it as a unicode string
>>> u'string name'
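What seems to be happening: repr() returns a str holding the source-code representation of the object, so the quotes (and, in Python 2, the u prefix) become part of the value itself. A dict keyed by those values will not match a plain 'Key1' lookup. The sketch below uses a made-up key and a placeholder URL. Note also that the lookups above use a lowercase key1, which would miss Key1 in any case, since dict keys are case-sensitive.

```python
# Hypothetical key standing in for the scraped link text.
name = u'Key1'

# repr() yields the literal representation, quotes included:
# "u'Key1'" on Python 2, "'Key1'" on Python 3.
as_repr = repr(name)

data = {as_repr: 'http://example.com/1'}  # placeholder URL

print(name in data)     # False: the stored key carries extra characters
print(as_repr in data)  # True
```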
The .string value is indeed not a string. You need to cast it to unicode():
name = unicode(i.find('a').string)
It's a unicode-like object called NavigableString. If you really need it to be a str instead, you can encode it from there:
name = unicode(i.find('a').string).encode('utf8')
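To illustrate the conversion without requiring BeautifulSoup, here is a sketch that uses a hypothetical stand-in class, FakeNavigableString, in place of the real NavigableString. It subclasses the text type, which is an assumption made to mirror the "unicode-like" behaviour described here. The text_type alias is likewise an assumption, added so the snippet runs on both Python 2.7 (unicode) and Python 3 (str).

```python
import sys

# unicode on Python 2, str on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


class FakeNavigableString(text_type):
    """Hypothetical stand-in for BeautifulSoup's NavigableString."""


node_text = FakeNavigableString(u'Caf\xe9\xa0Menu')

# Cast to the plain text type first.
name = text_type(node_text)

# Encode only if a byte string is really needed.
name_bytes = name.encode('utf8')

print(type(name) is text_type)
print(name_bytes.decode('utf8') == name)
```

Decoding the bytes with the same codec gives back the original text, which is why UTF-8 is a safer choice here than ascii with an error handler.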
or similar. For use in a dict I'd use unicode() objects and not encode.
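A sketch of that approach, with made-up names and placeholder URLs standing in for the scraped values: keep the keys as text objects and look them up with exactly the same text.

```python
# Hypothetical (name, url) pairs, kept as text objects.
scraped = [
    (u'Key1', u'http://example.com/1'),
    (u'Caf\xe9\xa0Menu', u'http://example.com/2'),
]

data = {}
for name, url in scraped:
    data[name] = url  # keys stay as text: no repr() and no encoding

print(data[u'Key1'])
print(data.get(u'key1'))  # None: different case, so a different key
print(data[u'Caf\xe9\xa0Menu'])
```

Using .get() for the mismatched key avoids the KeyError and makes the case-sensitivity visible.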
To understand the difference between unicode() and str() and what encoding to use, I recommend you read the Python Unicode HOWTO.