Parsing: How do I remove Unicode characters?

Question

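I wrote some code to capture the text between the break elements on this web page: http://www.virginiaequestrian.com/main.cfm?action=greenpages&sub=view&ID=10478

I think I'm on the right track, but now I'm getting some bad values. Here are my results: [u'2133 Craigs Store Road', u'Afton,\r\n\t\tVA\xa0\r\n\t\t22920', u'Contact Person:', u'Email Address:', u'Website:', u'Phone: 434-882-3150', u'']

I need to figure out how to remove the Unicode from the result values. Can anyone help?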

import requests
from bs4 import BeautifulSoup

r=requests.get('http://www.virginiaequestrian.com/main.cfm?action=greenpages&sub=view&ID=10478')
soup=BeautifulSoup(r.content,'lxml')
tbl=soup.findAll('table')[2]

Contact=tbl.findAll('p')[0]

list=[]
for br in Contact.findAll('br'):
    next = br.nextSibling
    text=next.strip()
    list.append(text)
print list

Answer 1

from bs4 import BeautifulSoup
import requests
import re

r = requests.get('http://www.virginiaequestrian.com/main.cfm?action=greenpages&sub=view&ID=10478')
soup = BeautifulSoup(r.content, 'lxml')
tbl = soup.findAll('table')[2]

Contact = tbl.findAll('p')[0]

# Compile once, outside the loop: matches newline, carriage return,
# tab, and the non-breaking space (\xa0).
regex = re.compile(r'[\n\r\t\xa0]')

list = []
for br in Contact.findAll('br'):
    next = br.nextSibling
    # Replace each matched character with a single space.
    # (The earlier strip() result was overwritten here, so it is dropped.)
    text = regex.sub(' ', next)
    list.append(text)
print list

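I dug into it a bit more and found that I can use a regular expression to clean up these values. I still have a spacing problem, [u'2133 Craigs Store Road', u'Afton, VA 22920', u'Contact Person:', u'Email Address:', u'Website:', u'Phone: 434-882-3150', u''], but at least the characters are gone.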
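The leftover spacing comes from replacing every control character with its own space. One possible way to tidy that up, sketched here on the assumption that each run of whitespace should become a single space, is to split on whitespace and rejoin. The name `collapse_whitespace` and the sample string are illustrative and not part of the original code; the sketch is written for Python 3.

```python
def collapse_whitespace(text):
    """Collapse every run of whitespace into one space and trim both ends."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which covers \r, \n, \t and the non-breaking space \xa0.
    return ' '.join(text.split())


# Sample modeled on the second value in the question's output (assumed).
sample = 'Afton,\r\n\t\tVA\xa0\r\n\t\t22920'
cleaned = collapse_whitespace(sample)
print(cleaned)
```

Inside the loop this would take the place of the regex substitution, e.g. `text = collapse_whitespace(next)`.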

Answer 2

You can use the built-in replace method that the str type provides.

text = next.strip().replace("\n", "").replace("\t", "").replace("\r", "")

This way you replace \n, \t, and \r with nothing (an empty string).
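As a rough illustration of what the chained `replace` calls do, the sketch below runs them on a sample string modeled on the question's output (an assumption, not fetched from the page). It suggests two things to watch for: the chain leaves the non-breaking space `\xa0` in place, and deleting the characters outright joins neighboring words, so an extra `replace` that maps `\xa0` to a space may be wanted. Written for Python 3.

```python
# Sample modeled on the second value in the question's output (assumed).
sample = 'Afton,\r\n\t\tVA\xa0\r\n\t\t22920'

# The chain from the answer: delete newline, tab, and carriage return.
stripped = sample.strip().replace("\n", "").replace("\t", "").replace("\r", "")
print(repr(stripped))      # the \xa0 character is still present

# Additionally map the non-breaking space to an ordinary space.
without_nbsp = stripped.replace("\xa0", " ")
print(repr(without_nbsp))
```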

2 solutions

Solution 1
0 2015-07-22 16:52:56

Solution 2
0 Accepted 2015-07-22 17:01:19
