Improper Beautiful Soup parsing

Question

With this code I am getting the following URL from Beautiful Soup parsing:

result, data = mail.uid('search', None, "(FROM 'tiffany@e.tiffany.com')") # search and return uids instead
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]

html = raw_email
soup = BS(html)

urls=[]
for x in soup.find_all('a', href=True):
    urls.append(x['href'])

print urls

Output

'3D"http://elink.tiffany.com/r/YB7DL5S/32FU1/5A6EIF/QFMQOO/6EN2U/52/h"='

How can I strip the first 4 and last 3 characters? Is it something I can do in Beautiful Soup, or should I use split()?

Answer 1

Just use str.lstrip() and str.rstrip(). Both take a set of characters rather than a literal prefix or suffix, so the drawback of this method is that you have to know exactly which characters you want to remove.
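To illustrate that caveat, here is a minimal sketch using the href value from the question (without the outer quotes that printing a list adds). The second URL is a made-up example, included only to show how a character set can strip more than intended.

```python
# The href value from the question, without the quotes added by repr().
raw_href = '3D"http://elink.tiffany.com/r/YB7DL5S/32FU1/5A6EIF/QFMQOO/6EN2U/52/h"='

# lstrip/rstrip remove any leading/trailing characters found in the given set.
cleaned = raw_href.lstrip('3D"').rstrip('"=')
print(cleaned)

# If the unwanted part always has a fixed length, slicing does the same job:
# 3 characters at the front (3D") and 2 at the end ("=).
sliced = raw_href[3:-2]
print(sliced)

# Caveat: a URL that itself starts with one of the stripped characters
# loses them too (hypothetical URL, for illustration only).
over_stripped = '3D"3Dprinting.example/"='.lstrip('3D"')
print(over_stripped)
```

Slicing is the safer choice when the unwanted prefix and suffix are always the same length, because it never touches characters that belong to the URL.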

Here, stripping each URL as you put it into the list:

urls.append(x['href'].lstrip("'3D\"").rstrip("\"=\'"))
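As an alternative to stripping after the fact: the stray 3D and trailing = look like quoted-printable encoding (=3D encodes an equals sign, and a = at the end of a line is a soft line break), which suggests the raw message was parsed before its body was decoded. The sketch below assumes that is the cause and uses Python 3's standard email module; the sample message and the helper name decoded_html_parts are hypothetical stand-ins, not part of the original code. On Python 2, email.message_from_string would take the place of email.message_from_bytes.

```python
import email

# Hypothetical raw message standing in for raw_email from the question.
raw_email = (
    b"From: sender@example.com\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<a href=3D"http://example.com/r/ABC=\r\n'
    b'DEF/h">link</a>\r\n'
)


def decoded_html_parts(raw_bytes):
    """Return the decoded text/html bodies of a raw RFC 822 message."""
    message = email.message_from_bytes(raw_bytes)
    parts = []
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes the Content-Transfer-Encoding
            # (quoted-printable or base64) and returns bytes.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    return parts


html_parts = decoded_html_parts(raw_email)
print(html_parts[0])
```

Each decoded string can then be passed to BS(...) as in the question, and the href values should come out without the extra characters.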

Answer 1: accepted, 2013-10-11 03:00:31