Parsing HTML in Beautiful Soup

I am trying to parse HTML fragments like this:

<div><span>address</span>text of address</div>

How can I extract the fragment 'text of address' programmatically, without the span tag, in Beautiful Soup?

Currently I take the whole content of the div and remove the span, but I think there is a better way.

>>> from BeautifulSoup import BeautifulSoup
>>> fragment = '<div><span>address</span>text of address</div>'
>>> soup = BeautifulSoup(fragment)
>>> soup.div.span.nextSibling
u'text of address'
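
For reference, here is a minimal sketch of the same idea with the newer bs4 package on Python 3. That is an assumption on my part: the session above uses BeautifulSoup 3, where the attribute is spelled nextSibling. In bs4 the preferred spelling is next_sibling, and passing "html.parser" explicitly avoids the warning about an unspecified parser.

```python
from bs4 import BeautifulSoup

fragment = '<div><span>address</span>text of address</div>'
soup = BeautifulSoup(fragment, "html.parser")

# The text node that immediately follows the <span> inside the <div>
address_text = soup.div.span.next_sibling
print(address_text)
```

This only works when the wanted text comes directly after the span; if another tag sits in between, next_sibling returns that tag instead.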

Here's how:

>>> from BeautifulSoup import BeautifulSoup
>>> text = "<div><span>address</span>text of address</div>"
>>> print BeautifulSoup(text).find('div').contents[-1]
text of address

Sorry for my first answer; I misread the question.
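
Both nextSibling and contents[-1] depend on where the text sits relative to the span. A position-independent variant might look like the sketch below. It assumes bs4 on Python 3 rather than the BeautifulSoup 3 import used above, and direct_text is a hypothetical helper name, not part of the library. It collects only the div's own text nodes and skips anything nested inside child tags.

```python
from bs4 import BeautifulSoup, NavigableString

def direct_text(tag):
    """Join the text nodes that are direct children of `tag`, ignoring nested tags."""
    parts = [str(child) for child in tag.children
             if isinstance(child, NavigableString)]
    return "".join(parts).strip()

fragment = '<div><span>address</span>text of address</div>'
soup = BeautifulSoup(fragment, "html.parser")
print(direct_text(soup.div))
```

Note that HTML comments are also NavigableString instances in bs4, so a comment placed directly inside the div would be included in the result.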
