Remove everything besides a certain HTML tag and its content in Python
I've searched around the internet, but I cannot find anything that removes everything except a certain tag and the content inside it.
How can I do this with Python (BeautifulSoup 4)?
I have this HTML:
<p><iframe width="1000" height="500" allowfullscreen="allowfullscreen" class="embed" src="#"> </iframe></p> <p>sdkjasdkljasldjad;j dadas dasdadada</p>
I need to remove everything else so that the output looks like this:
<iframe width="1000" height="500" allowfullscreen="allowfullscreen" class="embed" src="#"> </iframe>
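I've come up with this, but I don't know how to go further:
from bs4 import BeautifulSoup
from django import template
register = template.Library()
@register.filter(name='only_iframe')
def only_iframe(content):
    soup = BeautifulSoup(content, 'html.parser')
    # Tag names go in a list; a second positional argument is treated as a CSS class filter
    for tag in soup.find_all(['p', 'strong']):
        tag.replace_with('')  # removes the <p> with everything inside it, including the nested <iframe>
    return soup.get_text()  # get_text() returns only text, so no markup survives anyway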
Why not simply locate the iframe and get its string representation:
iframe = soup.find("iframe", class_="embed")
print(str(iframe))
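A minimal sketch that folds this answer back into the question's filter function. It assumes `bs4` is installed and that every `<iframe class="embed">` should be kept (hence `find_all()` instead of `find()`); the Django `@register.filter(name='only_iframe')` decorator is left off so the function runs on its own.

```python
from bs4 import BeautifulSoup


def only_iframe(content):
    """Return only the <iframe class="embed"> elements in `content`, as HTML.

    Everything else (the wrapping <p>, other paragraphs, stray text) is dropped.
    Returns an empty string when no matching iframe is present.
    """
    soup = BeautifulSoup(content, "html.parser")
    # find_all() rather than find(), so several embeds are all kept
    iframes = soup.find_all("iframe", class_="embed")
    # str(tag) serialises the tag together with its attributes and children
    return "".join(str(tag) for tag in iframes)


if __name__ == "__main__":
    html = (
        '<p><iframe width="1000" height="500" allowfullscreen="allowfullscreen" '
        'class="embed" src="#"> </iframe></p> <p>sdkjasdkljasldjad;j dadas dasdadada</p>'
    )
    print(only_iframe(html))
```

Note that BeautifulSoup may reorder the attributes when serialising, so the output is equivalent to, but not necessarily byte-identical with, the input tag. When this is registered as a Django filter, Django autoescapes its output, so in the template you would likely also need the built-in `safe` filter (for example `{{ body|only_iframe|safe }}`); only do that if you trust the source HTML.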
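Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.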