Remove everything besides a certain HTML tag and its content in Python

I've searched around the internet and I cannot find anything that removes everything except a certain tag and the content inside it.

How can I do this with Python (BeautifulSoup 4)?

I have this HTML:

 <p><iframe width="1000" height="500" allowfullscreen="allowfullscreen" class="embed" src="#"> </iframe></p> <p>sdkjasdkljasldjad;j dadas dasdadada</p> 

I need to remove everything else so that the output looks like this:

 <iframe width="1000" height="500" allowfullscreen="allowfullscreen" class="embed" src="#"> </iframe> 

I've come up with this, but I don't know how to go further:

@register.filter(name='only_iframe')
def only_iframe(content):
    soup = BeautifulSoup(content)

    for tag in soup.find_all('p', 'strong'):
        tag.replaceWith('')

    return soup.get_text()

Why not locate the iframe and get its string representation:

iframe = soup.find("iframe", class_="embed")
print(str(iframe))
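
For completeness, here is a minimal sketch of how that idea might fit into the `only_iframe` function from the question. It assumes the built-in `html.parser` backend and leaves out the Django `@register.filter` decorator so it runs on its own; the empty-string fallback for a missing iframe is an assumption, not something the question specifies. Note that in the original attempt, `find_all('p', 'strong')` matches `<p>` tags with the CSS class `strong` rather than both tag names, and `get_text()` returns text only, so the iframe markup would be dropped either way.

```python
from bs4 import BeautifulSoup


def only_iframe(content):
    """Return only the first <iframe class="embed"> found in an HTML fragment."""
    # html.parser ships with Python, so no extra parser has to be installed.
    soup = BeautifulSoup(content, "html.parser")
    iframe = soup.find("iframe", class_="embed")
    # find() returns None when nothing matches; fall back to an empty string
    # (assumed behaviour, pick whatever suits the template).
    return str(iframe) if iframe is not None else ""


# Sample input taken from the question.
html = (
    '<p><iframe width="1000" height="500" allowfullscreen="allowfullscreen" '
    'class="embed" src="#"> </iframe></p> '
    "<p>sdkjasdkljasldjad;j dadas dasdadada</p>"
)
print(only_iframe(html))
```

If this is registered as a Django template filter as in the question, the returned markup would probably also need to be marked safe (for example with `django.utils.safestring.mark_safe`), since template output is auto-escaped by default.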
