
How to remove html tags from text using python?

I am new to Python and I am trying to create a simple script that prints out the word of the day from Urban Dictionary.

    import requests
    from bs4 import BeautifulSoup

    # requests urban dictionary home page 
    r = requests.get('https://www.urbandictionary.com')

    soup = BeautifulSoup(r.text, 'html.parser')

    # finds the title
    title = soup.find('title').text

    print(title)

    # finds the definition
    definition = soup.find('meta', attrs={'property': 'og:description'})

    print(definition)

I use ".text" on the title to get rid of the html tags, and it works, but when I use it on the definition, all of the text disappears. So, for now, the definition prints out with the html tags. What other ways are there, besides ".text", to remove the html tags? When I try to paste the output here, part of it doesn't show up, so here is a picture of the output.

This is my first time posting here, so I'm sorry if I didn't format my question correctly, but any help would be greatly appreciated.

... when I try to use [the text property] on the definition all of the text disappears...

This is because the tag you're targeting looks like this:

    <meta content="foo bar baz..." name="Description" property="og:description">

When you access the "text" property of this object in Beautiful Soup, there is no text to return: `<meta>` is a void element with no child nodes, so "text" is an empty string. The text you want is stored in the "content" attribute, which you can read with dictionary-style square-bracket notation:

    definition['content']
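A minimal, offline sketch of the difference follows. It parses a hard-coded HTML string modeled on the tag above instead of the live page (that string is an assumption, not the real Urban Dictionary markup) and compares "text" with attribute access.

```python
from bs4 import BeautifulSoup

# Hard-coded stand-in for the page (assumption): only the tag shown above.
html = (
    '<html><head>'
    '<meta content="foo bar baz..." name="Description" property="og:description">'
    '</head></html>'
)

soup = BeautifulSoup(html, "html.parser")
definition = soup.find("meta", attrs={"property": "og:description"})

# <meta> is a void element: it has no child text, so .text is empty.
print(repr(definition.text))  # ''

# The definition itself is stored in the "content" attribute.
print(definition["content"])  # foo bar baz...

# .get() returns None instead of raising KeyError when an attribute is missing.
print(definition.get("content"))  # foo bar baz...
print(definition.get("missing"))  # None

# .attrs holds every attribute of the tag as a dict.
print(definition.attrs)
```

Square-bracket indexing raises `KeyError` if the attribute is absent, while `.get()` returns `None`. Also, if `soup.find()` matches nothing it returns `None`, and indexing `None` raises `TypeError`, so it can be worth checking the result before reading the attribute.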

This feature is documented in the Attributes section of the Beautiful Soup documentation.
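Folding the fix back into the question's script could look like the sketch below. The function names `parse_word_of_the_day` and `fetch_word_of_the_day` are hypothetical, and the code assumes the home page still exposes the definition through an `og:description` meta tag, as it did in the question. Parsing is kept separate from the network request so it can be tried on a saved HTML string.

```python
from bs4 import BeautifulSoup

URL = "https://www.urbandictionary.com"


def parse_word_of_the_day(html):
    """Return (title, definition) parsed from the home-page HTML.

    Hypothetical helper: assumes a <title> tag and a
    <meta property="og:description"> tag, as seen in the question.
    Either value is None if its tag is missing.
    """
    soup = BeautifulSoup(html, "html.parser")

    # <title> contains child text, so get_text() works here.
    title_tag = soup.find("title")
    title = title_tag.get_text(strip=True) if title_tag is not None else None

    # <meta> has no child text; its value lives in the "content" attribute.
    meta = soup.find("meta", attrs={"property": "og:description"})
    definition = meta.get("content") if meta is not None else None

    return title, definition


def fetch_word_of_the_day(url=URL):
    """Download the page and parse it (hypothetical helper; needs network)."""
    import requests  # imported here so parsing alone doesn't require requests

    r = requests.get(url, timeout=10)
    r.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    return parse_word_of_the_day(r.text)
```

Running `title, definition = fetch_word_of_the_day()` and printing both values reproduces the question's script with the fix applied.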

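Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.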

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM