
Beautifulsoup - Remove HTML tags

I am trying to strip all the HTML tags from the 'profile' soup. However, I am unable to perform the “.text.strip()” operation because it is a list, as shown in the code below:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

page = requests.get("https://web.archive.org/web/20121007172955/http://www.nga.gov/collection/anZ1.htm").text
soup = BeautifulSoup(page, "html.parser")

info = {}
# select() returns a list of matching tags, so .text.strip() cannot be called on it directly
info['Profile'] = soup.select('div.text-desc-members')

pprint(info)

Just iterate through that list and call .text.strip() on each item:

import requests 
from bs4 import BeautifulSoup
from pprint import pprint 

page = requests.get("https://web.archive.org/web/20121007172955/http://www.nga.gov/collection/anZ1.htm").text
soup = BeautifulSoup(page, "html.parser")

info = {}
info['Profile'] = soup.select('div.text-desc-members')

# Each item in the list is a single Tag, so .text.strip() works on it
for item in info['Profile']:
    pprint(item.text.strip())
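
If the goal is to keep the cleaned text in the info dictionary rather than only print it, a list comprehension is one option. The sketch below is only an illustration: the inline HTML is made-up markup standing in for the real page (so it runs without a network request), and only the div.text-desc-members selector comes from the code above. It also shows select_one(), which returns a single tag instead of a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption, not the actual site HTML).
html = """
<div class="text-desc-members"><p>First <b>profile</b> text.</p></div>
<div class="text-desc-members">  Second profile text.  </div>
"""

soup = BeautifulSoup(html, "html.parser")

info = {}
# select() always returns a list, so extract the text element by element.
# get_text(" ", strip=True) strips each text fragment and joins them with a space.
info["Profile"] = [
    div.get_text(" ", strip=True)
    for div in soup.select("div.text-desc-members")
]

# select_one() returns the first match (or None), so .text works on it directly.
first = soup.select_one("div.text-desc-members")
first_text = first.text.strip() if first is not None else ""

print(info)
print(first_text)
```

Passing a separator to get_text() matters when the text is split across nested tags: with strip=True and no separator, the stripped fragments would be joined with nothing between them.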

