BeautifulSoup - Remove HTML tags

Question

I am trying to strip all the HTML tags from the 'profile' soup. However, I am unable to perform the “.text.strip()” operation because it is a list, as shown in the code below:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

# Fetch the archived page and parse its HTML
page = requests.get("https://web.archive.org/web/20121007172955/http://www.nga.gov/collection/anZ1.htm").text
soup = BeautifulSoup(page, "html.parser")

info = {}
# select() returns a list of matching tags, not a single tag
info['Profile'] = soup.select('div.text-desc-members')

pprint(info)

Answer 1

Just iterate through that list:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

page = requests.get("https://web.archive.org/web/20121007172955/http://www.nga.gov/collection/anZ1.htm").text
soup = BeautifulSoup(page, "html.parser")

info = {}
info['Profile'] = soup.select('div.text-desc-members')

# Each item is a single tag, so .text works on it
for item in info['Profile']:
    pprint(item.text.strip())
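
If you would rather keep the stripped text in the dict instead of printing it, a list comprehension with get_text() is one option. This is a minimal sketch, not a tested fix for the archived page: the HTML below is made up for illustration, and only the class name 'text-desc-members' comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption);
# only the class name 'text-desc-members' is taken from the question.
html = """
<div class="text-desc-members"><p>First <b>profile</b> text.</p></div>
<div class="text-desc-members">  Second profile text.  </div>
"""

soup = BeautifulSoup(html, "html.parser")

info = {}
# select() returns a list of tags, so pull the text out of each one.
# get_text(" ", strip=True) strips every text fragment and joins them with a space.
info['Profile'] = [
    div.get_text(" ", strip=True)
    for div in soup.select('div.text-desc-members')
]

print(info)
```

If you only expect one matching div, soup.select_one('div.text-desc-members') returns a single tag (or None when nothing matches), and .text.strip() can be called on that directly.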

Answer 1: 3 · accepted · 2019-12-18 15:05:15
