Remove HTML tags from BeautifulSoup results

Question

Using BeautifulSoup, I'm able to scrape a web page with this code:

import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.acbbroker.it/soci_dettaglio.php?r=3")
page

soup = BeautifulSoup(page.content, 'html.parser')
test = soup.find(id="paginainterna-content")
test_items = test.find_all(class_="entry-content")
tonight = test_items[0]

names = []
for x in tonight.find_all('a', itemprop="url"):
    names.append(str(x))
print(names)

but I'm not able to clean the results and get only the text inside each <a> tag (dropping the href as well).

Here is a small sample of my result:

 '<a href="http://www.google.com/maps/place/45.45249938964844,9.210599899291992" itemprop="url" target="_blank">A&amp;B; Insurance e Reinsurance Brokers Srl</a>', '<a href="http://www.google.com/maps/place/45.647499084472656,8.774800300598145" itemprop="url" target="_blank">A.B.A. BROKERS SRL</a>', '<a href="http://www.google.com/maps/place/45.46730041503906,9.148480415344238" itemprop="url" target="_blank">ABC SRL BROKER E CONSULENTI DI ASSI.NE</a>', '<a href="http://www.google.com/maps/place/45.47710037231445,9.269220352172852" itemprop="url" target="_blank">AEGIS INTERMEDIA SAS</a>',

What is the proper way to handle this kind of data and obtain a clean result?

Thank you

Answer 1

If you want only the text from a tag, use the get_text() method:

names = []
for x in tonight.find_all('a', itemprop="url"):
    # get_text() returns the tag's text without the markup or attributes
    names.append(x.get_text())
print(names)

A list comprehension is more concise and usually a little faster:

names = [x.get_text() for x in tonight.find_all('a', itemprop='url')]
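
To try this without fetching the live page, here is a minimal sketch that runs the same comprehension against a small inline HTML string. The markup and URLs below are made up for illustration and only mimic the structure shown in the question; strip=True is an optional get_text() argument that trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the scraped page (not the real site content).
html = """
<div class="entry-content">
  <a href="http://example.com/place/1" itemprop="url" target="_blank">A.B.A. BROKERS SRL</a>
  <a href="http://example.com/place/2" itemprop="url" target="_blank"> AEGIS INTERMEDIA SAS </a>
  <a href="http://example.com/other">not a listing</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the <a> tags carrying itemprop="url", and take their text.
# strip=True removes leading/trailing whitespace from each text fragment.
names = [a.get_text(strip=True) for a in soup.find_all("a", itemprop="url")]
print(names)
```

The third link is skipped because it has no itemprop="url" attribute, so the filter in find_all() does the selection and get_text() does the cleaning.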

Answer 2

I'm not sure what output you want, but you can get the tag's text by changing the append line to:

names.append(str(x.get_text()))
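
As a small sketch of the difference between the two calls: str() on a tag serializes the whole element, while get_text() already returns a plain string, so the outer str() in the line above is harmless but not required. The one-line HTML here is a hypothetical stand-in for an entry from the page.

```python
from bs4 import BeautifulSoup

# Hypothetical single entry, modelled on the output shown in the question.
html = '<a href="http://example.com/place/1" itemprop="url">A&amp;B Insurance</a>'
tag = BeautifulSoup(html, "html.parser").a

as_markup = str(tag)       # full element, including attributes and escaped entities
as_text = tag.get_text()   # text only; entities such as &amp; are decoded

print(as_markup)
print(as_text)
print(type(as_text).__name__)
```

Note that the entity in the markup comes back decoded in the text, which is usually what you want when collecting names.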

Answer 1: score 2, accepted, 2018-05-24 17:23:28

Answer 2: score 1, 2018-05-24 17:26:32
