Getting the 'href' that follows a piece of text with BeautifulSoup in Python

Question

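Whenever I search for a word that has an href link, I want to get the 'href' together with the matching text. In this example, if I search for the word "over" in the "div" below, I need it to display "over + 'href'".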

Sample of the HTML I used:
html = '''
<div class="ez" style="" data-ft="&#123;&quot;tn&quot;:&quot;*s&quot;&#125;"> 
<span><p>This is the text here</p> <a href=" my link 3 ">More</a>
<div class="bl" style="" data-ft="&#123;&quot;tn&quot;:&quot;*s&quot;&#125;">
<span><p>Hello everybody over there</p><a href="my link 1"></div><div 
class="ol"...><div class="bq qr"><a class="gh" href="my link 2"></a>
'''

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
pattern = re.compile(r'over', re.I | re.UNICODE)
for text_href in soup.find_all('div'):
    word = text_href.text
    # a <div> has no href of its own; the link sits on an <a> inside it
    link = text_href.a.get('href') if text_href.a else None
    for match in pattern.finditer(word):
        print(match.group(), '+', link)

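So the output I expect is the match, in my case "over", together with the link (href) of the element in which "over" was found. Result: over + 'the link I want to obtain' (i.e. the href)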

Answer 1

I think you're looking for something like this:

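# soup is assumed to be built from the HTML, e.g. soup = BeautifulSoup(html, 'html.parser')
for text_href in soup.find_all('div'):
    word = text_href.text
    if 'over' in word and text_href.a is not None:  # skip divs that contain no link
        print(text_href.a['href'])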

Output:

the link i want to obtain
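A self-contained sketch of the same approach, for trying it out. The sample HTML, the www.example.com link and the helper name links_for_word are made up for illustration. It skips divs that contain the word but have no <a> with an href, where text_href.a['href'] would otherwise raise. The substring test is case-sensitive, and when divs are nested (as in the question's markup) an outer div also "contains" the word, so its first link may not be the one next to the text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML: each <div> holds some text and one link.
html = """
<div class="ez"><p>This is the text over here</p> <a href="the link i want to obtain">More</a></div>
<div class="bl"><p>Hello everybody</p> <a href="www.example.com">More</a></div>
"""


def links_for_word(soup, word):
    """Return (word, href) for every <div> whose text contains `word`."""
    results = []
    for div in soup.find_all('div'):
        # div.a is the first <a> descendant of the div, or None if there is none
        if word in div.get_text() and div.a is not None and div.a.has_attr('href'):
            results.append((word, div.a['href']))
    return results


soup = BeautifulSoup(html, 'html.parser')
for word, href in links_for_word(soup, 'over'):
    print(word, '+', href)  # over + the link i want to obtain
```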

Answer 2

If the link always comes after the text you are searching for, you can use the find_next method.

Something like this:

html_doc ='''
<div class="ez" style="" data-ft="&#123;&quot;tn&quot;:&quot;*s&quot;&#125;"> 
<span><p>This is the text over here</p> <a href="the link i want to obtain">More</a>
<div class="bl" style="" data-ft="&#123;&quot;tn&quot;:&quot;*s&quot;&#125;">
<span><p>Hello everybody</p> <a href="www.mylink...">More</a>
'''

from bs4 import BeautifulSoup
import re

soup = BeautifulSoup(html_doc, 'html.parser')

search_string = 'over'

print(search_string, '+', soup.find(string=re.compile(search_string, re.I)).find_next('a')['href']) # over + the link i want to obtain
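soup.find returns only the first matching text node, and find_next returns None when no <a> follows, in which case ['href'] fails. A minimal sketch that collects every match instead, still assuming the "link comes after the text" layout; the sample HTML and the helper name word_links are illustrative only.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: each paragraph is followed by its link, except the last one.
html_doc = """
<p>This is the text over here</p> <a href="first-link">More</a>
<p>Nothing to see</p> <a href="second-link">More</a>
<p>Over and out</p> <a href="third-link">More</a>
<p>over, but no link follows</p>
"""


def word_links(soup, search_string):
    """Return (search_string, href or None) for every text node matching it."""
    pattern = re.compile(re.escape(search_string), re.I)
    pairs = []
    # find_all(string=...) returns every text node whose text matches the pattern
    for text_node in soup.find_all(string=pattern):
        link = text_node.find_next('a')  # next <a> in document order, or None
        pairs.append((search_string, link['href'] if link is not None else None))
    return pairs


soup = BeautifulSoup(html_doc, 'html.parser')
for word, href in word_links(soup, 'over'):
    print(word, '+', href)
# over + first-link
# over + third-link
# over + None
```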

If you are looking for whole words only, you can adjust the regular expression accordingly.
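One way to do that, sketched with the standard re module only: wrap the escaped word in \b word boundaries so that "over" no longer matches inside "hover" or "overview". The helper name whole_word_pattern and the sample strings are hypothetical.

```python
import re


def whole_word_pattern(word):
    # \b is a word boundary, so the word must not be part of a longer word
    return re.compile(r'\b' + re.escape(word) + r'\b', re.I)


pattern = whole_word_pattern('over')
texts = ['This is the text over here', 'Hover here', 'An overview', 'Over and out']
print([t for t in texts if pattern.search(t)])
# ['This is the text over here', 'Over and out']
```

The compiled pattern can then be passed as the string= argument of soup.find in place of re.compile(search_string, re.I).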


Answer 1: 2 votes, 2020-03-17 13:44:26

Answer 2: 1 vote, accepted, 2020-03-17 14:22:50