Extract text from anchor tag in BeautifulSoup

I'm trying to extract titles from a page, but the anchor tag that holds each title doesn't have a class. The following snippet is taken from the page source.

<a href="/f/oDhilr3O">Unatama Don</a>

The title actually does have a class, but as you can see I start at index 3 because the first 3 titles aren't what I want. I don't want to hard-code that offset, though. On the website the title is also a link, hence the anchor tag above.

title_name = soup.find_all('div', class_='food-description-title')
title_list = []

# Start at index 3 to skip the first three unwanted matches (the hard-coded offset)
for i in range(3, len(title_name)):
    title = title_name[i].text
    title_list.append(title)

"Unatama Don" is the title I'm trying to get.

Here's an example of searching for an anchor element with a specific URL in BeautifulSoup:

from bs4 import BeautifulSoup

document = '''
  <a href="https://www.google.com">google</a>
  <a href="/f/oDhilr3O">Unatama Don</a>
  <a href="test">Don</a>
'''

soup = BeautifulSoup(document, "lxml")
url = "/f/oDhilr3O"

for x in soup.find_all("a", {"href" : url}):
    print(x.text)

Output:

Unatama Don
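
If the exact URL isn't known in advance, one possible variation is to match on an href prefix and limit the search to the title containers, which avoids the hard-coded index. This is only a sketch: it assumes each wanted anchor sits inside a div with class food-description-title and that its href starts with /f/. The sample markup, including the unwanted entries and the second link (/f/abc123, "Katsu Curry"), is made up for illustration and is not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first two divs stand in for unwanted titles.
document = '''
<div class="food-description-title">Menu</div>
<div class="food-description-title"><a href="/about">About</a></div>
<div class="food-description-title"><a href="/f/oDhilr3O">Unatama Don</a></div>
<div class="food-description-title"><a href="/f/abc123">Katsu Curry</a></div>
'''

soup = BeautifulSoup(document, "html.parser")

# CSS selector: anchors inside the title divs whose href starts with "/f/".
anchors = soup.select('div.food-description-title a[href^="/f/"]')
title_list = [a.get_text(strip=True) for a in anchors]

print(title_list)
```

Whether the /f/ prefix really separates wanted from unwanted titles depends on the actual page, so check the selector against the real source first.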

The requests and bs4 modules are very helpful for tasks like this. Have you tried something like the code below?

import requests
from bs4 import BeautifulSoup

url = 'PASTE/YOUR/URL/HERE'
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'html.parser')
links = soup.find_all('a', href=True)

for each in links:
    print(each.text)

I think this gives the outcome you are looking for. If you would like the hyperlinks as well, add another loop and put "print(each.get('href'))" inside it. Let us know how it goes.
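
As a rough sketch, the link text and its href can also be collected in a single pass. The inline HTML below is a stand-in for response.text so the example runs without a network request. The third anchor, which has no href, is a made-up case added to show what href=True filters out.

```python
from bs4 import BeautifulSoup

# Stand-in for response.text; the last anchor is a hypothetical example.
page = '''
<a href="https://www.google.com">google</a>
<a href="/f/oDhilr3O">Unatama Don</a>
<a name="top">no href here</a>
'''

soup = BeautifulSoup(page, 'html.parser')

# href=True keeps only anchors that actually carry an href attribute.
pairs = [(a.get_text(strip=True), a['href'])
         for a in soup.find_all('a', href=True)]

for text, href in pairs:
    print(text, href)
```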
