Retrieving a link using Beautiful Soup

I am trying to follow the 'original' link on the page at this URL. For the first item, the target is url + 'AHS_'.

import requests
from bs4 import BeautifulSoup

url = "http://pen.jamstec.go.jp/"

html = requests.get(url).text
soup = BeautifulSoup(html, 'html5lib')
print(soup)

# Print the href of every anchor on the page
for item in soup.find_all('a'):
    result = item['href']
    print(result)

However, this prints far more than I need.

How can I get just 'AHS_/' as the result?

Based on the comment, here is a script that gets the link from the row with ID "AHS" whose link text is "original":

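import requests
from bs4 import BeautifulSoup

url = "http://pen.jamstec.go.jp/"

soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Find the cell containing "AHS", then a following sibling cell that holds
# an <a> whose text contains "original", and take that link's href.
# Note: recent soupsieve releases deprecate :contains() in favor of :-soup-contains().
link = soup.select_one('td:contains("AHS") ~ td:has(a:contains("original")) a')['href']
print(link)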

Prints:

http://pen.jamstec.go.jp/AHS_
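
If the CSS pseudo-classes are hard to read, the same lookup can be sketched with plain loops. This is only an illustration: SAMPLE_HTML is a hypothetical stand-in for the real page (whose markup may differ), find_original_link is a name made up for this example, and urljoin is used on the assumption that an href might be relative to the page URL.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "http://pen.jamstec.go.jp/"

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>ID</th><th>Site</th><th>Data</th><th>Notes</th></tr>
  <tr><td>AHS</td><td>Site A</td><td><a href="AHS_">original</a></td><td><a href="notes_a.html">notes</a></td></tr>
  <tr><td>EGT</td><td>Site B</td><td><a href="EGT_">original</a></td><td></td></tr>
</table>
"""

def find_original_link(html, site_id, base_url=BASE_URL):
    """Return the absolute URL of the 'original' link in the row whose
    first cell equals site_id, or None if no such row or link exists."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Skip header rows (no <td>) and rows for other IDs
        if not cells or cells[0].get_text(strip=True) != site_id:
            continue
        for a in row.find_all("a", href=True):
            if a.get_text(strip=True) == "original":
                # Resolve a relative href against the page URL
                return urljoin(base_url, a["href"])
    return None

print(find_original_link(SAMPLE_HTML, "AHS"))
```

Matching the first cell exactly, rather than by substring as :contains does, avoids picking up a row whose ID merely includes "AHS".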

EDIT: To parse all links:

for tr in soup.select('table tr:has(a)'):
    tds = tr.select('td')
    if len(tds) != 4:
        continue
    print('{:<10} {}'.format(tds[0].text, tr.select_one('a')['href']))

Prints (ID and first "original" link):

AHS        http://pen.jamstec.go.jp/AHS_
EGT        http://pen.jamstec.go.jp/EGT_
FHK        http://pen.jamstec.go.jp/FHK_
GDK        http://pen.jamstec.go.jp/GDK_
HVT        http://pen.jamstec.go.jp/HVT_
KBF        http://pen.jamstec.go.jp/KBF_
KEW        http://pen.jamstec.go.jp/KEW_
LAM        http://pen.jamstec.go.jp/LAM_
LBR        http://pen.jamstec.go.jp/LBR_
MMF        http://pen.jamstec.go.jp/MMF_
MSE        http://pen.jamstec.go.jp/MSE_
MTK        http://pen.jamstec.go.jp/MTK_
PFA        http://pen.jamstec.go.jp/PFA_
RHN        http://pen.jamstec.go.jp/RHN_
SGD        http://pen.jamstec.go.jp/SGD_
SHA        http://pen.jamstec.go.jp/SHA_
SSP        http://pen.jamstec.go.jp/SSP_
TFS        http://pen.jamstec.go.jp/TFS_
TGF        http://pen.jamstec.go.jp/TGF_
TKC        http://pen.jamstec.go.jp/TKC_
TKY        http://pen.jamstec.go.jp/TKY_
TOC        http://pen.jamstec.go.jp/TOC_
TOE        http://pen.jamstec.go.jp/TOE_
TOS        http://pen.jamstec.go.jp/TOS_
TSE        http://pen.jamstec.go.jp/TSE_
UAK        http://pen.jamstec.go.jp/UAK_
URY        http://pen.jamstec.go.jp/URY_
YGT        http://pen.jamstec.go.jp/YGT_
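
If the links are needed for further processing rather than just printed, the loop can be adapted to build a dictionary. The following is a rough sketch under assumptions: collect_first_links is a hypothetical helper name, the sample HTML is invented to mimic rows of four cells, and the real page may be structured differently.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def collect_first_links(html, base_url):
    """Map each row's ID (first cell) to the first link found in that row.
    Rows without exactly four <td> cells or without a link are skipped."""
    soup = BeautifulSoup(html, "html.parser")
    links = {}
    for tr in soup.select("table tr"):
        tds = tr.find_all("td")
        a = tr.find("a", href=True)
        if len(tds) != 4 or a is None:
            continue
        links[tds[0].get_text(strip=True)] = urljoin(base_url, a["href"])
    return links

# Hypothetical sample markup, not copied from the real site.
sample = """
<table>
  <tr><td>AHS</td><td>A</td><td><a href="AHS_">original</a></td><td>x</td></tr>
  <tr><td>EGT</td><td>B</td><td><a href="EGT_">original</a></td><td>y</td></tr>
  <tr><td>short row</td><td><a href="skip">link</a></td><td>z</td></tr>
  <tr><td>NOL</td><td>C</td><td>no link here</td><td>w</td></tr>
</table>
"""

links = collect_first_links(sample, "http://pen.jamstec.go.jp/")
for site_id, link in links.items():
    print('{:<10} {}'.format(site_id, link))
```

With the live page, the HTML would come from requests.get(url).text as in the script above.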
