Retrieving a link using Beautiful Soup
I am trying to follow the 'original' link on this page. For the first item, that link is url + 'AHS_'.
import requests
from bs4 import BeautifulSoup

url = 'http://pen.jamstec.go.jp/'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html5lib')
print(soup)
# Print the href of every link on the page
for item in soup.find_all('a'):
    result = item['href']
    print(result)
However, this outputs much more than I need.
How can I get just 'AHS_/' as the result?
Based on the comment, here is a script that obtains the link with the text "original" from the row with ID "AHS":
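import requests
from bs4 import BeautifulSoup

url = "http://pen.jamstec.go.jp/"
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Take the first <a> inside a later sibling cell of the "AHS" cell,
# where that sibling cell contains a link with the text "original".
link = soup.select_one('td:contains("AHS") ~ td:has(a:contains("original")) a')['href']
print(link)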
Prints:
http://pen.jamstec.go.jp/AHS_
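Newer soupsieve releases deprecate `:contains` in favor of `:-soup-contains`, so the selector above may emit a warning. As an alternative, here is a minimal sketch of the same lookup using only plain Beautiful Soup calls. The HTML is a made-up stand-in for the page (I am assuming one table row per ID with the links in a later cell), and `first_link_for` is a hypothetical helper name, not something from the site or the library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (layout is an assumption).
SAMPLE_HTML = """
<table>
  <tr><th>ID</th><th>Site</th><th>Data</th><th>Notes</th></tr>
  <tr>
    <td>AHS</td><td>Site A</td>
    <td><a href="#other">other</a>
        <a href="http://pen.jamstec.go.jp/AHS_">original</a></td>
    <td></td>
  </tr>
  <tr>
    <td>EGT</td><td>Site B</td>
    <td><a href="http://pen.jamstec.go.jp/EGT_">original</a></td>
    <td></td>
  </tr>
</table>
"""

def first_link_for(soup, row_id, link_text="original"):
    """Return the href of the first <a> whose text equals link_text,
    taken from the row whose first <td> equals row_id (None if absent)."""
    for tr in soup.find_all("tr"):
        tds = tr.find_all("td")
        # Header rows have no <td>; other rows are skipped unless the ID matches.
        if not tds or tds[0].get_text(strip=True) != row_id:
            continue
        for a in tr.find_all("a", href=True):
            if a.get_text(strip=True) == link_text:
                return a["href"]
    return None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(first_link_for(soup, "AHS"))  # http://pen.jamstec.go.jp/AHS_
```

Matching on the exact link text in Python avoids depending on which text-matching pseudo-class the installed selector engine supports, at the cost of a few more lines.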
EDIT: To parse all links:
for tr in soup.select('table tr:has(a)'):
    tds = tr.select('td')
    # Skip rows that do not have the expected four cells
    if len(tds) != 4:
        continue
    # Print the ID (first cell) and the first link in the row
    print('{:<10} {}'.format(tds[0].text, tr.select_one('a')['href']))
Prints (ID and first "original" link):
AHS http://pen.jamstec.go.jp/AHS_
EGT http://pen.jamstec.go.jp/EGT_
FHK http://pen.jamstec.go.jp/FHK_
GDK http://pen.jamstec.go.jp/GDK_
HVT http://pen.jamstec.go.jp/HVT_
KBF http://pen.jamstec.go.jp/KBF_
KEW http://pen.jamstec.go.jp/KEW_
LAM http://pen.jamstec.go.jp/LAM_
LBR http://pen.jamstec.go.jp/LBR_
MMF http://pen.jamstec.go.jp/MMF_
MSE http://pen.jamstec.go.jp/MSE_
MTK http://pen.jamstec.go.jp/MTK_
PFA http://pen.jamstec.go.jp/PFA_
RHN http://pen.jamstec.go.jp/RHN_
SGD http://pen.jamstec.go.jp/SGD_
SHA http://pen.jamstec.go.jp/SHA_
SSP http://pen.jamstec.go.jp/SSP_
TFS http://pen.jamstec.go.jp/TFS_
TGF http://pen.jamstec.go.jp/TGF_
TKC http://pen.jamstec.go.jp/TKC_
TKY http://pen.jamstec.go.jp/TKY_
TOC http://pen.jamstec.go.jp/TOC_
TOE http://pen.jamstec.go.jp/TOE_
TOS http://pen.jamstec.go.jp/TOS_
TSE http://pen.jamstec.go.jp/TSE_
UAK http://pen.jamstec.go.jp/UAK_
URY http://pen.jamstec.go.jp/URY_
YGT http://pen.jamstec.go.jp/YGT_
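The question asked for 'AHS_/' rather than the full URL. Assuming the hrefs come back absolute, as in the output above, a small sketch with the standard library's `urllib.parse` can strip the base again. `relative_part` is a hypothetical helper name, and the trailing slash is added only because the question shows one.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "http://pen.jamstec.go.jp/"

def relative_part(link, base=BASE_URL):
    """Return the path of link relative to base, with a trailing slash.
    Links that point to a different host are returned unchanged."""
    if urlparse(link).netloc != urlparse(base).netloc:
        return link
    path = urlparse(link).path.lstrip("/")
    return path if path.endswith("/") else path + "/"

print(relative_part("http://pen.jamstec.go.jp/AHS_"))  # AHS_/
# Going the other way: resolve a relative href against the page URL.
print(urljoin(BASE_URL, "AHS_/"))  # http://pen.jamstec.go.jp/AHS_/
```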