
How to print the first element in an HTML tag

My code grabs links/HTML from different "sections" of the page.

It prints 2 links per section, but I only want the first one to be printed.

The expected output should not include links ending in "video", which my code currently collects.

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
jam = []
baseurl = 'https://meetinglibrary.asco.org'
driver.get('https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page=1')
time.sleep(3)  # give the JavaScript-rendered page time to load before reading its HTML
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
# The "section" elements: anchors carrying the class ng-star-inserted
productlist = soup.find_all('a', class_='ng-star-inserted')
for item in productlist:
    # Every link with an href inside the section is appended, so a section can contribute several URLs
    for link in item.find_all('a', href=True):
        jam.append(baseurl + link['href'])
print(jam)
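As a sketch of what the question asks for (one link per section, skipping video links), the per-section filtering can be separated from the scraping. In BeautifulSoup, item.find('a', href=True) returns only the first matching tag (or None), but the first link in a section could itself be a video link, so the sketch below scans each section's hrefs in order and keeps the first one that does not end in "video". It assumes the hrefs have already been gathered per section, for example with [a['href'] for a in item.find_all('a', href=True)]; the function names and sample paths are hypothetical.

```python
from typing import List, Optional

BASEURL = 'https://meetinglibrary.asco.org'


def first_non_video_link(hrefs: List[str]) -> Optional[str]:
    """Return the first href that does not end in 'video', or None if there is none."""
    for href in hrefs:
        # Strip a trailing slash so '/record/1/video/' also counts as a video link
        if not href.rstrip('/').endswith('video'):
            return href
    return None


def collect_first_links(sections: List[List[str]], baseurl: str = BASEURL) -> List[str]:
    """Keep at most one absolute URL per section: its first non-video link."""
    jam = []
    for hrefs in sections:
        href = first_non_video_link(hrefs)
        if href is not None:
            jam.append(baseurl + href)
    return jam


if __name__ == '__main__':
    # Hypothetical hrefs, one inner list per section
    sections = [
        ['/record/1/abstract', '/record/1/slide', '/record/1/video'],
        ['/record/2/video', '/record/2/slide'],
        ['/record/3/video'],
    ]
    print(collect_first_links(sections))
```

A section whose only links are video links contributes nothing, which keeps the output free of video URLs without having to rely on a fixed link position.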

Use os.path.basename to get the last segment of the URL, and use the in operator to check whether "video" appears in it:

from selenium import webdriver
from bs4 import BeautifulSoup
import time
import os

driver = webdriver.Chrome()
jam = []
baseurl = 'https://meetinglibrary.asco.org'
driver.get('https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page=1')
time.sleep(3)  # give the JavaScript-rendered page time to load
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
productlist = soup.find_all('a', class_='ng-star-inserted')
for item in productlist:
    for link in item.find_all('a', href=True):
        url = link['href']
        # os.path.basename keeps only the part after the last '/', e.g. 'video' or 'slide'
        if "video" not in os.path.basename(url):
            jam.append(baseurl + url)
print(jam)

Result:

['https://meetinglibrary.asco.org/record/185955/abstract',
 'https://meetinglibrary.asco.org/record/185955/slide',
 'https://meetinglibrary.asco.org/record/185954/abstract',
 'https://meetinglibrary.asco.org/record/186048/abstract',
 'https://meetinglibrary.asco.org/record/186048/slide',
 'https://meetinglibrary.asco.org/record/190197/slide',
 'https://meetinglibrary.asco.org/record/192623/slide',
 'https://meetinglibrary.asco.org/record/185414/abstract',
 'https://meetinglibrary.asco.org/record/185414/slide',
 'https://meetinglibrary.asco.org/record/185415/abstract',
 'https://meetinglibrary.asco.org/record/185415/slide',
 'https://meetinglibrary.asco.org/record/185473/abstract',
 'https://meetinglibrary.asco.org/record/185473/slide',
 'https://meetinglibrary.asco.org/record/187584/slide',
 'https://meetinglibrary.asco.org/record/188561/slide',
 'https://meetinglibrary.asco.org/record/186710/abstract',
 'https://meetinglibrary.asco.org/record/186710/slide',
 'https://meetinglibrary.asco.org/record/186699/abstract',
 'https://meetinglibrary.asco.org/record/186699/slide',
 'https://meetinglibrary.asco.org/record/186698/abstract',
 'https://meetinglibrary.asco.org/record/186698/slide',
 'https://meetinglibrary.asco.org/record/187720/slide',
 'https://meetinglibrary.asco.org/record/187480/abstract',
 'https://meetinglibrary.asco.org/record/187480/slide',
 'https://meetinglibrary.asco.org/record/191961/slide',
 'https://meetinglibrary.asco.org/record/192626/slide',
 'https://meetinglibrary.asco.org/record/186983/abstract',
 'https://meetinglibrary.asco.org/record/186983/slide',
 'https://meetinglibrary.asco.org/record/188580/abstract',
 'https://meetinglibrary.asco.org/record/188580/slide',
 'https://meetinglibrary.asco.org/record/189047/abstract',
 'https://meetinglibrary.asco.org/record/189047/slide',
 'https://meetinglibrary.asco.org/record/190223/slide',
 'https://meetinglibrary.asco.org/record/190273/slide',
 'https://meetinglibrary.asco.org/record/184812/abstract',
 'https://meetinglibrary.asco.org/record/184812/slide',
 'https://meetinglibrary.asco.org/record/184927/slide',
 'https://meetinglibrary.asco.org/record/184805/abstract',
 'https://meetinglibrary.asco.org/record/184805/slide',
 'https://meetinglibrary.asco.org/record/184811/abstract',
 'https://meetinglibrary.asco.org/record/184811/slide',
 'https://meetinglibrary.asco.org/record/185576/slide',
 'https://meetinglibrary.asco.org/record/190147/slide']
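To see what the filter above actually tests, here is a minimal standalone check on hypothetical hrefs. os.path.basename returns the text after the last '/', so "video" in os.path.basename(href) only inspects the last path segment. Note that this filter removes video links but does not limit each section to one link: the result list above still contains both an abstract and a slide URL for most records. The is_video_link helper and the sample paths are illustrative assumptions.

```python
import os


def is_video_link(href: str) -> bool:
    """Treat a link as a video link when its last path segment contains 'video'."""
    return 'video' in os.path.basename(href)


# Hypothetical hrefs from one section
hrefs = ['/record/1/abstract', '/record/1/slide', '/record/1/video']
for href in hrefs:
    # Show the extracted last segment and whether the link would be filtered out
    print(href, '->', os.path.basename(href), is_video_link(href))
```

One caveat: for an href with a trailing slash such as '/record/1/video/', os.path.basename returns an empty string, so the check would miss it; stripping the trailing slash first avoids that.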

You can add a condition before the append call:

...
for item in productlist:
    ahrefs = item.find_all('a', href=True)
    # Keep only even-indexed links (1st, 3rd, ...) of each section and skip video links
    for index, link in enumerate(ahrefs):
        if index % 2 == 0 and 'video' not in link['href']:
            jam.append(baseurl + link['href'])
print(jam)
...

Try it and let me know. Good luck.
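The even-index condition in the loop above is equivalent to slicing a section's links with [::2] and then dropping video links. The sketch below, on hypothetical hrefs, contrasts that with taking only the first non-video link: the even-index version keeps more than one URL when a section has three or more links, and keeps nothing when a two-link section starts with a video link. The function names and sample paths are illustrative.

```python
from typing import List

BASEURL = 'https://meetinglibrary.asco.org'


def even_index_links(hrefs: List[str], baseurl: str = BASEURL) -> List[str]:
    """Mirror the answer's loop: keep indices 0, 2, 4, ... and drop video links."""
    return [baseurl + href for href in hrefs[::2] if 'video' not in href]


def first_link_only(hrefs: List[str], baseurl: str = BASEURL) -> List[str]:
    """Keep only the first non-video link of one section (at most one URL)."""
    for href in hrefs:
        if 'video' not in href:
            return [baseurl + href]
    return []


# Hypothetical sections
three_links = ['/record/1/abstract', '/record/1/slide', '/record/2/abstract']
video_first = ['/record/3/video', '/record/3/slide']
print(even_index_links(three_links), first_link_only(three_links))
print(even_index_links(video_first), first_link_only(video_first))
```

Positional selection only works if every section has the same link layout; scanning for the first non-video link does not depend on that.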


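Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.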

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM