Python Selenium: How to get an anchor tag's href value only if the anchor tag contains a certain attribute value
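I want to get GitHub repository links from GitHub search results. Right now my code collects both the username (owner) link and the repository link. How can I get only the repository links by targeting an attribute value on the anchor tag?
My code: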
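from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
path = r"C:\programs\chromedriver.exe"  # raw string so the backslashes are not treated as escapes
driver = webdriver.Chrome(service=Service(path))
url = 'https://github.com/topics/flutter-apps'
driver.get(url)
links_list = []
# Each repository card heading has the class "f3"
headings = driver.find_elements(By.CLASS_NAME, 'f3')
for heading in headings:
    # This picks up every <a> in the heading: the owner link AND the repository link
    links = heading.find_elements(By.TAG_NAME, 'a')
    for l in links:
        links_list.append(l.get_attribute('href'))
print(links_list)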
This is the HTML I want to get the link from:
<h1 class="f3 text-gray text-normal lh-condensed">
  <a data-hydro-click="{&quot;event_type&quot;:&quot;explore.click&quot;,&quot;payload&quot;:{&quot;click_context&quot;:&quot;REPOSITORY_CARD&quot;,&quot;click_target&quot;:&quot;OWNER&quot;,&quot;click_visual_representation&quot;:&quot;REPOSITORY_OWNER_HEADING&quot;,&quot;actor_id&quot;:49521558,&quot;record_id&quot;:484656,&quot;originating_url&quot;:&quot;https://github.com/topics/ios&quot;,&quot;user_id&quot;:49521558}}"
     data-hydro-click-hmac="7b69680b468dda1b4e10ddab19c8034fd4c530bc57957662d8be320d79cc38f1"
     data-ga-click="Explore, go to repository owner, location:explore feed" href="/vsouza">
    vsouza
  </a> /
  <a data-hydro-click="{&quot;event_type&quot;:&quot;explore.click&quot;,&quot;payload&quot;:{&quot;click_context&quot;:&quot;REPOSITORY_CARD&quot;,&quot;click_target&quot;:&quot;REPOSITORY&quot;,&quot;click_visual_representation&quot;:&quot;REPOSITORY_NAME_HEADING&quot;,&quot;actor_id&quot;:49521558,&quot;record_id&quot;:21700699,&quot;originating_url&quot;:&quot;https://github.com/topics/ios&quot;,&quot;user_id&quot;:49521558}}"
     data-hydro-click-hmac="c38ef14c5a72214b8e946bde857c36653301cb96a15a6b1108242526485221b8"
     data-ga-click="Explore, go to repository, location:explore feed" href="/vsouza/awesome-ios" class="text-bold">
    awesome-ios
  </a>
</h1>
Of the two anchor elements, I want the href value of the one that has this attribute and value: data-ga-click="Explore, go to repository, location:explore feed"
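To see why the heading loop returns two links per card, here is a small offline sketch that parses a trimmed copy of the heading above with the standard library's xml.etree.ElementTree and prints each anchor's href next to its data-ga-click value. The trimmed sample (data-hydro-* attributes dropped) and the helper name anchor_attrs are illustrative assumptions; the live page is HTML rather than guaranteed well-formed XML, so this only works on a clean fragment like this one.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the heading above; the data-hydro-* attributes are omitted (assumption)
SAMPLE = """
<h1 class="f3 text-gray text-normal lh-condensed">
  <a data-ga-click="Explore, go to repository owner, location:explore feed" href="/vsouza">vsouza</a> /
  <a data-ga-click="Explore, go to repository, location:explore feed" href="/vsouza/awesome-ios" class="text-bold">awesome-ios</a>
</h1>
"""


def anchor_attrs(fragment):
    """Hypothetical helper: (href, data-ga-click) for every <a>, in document order."""
    root = ET.fromstring(fragment.strip())
    return [(a.get("href"), a.get("data-ga-click")) for a in root.iter("a")]


for href, ga in anchor_attrs(SAMPLE):
    print(href, "->", ga)
# /vsouza -> Explore, go to repository owner, location:explore feed
# /vsouza/awesome-ios -> Explore, go to repository, location:explore feed
```

Both anchors sit inside the same h1.f3, so a plain tag search cannot tell them apart; only the data-ga-click value (or the href shape) distinguishes the repository link.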
To get a specific link like this, pass the data-ga-click attribute in the XPath so that only that anchor matches:
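from selenium.webdriver.common.by import By
for heading in headings:
    # The leading "." keeps the search inside the current heading
    links = heading.find_elements(By.XPATH, './/a[@data-ga-click="Explore, go to repository, location:explore feed"]')
    for l in links:
        links_list.append(l.get_attribute('href'))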
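The same XPath predicate can be tried offline without a browser. ElementTree supports a limited XPath subset that includes [@attr="value"] predicates, so this sketch runs the answer's expression against two trimmed headings. The second card (/octo/demo-app) is a hypothetical entry added only to show that one link per heading comes back, and iterating over h1 elements stands in for the Selenium class-name lookup.

```python
import xml.etree.ElementTree as ET

REPO_CLICK = "Explore, go to repository, location:explore feed"

# Two trimmed repository-card headings; the second one is hypothetical sample data
SAMPLE = """<div>
  <h1 class="f3"><a data-ga-click="Explore, go to repository owner, location:explore feed" href="/vsouza">vsouza</a> /
    <a data-ga-click="Explore, go to repository, location:explore feed" href="/vsouza/awesome-ios">awesome-ios</a></h1>
  <h1 class="f3"><a data-ga-click="Explore, go to repository owner, location:explore feed" href="/octo">octo</a> /
    <a data-ga-click="Explore, go to repository, location:explore feed" href="/octo/demo-app">demo-app</a></h1>
</div>"""


def repo_links(fragment):
    """Return the href of every repository anchor, one per heading."""
    root = ET.fromstring(fragment)
    links = []
    for heading in root.iter("h1"):  # stand-in for find_elements(By.CLASS_NAME, 'f3')
        # Same relative XPath as the answer above
        for a in heading.findall(f'.//a[@data-ga-click="{REPO_CLICK}"]'):
            links.append(a.get("href"))
    return links


print(repo_links(SAMPLE))  # ['/vsouza/awesome-ios', '/octo/demo-app']
```

One difference from the browser: Selenium's get_attribute('href') returns the resolved absolute URL (https://github.com/vsouza/awesome-ios), while ElementTree returns the raw attribute text, so the offline output is relative.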
Or with a CSS selector:
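from selenium.webdriver.common.by import By
for heading in headings:
    # CSS attribute selector: exact match on the data-ga-click value
    links = heading.find_elements(By.CSS_SELECTOR, 'a[data-ga-click="Explore, go to repository, location:explore feed"]')
    for l in links:
        links_list.append(l.get_attribute('href'))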
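The CSS form a[data-ga-click="..."] is an exact-value attribute match. As a rough offline analogue that also tolerates ordinary (non-XML) HTML, this sketch uses the standard library's html.parser to collect the href of every a tag whose attribute equals the given value. The class name AttrMatchParser and the trimmed sample are assumptions for illustration; this is not how Selenium evaluates selectors internally.

```python
from html.parser import HTMLParser

REPO_CLICK = "Explore, go to repository, location:explore feed"


class AttrMatchParser(HTMLParser):
    """Hypothetical helper: collect href of <a> tags where attrs[attr] == value."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr
        self.value = value
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with entity references already decoded
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get(self.attr) == self.value:
            self.hrefs.append(attr_map.get("href"))


SAMPLE = (
    '<h1 class="f3">'
    '<a data-ga-click="Explore, go to repository owner, location:explore feed" href="/vsouza">vsouza</a> / '
    '<a data-ga-click="Explore, go to repository, location:explore feed" href="/vsouza/awesome-ios" class="text-bold">awesome-ios</a>'
    '</h1>'
)

parser = AttrMatchParser("data-ga-click", REPO_CLICK)
parser.feed(SAMPLE)
parser.close()
print(parser.hrefs)  # ['/vsouza/awesome-ios']
```

Like the CSS selector, the comparison is an exact string match, so any change to the attribute text on GitHub's side would make both approaches return nothing.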
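If you only want the a tag inside the heading that has that value, start the XPath with . so the search is relative to the heading's child elements, and match on the data attribute value:
from selenium.webdriver.common.by import By
# This value matches the OWNER link; for the repository link use
# "Explore, go to repository, location:explore feed" instead
heading.find_elements(By.XPATH, './/a[@data-ga-click="Explore, go to repository owner, location:explore feed"]')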
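Because the leading . keeps each lookup inside one heading, the owner value and the repository value can be combined to pair up the two links of each card. This is a minimal offline sketch using ElementTree on trimmed sample headings; the second card (/octo/demo-app) and the helper names first_href and owner_repo_pairs are hypothetical.

```python
import xml.etree.ElementTree as ET

OWNER_CLICK = "Explore, go to repository owner, location:explore feed"
REPO_CLICK = "Explore, go to repository, location:explore feed"

# Trimmed sample headings; the second card is hypothetical
SAMPLE = """<div>
  <h1 class="f3"><a data-ga-click="Explore, go to repository owner, location:explore feed" href="/vsouza">vsouza</a> /
    <a data-ga-click="Explore, go to repository, location:explore feed" href="/vsouza/awesome-ios">awesome-ios</a></h1>
  <h1 class="f3"><a data-ga-click="Explore, go to repository owner, location:explore feed" href="/octo">octo</a> /
    <a data-ga-click="Explore, go to repository, location:explore feed" href="/octo/demo-app">demo-app</a></h1>
</div>"""


def first_href(heading, click_value):
    """href of the first <a> under this heading with the given data-ga-click, or None."""
    # The leading "." restricts the search to descendants of `heading`
    a = heading.find(f'.//a[@data-ga-click="{click_value}"]')
    return a.get("href") if a is not None else None


def owner_repo_pairs(fragment):
    """One (owner_href, repo_href) tuple per heading."""
    root = ET.fromstring(fragment)
    return [(first_href(h, OWNER_CLICK), first_href(h, REPO_CLICK)) for h in root.iter("h1")]


print(owner_repo_pairs(SAMPLE))
# [('/vsouza', '/vsouza/awesome-ios'), ('/octo', '/octo/demo-app')]
```

In Selenium the scoping matters even more: an XPath that starts with // is evaluated from the document root even when called on an element, so dropping the . would return every matching link on the page for each heading.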