Finding an element by partial href (Python Selenium)

I'm trying to extract text from elements that have different XPaths but very predictable href patterns across multiple pages of a web database. Here are some examples:

 <a href="/mathscinet/search/mscdoc.html?code=65J22,(35R30,47A52,65J20,65R30,90C30)"> 65J22 (35R30 47A52 65J20 65R30 90C30) </a> 

In this example I would want to extract "65J22 (35R30 47A52 65J20 65R30 90C30)".

 <a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)"> 05C80 (05C15) </a> 

In this example I would want to extract "05C80 (05C15)". My web scraper can't locate these elements by a fixed XPath, because the XPaths of the elements I want change between pages, so I am looking for a more roundabout approach.

My main idea is to use the fact that every href contains "/mathscinet/search/mscdoc.html?code=". Selenium has no locator that searches by href directly, but I was thinking of doing something similar to this C# implementation:

Driver.Instance.FindElement(By.XPath("//a[contains(@href, 'long')]"))

To port this to Python, the only analogous approach I could think of is the in operator, but I am not sure how the syntax would work when everything is nested inside find_element_by_xpath. How would I bring all of these ideas together to obtain my desired text?

driver.find_element_by_xpath("//a['/mathscinet/search/mscdoc.html?code=' in @href]").text

If I understand correctly, you want to locate all elements that share the same partial href. You can use this:

elements = driver.find_elements_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]")
for element in elements:
    print(element.text)

Or, if you want to locate a single element:

driver.find_element_by_xpath("//a[contains(@href, '/mathscinet/search/mscdoc.html')]").text

find_elements_by_xpath returns a list of every matching element, while find_element_by_xpath returns only the first match.
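The find_element_by_xpath and find_elements_by_xpath helpers used in this thread were deprecated in Selenium 4 and dropped in later 4.x releases, where the same lookup is written as find_elements(By.XPATH, ...). The following is a minimal sketch of that form, not part of the original answer. The names partial_href_xpath and link_texts are hypothetical helpers introduced here for illustration, and the locator string "xpath" is spelled out (it is the value of By.XPATH) so the snippet has no import-time dependency on Selenium.

```python
# XPath locator strategy. This is the same string as
# selenium.webdriver.common.by.By.XPATH, written out so the sketch
# can be read and exercised without Selenium installed.
XPATH = "xpath"


def partial_href_xpath(fragment):
    """Build an XPath matching <a> elements whose href contains `fragment`.

    Assumption: `fragment` holds no single quote, because it is embedded
    in a single-quoted XPath string literal.
    """
    if "'" in fragment:
        raise ValueError("fragment must not contain a single quote")
    return f"//a[contains(@href, '{fragment}')]"


def link_texts(driver, fragment):
    """Return the visible text of every matching link (hypothetical helper)."""
    elements = driver.find_elements(XPATH, partial_href_xpath(fragment))
    return [element.text for element in elements]
```

With a real browser session, driver would be something like webdriver.Chrome() after a driver.get(...) call, and link_texts(driver, "/mathscinet/search/mscdoc.html?code=") would then return one string per matching link. Note that the in operator from the question is Python syntax, not XPath; inside the XPath expression the equivalent test is contains().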

Given the HTML you have shared, @AndreiSuvorkov's answer should cover your current requirement. You could also make the XPath more specific by:

  • Using starts-with instead of contains
  • Including the ?code= part of the @href attribute
  • Your effective code block will be:

     all_elements = driver.find_elements_by_xpath("//a[starts-with(@href,'/mathscinet/search/mscdoc.html?code=')]")
     for elem in all_elements:
         print(elem.get_attribute("innerHTML"))
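The prefix match itself can be checked offline, without a browser. The sketch below swaps Selenium for Python's standard html.parser module, so it illustrates the matching logic and is not the answerer's method. PrefixLinkParser and matching_link_texts are hypothetical names, and the second link in SAMPLE is a made-up non-matching URL added only for contrast. It also strips the surrounding whitespace, which get_attribute("innerHTML") would return unchanged.

```python
from html.parser import HTMLParser

PREFIX = "/mathscinet/search/mscdoc.html?code="


class PrefixLinkParser(HTMLParser):
    """Collect the text of <a> elements whose href starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.texts = []      # finished link texts
        self._parts = None   # text chunks of the link being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Python-side counterpart of starts-with(@href, prefix)
            if href.startswith(self.prefix):
                self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._parts is not None:
            # Drop the leading and trailing spaces around the link text
            self.texts.append("".join(self._parts).strip())
            self._parts = None


def matching_link_texts(html, prefix=PREFIX):
    """Return the stripped text of every link whose href starts with `prefix`."""
    parser = PrefixLinkParser(prefix)
    parser.feed(html)
    parser.close()
    return parser.texts


# First link is taken from the question; the second is a made-up
# non-matching link used only to show that it is filtered out.
SAMPLE = """
<a href="/mathscinet/search/mscdoc.html?code=05C80,(05C15)"> 05C80 (05C15) </a>
<a href="/mathscinet/help/about.html"> About </a>
"""

print(matching_link_texts(SAMPLE))  # ['05C80 (05C15)']
```

One caveat when moving this check into Selenium: get_attribute("href") returns the fully resolved absolute URL, so a Python-side startswith test on that value would not match the site-relative prefix, whereas the XPath starts-with(@href, ...) test sees the attribute as written in the page source.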
