Using find_elements_by_xpath with multiple positions

Here is the HTML snippet:

<section class="node_category" id="kui_3_1515304072474_68">
    <h3 class="">User details</h3>
<ul class="" id="kui_3_1515304072474_67">
<li class="contentnode" id="kui_3_1515304072474_66">
<dl id="kui_3_1515304072474_65">
<dt class="">Country
</dt>
<dd class="" id="kui_3_1515304072474_64">United States
</dd>
</dl></li>
<li class="contentnode">
<dl>
<dt class="">City/town
</dt>
<dd class="">Somewhere
</dd>
</dl></li>
<li class="contentnode" id="kui_3_1515304072474_76">
<dl id="kui_3_1515304072474_75">
<dt class="">Company
</dt>
<dd class="" id="kui_3_1515304072474_74">ABC Inc
</dd>
</dl></li>
</ul></section>

I want to extract the dd text from the "contentnode" items with this XPath:

/ul/li[@class='contentnode'][3]/dl/dd

On other pages, the number of "contentnode" items ranges from 1 to at most 6; in this example there are 3. To select all positions, I constructed the following XPath:

//li[@class='contentnode'][1 <= position() and position() < 7]/dl/dd
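As a rough, browser-free check of what these predicates select, the sketch below runs similar paths against a trimmed copy of the snippet using Python's standard-library xml.etree.ElementTree. It is a stand-in, not Selenium: ElementTree supports only a small XPath subset (a numeric [3] works, position() comparisons do not), it requires a relative .// path, and it returns raw text, so the sketch strips surrounding whitespace itself.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's snippet (ids dropped; it is well-formed XML).
HTML = """<section class="node_category">
<h3 class="">User details</h3>
<ul class="">
<li class="contentnode"><dl><dt class="">Country
</dt><dd class="">United States
</dd></dl></li>
<li class="contentnode"><dl><dt class="">City/town
</dt><dd class="">Somewhere
</dd></dl></li>
<li class="contentnode"><dl><dt class="">Company
</dt><dd class="">ABC Inc
</dd></dl></li>
</ul></section>"""

root = ET.fromstring(HTML)


def texts(path):
    """Return the stripped text of every element matched by an ElementTree path."""
    return [el.text.strip() for el in root.findall(path)]


# A numeric predicate keeps only one <li>. (ElementTree counts it among sibling
# <li> elements; here every <li> is a contentnode, so the two counts agree.)
third_only = texts(".//li[@class='contentnode'][3]/dl/dd")

# With no position filter at all, every contentnode <li> is selected.
all_items = texts(".//li[@class='contentnode']/dl/dd")

print(third_only)  # ['ABC Inc']
print(all_items)   # ['United States', 'Somewhere', 'ABC Inc']
```

On this snippet, dropping the position filter is enough to reach every item, so a position range is not needed to collect them all.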

I then plug it into my Python code:

from selenium import webdriver


lst=[]
browser = webdriver.Chrome('./path')
url = "https://<target URL>"
browser.get(url)
contents = browser.find_elements_by_xpath("//li[@class='contentnode'][1 <= position() and position() < 7]/dl/dd")

for t in contents:

    lst.append([t.text])

print(lst)

However, the output only shows the text at position 1, when it should show the text for every position (up to 6).

[Edit] I also tried

//li[@class='contentnode'][contains(@id,'kui_3')]/dl/dd

but it still does not work. It raises no error, but the result is empty.
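Independent of why that attempt returned nothing, the contains(@id,'kui_3') predicate can at best match only the li elements that carry an id. In the snippet, the City/town item has no id, so this predicate would skip it even where the markup matches. ElementTree has no contains(), so this sketch emulates the predicate in plain Python over a trimmed copy of the snippet (the variable names are illustrative).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's snippet; only the <li> ids are kept.
HTML = """<ul>
<li class="contentnode" id="kui_3_1515304072474_66"><dl><dt>Country</dt><dd>United States</dd></dl></li>
<li class="contentnode"><dl><dt>City/town</dt><dd>Somewhere</dd></dl></li>
<li class="contentnode" id="kui_3_1515304072474_76"><dl><dt>Company</dt><dd>ABC Inc</dd></dl></li>
</ul>"""

root = ET.fromstring(HTML)

# Emulate li[@class='contentnode'][contains(@id,'kui_3')]/dl/dd:
# an <li> without an id attribute can never satisfy contains(@id, ...).
matched = [
    li.find("dl/dd").text
    for li in root.findall("li[@class='contentnode']")
    if "kui_3" in li.get("id", "")
]

print(matched)  # ['United States', 'ABC Inc']
```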

What's wrong with my code?

Here is working code for your needs:

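from selenium import webdriver
from selenium.webdriver.common.by import By


lst = []
browser = webdriver.Chrome()
browser.get("https://<target URL>")

# find_elements_by_xpath was removed in Selenium 4.3;
# find_elements(By.XPATH, ...) works in both older and newer versions.
contents = browser.find_elements(By.XPATH, "//li[@class='contentnode'][1 <= position() and position() < 7]/dl/dd")

for t in contents:
    # Append the string itself (not a one-element list) to get a flat list.
    lst.append(t.text)

print(lst)

browser.quit()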

Given your HTML, the result will be:

['United States', 'Somewhere', 'ABC Inc']

Hope it helps!
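Apart from the driver setup and browser.quit(), the loop body in this answer differs from the question's in one detail: lst.append(t.text) instead of lst.append([t.text]). That changes only the shape of the printed list, not how many elements are found. A plain-Python illustration, with hypothetical strings standing in for each t.text:

```python
# Hypothetical stand-ins for the .text of the three <dd> elements; no browser involved.
texts = ["United States", "Somewhere", "ABC Inc"]

nested, flat = [], []
for text in texts:
    nested.append([text])  # the question's loop: one single-item list per element
    flat.append(text)      # this answer's loop: the string itself

print(nested)  # [['United States'], ['Somewhere'], ['ABC Inc']]
print(flat)    # ['United States', 'Somewhere', 'ABC Inc']
```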

Try the code below:

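from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

lst = []
# Selenium 4 takes the chromedriver path through a Service object.
browser = webdriver.Chrome(service=Service('./path'))
url = "https://<target URL>"
browser.get(url)
# No position() filter: select the <dd> of every contentnode item.
contents = browser.find_elements(By.XPATH, "//li[@class='contentnode']/dl/dd")
print(len(contents))  # how many <dd> elements were found

for t in contents:
    lst.append(t.text)

print(lst)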

Did you try a CSS selector? If not, give it a go:

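from selenium.webdriver.common.by import By

# Assumes `browser` is an open WebDriver that has already loaded the page.
for items in browser.find_elements(By.CSS_SELECTOR, ".contentnode"):
    # Collapse whitespace inside each <dd> of this item, then join them with spaces.
    data = ' '.join([' '.join(item.text.split()) for item in items.find_elements(By.CSS_SELECTOR, "dd")])
    print(data)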
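The CSS-selector loop above normalizes whitespace inside each dd with ' '.join(text.split()) and joins the dd texts of one .contentnode item with spaces. To show just that text handling without a browser, here is a sketch over a trimmed copy of the snippet using the standard-library ElementTree; the class-token check is an assumed approximation of the .contentnode selector, not Selenium's matching logic.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's snippet; the raw <dd> text keeps its trailing newline.
HTML = """<ul>
<li class="contentnode"><dl><dt>Country
</dt><dd>United States
</dd></dl></li>
<li class="contentnode"><dl><dt>City/town
</dt><dd>Somewhere
</dd></dl></li>
<li class="contentnode"><dl><dt>Company
</dt><dd>ABC Inc
</dd></dl></li>
</ul>"""

root = ET.fromstring(HTML)
rows = []

# Approximates ".contentnode": any element whose class list includes that token.
for items in root.iter():
    if "contentnode" in items.get("class", "").split():
        # Normalize whitespace per <dd>, then join all <dd> texts of this item.
        data = " ".join(" ".join(dd.text.split()) for dd in items.iter("dd"))
        rows.append(data)
        print(data)
```

Each line printed is one item's dd text with the stray newlines removed.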
