
Pull text from webpage using selenium not working

I'm trying to pull some text from a webpage. The page source that I want to pull data from is:

<tbody>
    <tr class="drx_dotted">
        <td class="drx_first">
            <span name="pharmacy"
                  longitude="-82.531457"
                  latitude="42.617612"
                  pharmacyname="CVS Pharmacy #"
                  address="1025 St Clair River Dr"
                  city="Algonac"
                  state="MI"
                  zip="48001"
                  phone="8107944941">
            </span>
            <p>
                <strong>CVS Pharmacy #</strong><br />
                1025 St Clair River Dr<br />
                Algonac, MI 48001<br />
                1-810-794-4941
            </p>
            <p>
                <a class=""
                   data-ajax="true"
                   data-ajax-method="post"
                   data-ajax-success="UpdateSearchPharmacyList"
                   href="/pfdn/SharedPharmacy/AddNetworkPharmacy?pharmacyNABP=2352324&amp;language=English">Add Pharmacy
                    <span class='HiddenText'> CVS Pharmacy #</span>
                </a>
            </p>
        </td>
        <td>
            <p>
                Retail
            </p>
        </td>
        <td>
            <p>
                Not applicable
            </p>
        </td>
    </tr>

I want to pull the "Not applicable" text near the bottom of the HTML. It is the "p" in the third "td" of the row. There are several rows like this one, so I want to pull all of these tags at once into a Python list.

Here is the Selenium code I'm using to find the elements:

x = driver.find_elements_by_xpath(
    '//*[@id="divSearchResultContainer"]/div[2]/div[2]/table/tbody/tr/td[3]/p')

When I type print(x), it prints out this:

[<selenium.webdriver.remote.webelement.WebElement object at 0x101f98210>,
 <selenium.webdriver.remote.webelement.WebElement object at 0x101f98250>,
 <selenium.webdriver.remote.webelement.WebElement object at 0x101f98290>]

So Selenium has found and pulled three instances (which is correct; it was supposed to find three). However, when I try to pull the text using:

print x[0].text

the output is:

None

I've tried a bunch of variations, even trying to find each element individually, but it's still not working. Has anyone had this problem? How can I resolve it?

Thanks

The problem is that you have multiple tr tags; select the appropriate one. Use find_element_by_xpath() to find a single element instead of a list, with the following XPath:

//*[@id="divSearchResultContainer"]/div[2]/div[2]/table/tbody/tr[1]/td[3]/p

The Python code:

element = driver.find_element_by_xpath(
    '//*[@id="divSearchResultContainer"]/div[2]/div[2]/table/tbody/tr[1]/td[3]/p')

Note the [1] after the tr. This tells XPath to look at the first tr tag only.
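
If the text still comes back empty for a located element, one possibility (not confirmed anywhere in this thread) is that the element is not displayed, since a WebElement's .text only reports rendered text. The sketch below falls back to the DOM textContent property in that case. The helper name element_texts and the FakeElement stand-in are hypothetical and exist only so the example runs without a browser; with a real driver you would pass the list returned by find_elements_by_xpath() instead.

```python
def element_texts(elements):
    """Return the text of each element, falling back to textContent."""
    texts = []
    for element in elements:
        # .text only reports text that is rendered on the page.
        text = element.text
        if not text:
            # textContent is a DOM property and also covers hidden text.
            text = element.get_attribute("textContent") or ""
        texts.append(text.strip())
    return texts


class FakeElement:
    """Hypothetical stand-in for a WebElement (no browser needed)."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


cells = [
    FakeElement("", "\n    Not applicable\n"),  # hidden: .text is empty
    FakeElement("Retail", "Retail"),            # visible: .text is used
]
print(element_texts(cells))
```

This also covers the original goal of collecting every matching cell into one Python list, rather than only the first row.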


Also note that your XPath looks fragile because it relies on indexing: the second div inside this div, then the second div inside that one, and so on. Posting the complete contents of the element with the divSearchResultContainer id would help us provide you with a better solution.
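
As a rough illustration of a less index-dependent path, the sketch below anchors on the drx_dotted row class visible in the posted markup instead of counting div elements. It assumes every result row carries that class, which the question does not confirm, and it uses the standard library's ElementTree on a trimmed copy of the markup so it runs without a browser. The second row is invented for the example, and the real nesting between the container and the table is unknown.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the posted markup. The second row and the
# direct container > table nesting are assumptions made for this sketch.
MARKUP = """
<div id="divSearchResultContainer">
  <table>
    <tbody>
      <tr class="drx_dotted">
        <td class="drx_first"><p>CVS Pharmacy #</p></td>
        <td><p>Retail</p></td>
        <td><p>Not applicable</p></td>
      </tr>
      <tr class="drx_dotted">
        <td class="drx_first"><p>Example Pharmacy</p></td>
        <td><p>Retail</p></td>
        <td><p>Preferred</p></td>
      </tr>
    </tbody>
  </table>
</div>
"""

root = ET.fromstring(MARKUP)

# Select the <p> in the third cell of every row that has the class,
# without depending on how many div wrappers sit above the table.
paragraphs = root.findall(".//tr[@class='drx_dotted']/td[3]/p")
third_column = [p.text.strip() for p in paragraphs]
print(third_column)
```

Under the same assumption, the Selenium counterpart would be an XPath along the lines of //*[@id="divSearchResultContainer"]//tr[@class="drx_dotted"]/td[3]/p.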

Try this XPath. I haven't tested it, but XPath has a last() function, which is what you want here.

"//tbody//tr//td[last()]/p[last()]/text()"

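The effect of last() can be checked offline with the standard library's ElementTree, as in the sketch below. The rows are a trimmed, partly invented copy of the markup (the second row deliberately has fewer cells). The trailing /text() step is left out: ElementTree does not support it, and Selenium's element lookups expect the XPath to resolve to elements rather than text nodes.

```python
import xml.etree.ElementTree as ET

# Two rows with a different number of cells; the second row is invented.
ROWS = """
<tbody>
  <tr>
    <td><p>CVS Pharmacy #</p></td>
    <td><p>Retail</p></td>
    <td><p>Not applicable</p></td>
  </tr>
  <tr>
    <td><p>Example Pharmacy</p></td>
    <td><p>Preferred</p></td>
  </tr>
</tbody>
"""

tbody = ET.fromstring(ROWS)

# td[last()] picks the final cell of each row, however many cells it has;
# p[last()] then picks the final paragraph inside that cell.
last_paragraphs = tbody.findall("./tr/td[last()]/p[last()]")
last_cells = [p.text.strip() for p in last_paragraphs]
print(last_cells)
```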