Parsing HTML with xpath or cssSelector?

How do I parse just the text portions of these blocks of code? I am using the Selenium client drivers in Java.

<li id="NOT_PUT_PREF_STORE" style="">
<span id="STORE_AVAIL" class="BodyLBoldGrey StockStat">Out of stock</span> <span id="InYourLocal">in your local</span> <span id="storeRollover_2"><span id="STORE_CITY" class="BodyLBoldLtgry VIBSStore1">West Hills</span></span> store<span id="notSelectOptionSOI">.</span>
</li>

or

<li id="NOT_PUT_PREF_STORE" style="">
<span id="STORE_AVAIL" class="BodyLLtgry StockStat">Not carried</span> <span class="BodyLLtgry" id="InYourLocal">in your local</span> <span id="storeRollover_2"><span id="STORE_CITY" class="BodyLBoldLtgry VIBSStore1">West Hills</span></span> store<span id="notSelectOptionSOI">.</span>
</li>

or

<li id="NOT_PUT_PREF_STORE" style="">
<span id="STORE_AVAIL" class="BodyMBold StockStatGreen">In stock</span> <span id="InYourLocal">in your local</span> <span id="storeRollover_2"><span id="STORE_CITY" class="BodyLBoldLtgry VIBSStore1">West Hills</span></span> store<span id="notSelectOptionSOI">.</span>
</li>

I am trying to extract the text portion of the web element in each of these variations (i.e., Not carried, In stock, Out of stock). I am very new to Selenium and HTML parsing, so this is really hard for me to get working.

I was thinking that it would be something like:

WebDriver driver = new FirefoxDriver(profile);
driver.get(Url);
System.out.println(driver.findElement(By.id("STORE_AVAIL")).getText());

I am not sure how I would do it with cssSelector, but people tell me that is faster. Would this work?

driver.findElement(By.xpath("//li[@id='NOT_PUT_PREF_STORE']./span[@id='STORE_AVAIL']")).getText()

When you 'View Page Source' it will only show the original HTML source. It will not show changes made by AJAX calls, which appears to be how the Walmart page updates that section/element. This question provides a better explanation.

Assuming you are using Firefox (based on the driver you are using), you can go to the page and press Ctrl+Shift+I to bring up the Inspector tool. Select the element you are interested in, then click the [HTML] button (in the Inspector menu) to view the current source.

Note that when you get the element using Selenium WebDriver, you get the current value rather than the original value seen in the page source. So you do not have to worry about what you see in the page source.

When I try to find elements on the page, I always build my locators in this order:

  1. id: driver.findElement(By.id("STORE_AVAIL")).getText();
  2. css selector: driver.findElement(By.cssSelector("span#STORE_AVAIL")).getText();
  3. xpath: driver.findElement(By.xpath("//span[@id='STORE_AVAIL']")).getText();

The id seems to be the fastest and easiest, both for WebDriver and for me. An id should be unique on the page.

CSS takes a little more investigative work on my part, but WebDriver handles it just fine.

Lastly, xpath is sometimes unavoidable (unless you buy the devs a beer and ask nicely for a change to the application so you can locate elements faster; after all, you are testing for them anyway). Locating by xpath with IE is terribly slow, and writing complex xpaths is a drag.

Xpath is also fragile: one small change to the DOM can render your xpath unusable. Then you get to debug/rewrite your xpath (it is as fun as it sounds).

My suggestion is to use the Firebug and FirePath addons for Firefox to help you craft your locators.
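An XPath can also be checked without launching a browser. The sketch below is not Selenium: it assumes the `<li>` fragment from the question is well-formed enough to parse as XML and evaluates the expression with the JDK's `javax.xml.xpath`. The class name `StockStatusXPath`, the `SAMPLE` constant, and the `textAt` helper are hypothetical names chosen for illustration. It uses a plain `/` step after the predicate, because the `./` in the XPath proposed in the question is not valid syntax in that position.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class StockStatusXPath {

    // The "Out of stock" variation from the question, as a well-formed fragment.
    static final String SAMPLE =
        "<li id=\"NOT_PUT_PREF_STORE\" style=\"\">"
        + "<span id=\"STORE_AVAIL\" class=\"BodyLBoldGrey StockStat\">Out of stock</span> "
        + "<span id=\"InYourLocal\">in your local</span> "
        + "<span id=\"storeRollover_2\"><span id=\"STORE_CITY\" "
        + "class=\"BodyLBoldLtgry VIBSStore1\">West Hills</span></span>"
        + " store<span id=\"notSelectOptionSOI\">.</span>"
        + "</li>";

    // Hypothetical helper: returns the trimmed string value of the first node
    // matched by xpathExpr, or an empty string when nothing matches.
    static String textAt(String markup, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            return XPathFactory.newInstance().newXPath().evaluate(xpathExpr, doc).trim();
        } catch (Exception e) {
            // Covers malformed markup as well as invalid XPath syntax.
            throw new IllegalArgumentException("Could not evaluate: " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        // Parent-qualified form of the locator, with '/' instead of './'.
        System.out.println(
            textAt(SAMPLE, "//li[@id='NOT_PUT_PREF_STORE']/span[@id='STORE_AVAIL']"));
        // Shorter form from the locator list above.
        System.out.println(textAt(SAMPLE, "//span[@id='STORE_AVAIL']"));
    }
}
```

Both expressions select the same span here, so the shorter one is probably enough as long as the id stays unique on the page.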

I tried it with the following HTML snippet:

 <li id="NOT_PUT_PREF_STORE" style=""> <span id="STORE_AVAIL" class="BodyLBoldGrey StockStat">Out of stock</span> <span id="InYourLocal">in your local</span> <span id="storeRollover_2"><span id="STORE_CITY" class="BodyLBoldLtgry VIBSStore1">West Hills</span></span> store<span id="notSelectOptionSOI">.</span> </li> 

I am using the following code to solve it. I get the list of span elements using XPath and loop through it to get the text of each element.

driver.navigate().to("file:///C:/Users/abc/Desktop/test.html");
// Collect every <span> that is a direct child of an <li>
List<WebElement> spanEle = driver.findElements(By.xpath("//li/span"));
// Print the visible text of each matched span
for (int i = 0; i < spanEle.size(); i++) {
    System.out.println(spanEle.get(i).getText());
}
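To see which elements `//li/span` selects in that snippet, the same expression can be evaluated offline. This is a sketch using the JDK's `javax.xml.xpath` rather than Selenium, assuming the snippet parses as XML. `SpanTextLister`, `SAMPLE`, and `textsAt` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SpanTextLister {

    // Same snippet as in the answer above.
    static final String SAMPLE =
        "<li id=\"NOT_PUT_PREF_STORE\" style=\"\"> "
        + "<span id=\"STORE_AVAIL\" class=\"BodyLBoldGrey StockStat\">Out of stock</span> "
        + "<span id=\"InYourLocal\">in your local</span> "
        + "<span id=\"storeRollover_2\"><span id=\"STORE_CITY\" "
        + "class=\"BodyLBoldLtgry VIBSStore1\">West Hills</span></span>"
        + " store<span id=\"notSelectOptionSOI\">.</span> </li>";

    // Hypothetical helper: trimmed text content of every node matched by xpathExpr.
    static List<String> textsAt(String markup, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpathExpr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            // Covers malformed markup as well as invalid XPath syntax.
            throw new IllegalArgumentException("Could not evaluate: " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        for (String text : textsAt(SAMPLE, "//li/span")) {
            System.out.println(text);
        }
    }
}
```

Because `/span` matches direct children only, the nested `STORE_CITY` span is not returned as its own entry; its text shows up through `storeRollover_2`. The bare word "store" belongs to the `<li>` itself rather than to any span, so this loop does not print it. If only the stock status is needed, locating `STORE_AVAIL` by id avoids the loop.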
