
Selenium - Find text only enclosed by double quotes

I'm trying to retrieve text from a set of webpages, but some of the text I'd like to retrieve is not enclosed in any tag. I can easily retrieve the rest of the contents, but on every page there is a paragraph of text enclosed only in double quotes and nothing else. I can currently locate the element that contains it, but that element holds a lot of other content. Is it possible to write an XPath that goes into this element and retrieves only the text enclosed in double quotes?

Edit: Below is what I'd like to retrieve: the two lines of text below the h1 tag. There is more in the element, but none of it is relevant. So the XPath I'm looking for is something along the lines of "find any unenclosed text within the article element with class 'widget-content'".

 <article class="widget-content">
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   <script src="/Modules/Orchard.jQuery/scripts/jquery-1.9.1.js" type="text/javascript"></script>
   <h1>Placeholder title</h1>
   Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text
   <br />
   <br />
   Placeholder: Another placeholder
   <br />
   <br />

It should be something like this:

xpath=//article[contains(@class, 'widget-content')]/article[1]

Your xpath should be something like this:

//article/text()

It selects only the text nodes that are direct children of the article element, that is, the text that is not wrapped in any child tag.

Hope it helps!

Q: So the XPath I'm looking for is something along the lines of "find any unenclosed text within the article element with class 'widget-content'".

This would be:

//article[@class='widget-content']/text()

But this will also match a lot of empty text nodes (whitespace only). To avoid them, try:

//article[@class='widget-content']/text()[normalize-space() !='']  
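To see what this expression is meant to select, here is a minimal sketch that reproduces its effect on a saved copy of the markup (for example the string from driver.page_source), using only Python's standard library. The class name DirectTextCollector, the helper direct_texts and the SAMPLE string are made up for illustration. The sample is a shortened, closed version of the snippet in the question, and the class match is an exact string comparison, like @class='widget-content'.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DirectTextCollector(HTMLParser):
    """Collect non-blank text that is a direct child of the target element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0   # 0 = outside the target, 1 = directly inside it
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # entering a child element of the target
        elif tag == self.tag and dict(attrs).get("class") == self.css_class:
            self.depth = 1   # entering the target element itself

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # rough equivalent of normalize-space()
        if self.depth == 1 and text:
            self.texts.append(text)


def direct_texts(html, tag="article", css_class="widget-content"):
    parser = DirectTextCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.texts


# Shortened sample modelled on the question's markup (assumed, not the real page).
SAMPLE = """
<article class="widget-content">
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <script src="/Modules/Orchard.jQuery/scripts/jquery-1.9.1.js" type="text/javascript"></script>
  <h1>Placeholder title</h1>
  Placeholder text Placeholder text
  <br /> <br />
  Placeholder: Another placeholder
  <br /> <br />
</article>
"""

print(direct_texts(SAMPLE))
```

The h1 title is skipped because it sits one level deeper than the article's own text, and the whitespace between the br tags is dropped by the blank-text check.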

Q: Below is what I'd like to retrieve, the two lines of text below the h1-tag.

This would be /h1/following-sibling::text(), or all together:

"//article[@class='widget-content']/h1/following-sibling::text()[normalize-space() !='']"

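One caveat when using these expressions from Selenium: find_element and find_elements can only return elements, so an XPath ending in text() is rejected with an invalid selector error. A possible workaround, sketched below, is to evaluate the XPath inside the browser with document.evaluate and hand back only the strings. The helper name text_nodes is hypothetical, and driver is assumed to be an already started WebDriver with the page loaded.

```python
# JavaScript executed in the page: evaluate an XPath that may select text
# nodes and return the text content of every match as a list of strings.
_XPATH_TEXT_JS = """
const snapshot = document.evaluate(
    arguments[0], document, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
const texts = [];
for (let i = 0; i < snapshot.snapshotLength; i++) {
    texts.push(snapshot.snapshotItem(i).textContent);
}
return texts;
"""

# The combined expression from the answer above.
XPATH = ("//article[@class='widget-content']/h1"
         "/following-sibling::text()[normalize-space() != '']")


def text_nodes(driver, xpath=XPATH):
    """Return the stripped text of every node matched by `xpath`.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    (or anything exposing execute_script(script, *args)).
    """
    raw = driver.execute_script(_XPATH_TEXT_JS, xpath)
    return [t.strip() for t in raw or []]
```

With a real session this would be called as text_nodes(driver) after driver.get(...) has loaded the page. For the markup in the question it should give one string per unenclosed line of text.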