Selenium - Find text only enclosed by double quotes

I'm trying to retrieve text from a set of web pages, but some of the text I'd like to retrieve is not enclosed in any tag. I can easily retrieve the rest of the content, but every page has a paragraph of text enclosed only in double quotes and nothing else. I can already locate the element it lies under, but that element holds a lot of other content. Is it possible to specify an XPath that goes into this element and retrieves only the text enclosed in double quotes?

Edit: Below is what I'd like to retrieve: the two lines of text below the h1 tag. There is more in the element, but none of it is relevant. So the XPath I'm looking for is something along the lines of "find any unenclosed text within the article element with class 'widget-content'".

    <article class="widget-content">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <script src="/Modules/Orchard.jQuery/scripts/jquery-1.9.1.js" type="text/javascript"></script>
      <h1>Placeholder title</h1>
      Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text
      <br />
      <br />
      Placeholder: Another placeholder
      <br />
      <br />

It should be something like this:

xpath=//article[contains(@class, 'widget-content')]/article[1]

Your XPath should be something like this:

//article/text()

It returns only the text that is outside any child tag, i.e. the text nodes directly under the article element.

Hope it helps!

Q: So the XPath I'm looking for is something along the lines of "find any unenclosed text within the article element with class 'widget-content'".
This would be:

//article[@class='widget-content']/text()

But this will contain a lot of empty (whitespace-only) text nodes. To avoid them, try:

//article[@class='widget-content']/text()[normalize-space() !='']  
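
One practical caveat: Selenium's element locators expect an XPath that resolves to elements, so an expression ending in text() generally cannot be passed to find_element directly. A common workaround is to locate the article element and read its direct text nodes with JavaScript. The following is a minimal sketch of that idea, not a definitive implementation; the helper name direct_text and the CSS selector in the usage comment are assumptions made for illustration.

```python
from typing import List

# JavaScript executed in the page: return the content of every text node
# that is a direct child of the element passed in as arguments[0].
_DIRECT_TEXT_JS = """
return Array.from(arguments[0].childNodes)
    .filter(function (n) { return n.nodeType === Node.TEXT_NODE; })
    .map(function (n) { return n.textContent; });
"""


def direct_text(driver, element) -> List[str]:
    """Return the non-empty direct text nodes of `element` (hypothetical helper).

    Mirrors text()[normalize-space() != '']: whitespace is collapsed and
    whitespace-only nodes are dropped.
    """
    raw = driver.execute_script(_DIRECT_TEXT_JS, element)
    cleaned = (" ".join(text.split()) for text in raw)
    return [text for text in cleaned if text]


# Usage, assuming a running WebDriver session:
#   from selenium.webdriver.common.by import By
#   article = driver.find_element(By.CSS_SELECTOR, "article.widget-content")
#   print(direct_text(driver, article))
```

The whitespace filtering happens on the Python side, so the JavaScript stays short and the same cleanup can be reused for text obtained in other ways.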

Q: Below is what I'd like to retrieve, the two lines of text below the h1 tag.

This would be /h1/following-sibling::text(), or all together:

"//article[@class='widget-content']/h1/following-sibling::text()[normalize-space() !='']"
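
To check offline what this expression is meant to select, the same logic can be approximated on the raw HTML (for example the string returned by driver.page_source) using only Python's standard library. This is a rough sketch under stated assumptions, not an XPath engine: the class TextAfterH1 and the function text_after_h1 are hypothetical names, the class attribute is matched by token rather than by exact string, and well-nested markup is assumed.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextAfterH1(HTMLParser):
    """Collect text nodes that are direct children of
    <article class="widget-content"> and appear after its <h1>."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # 0 = outside the article, 1 = directly inside it
        self.seen_h1 = False  # becomes True once the article's <h1> has closed
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "article" and "widget-content" in classes:
                self.depth = 1
        else:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        if self.depth == 2 and tag == "h1":
            self.seen_h1 = True
        self.depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # same effect as normalize-space()
        if self.depth == 1 and self.seen_h1 and text:
            self.texts.append(text)


def text_after_h1(html):
    """Return the non-empty text nodes that follow the <h1> inside the article."""
    parser = TextAfterH1()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return parser.texts
```

Fed the sample markup from the question, this should yield the long placeholder paragraph and the "Placeholder: Another placeholder" line, while skipping the title and the whitespace between the br tags.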
