
Extract date from text inside HTML tags using XPath

Extract a date inside an HTML tag using the XPath substring functions

I have tried using the substring functions in XPath on this markup:

<span id="latestReplyLine"><a href="#comment-965609" class="lastScroll js-latest-reply">Latest reply</a> on May 22, 2019 by John Stoltzfus</span>

I am using the XPath query below to extract the text:

/span[@id="latestReplyLine"]/text()[substring-after(substring-before(.,' by '), ' on ')]

Expected result:

"May 22, 2019"

But I am getting:

"on May 22, 2019 by John Stoltzfus"

Any idea?

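In your query the substring calls sit inside a predicate, so they only filter the text node: any non-empty string counts as true, and the whole node comes back unchanged. The spaces in the ' on ' and ' by ' delimiters also have to match the text exactly.
An improved XPath expression applies the functions to the string value of the element and trims the leftover whitespace: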

normalize-space(substring-after(substring-before(string(/span[@id='latestReplyLine']),'by'), 'on'))

This returns "May 22, 2019", the expected result.
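
To see why the predicate form returns the whole text node while the function form returns only the date, here is a minimal sketch in Python using only the standard library. The helpers substring_before, substring_after and normalize_space are hypothetical names that approximate the XPath 1.0 functions (non-empty separators assumed); they are not part of any XPath library, and the element's string value is emulated with itertext().

```python
import xml.etree.ElementTree as ET

HTML = ('<span id="latestReplyLine"><a href="#comment-965609" '
        'class="lastScroll js-latest-reply">Latest reply</a>'
        ' on May 22, 2019 by John Stoltzfus</span>')


def substring_before(s, sep):
    # Approximates XPath substring-before(): "" when sep is absent.
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    # Approximates XPath substring-after(): "" when sep is absent.
    _, found, tail = s.partition(sep)
    return tail if found else ""


def normalize_space(s):
    # Approximates XPath normalize-space(): trim and collapse whitespace.
    return " ".join(s.split())


span = ET.fromstring(HTML)

# Predicate form: the expression is only tested for truthiness, so the
# text node that follows the <a> element is kept or dropped as a whole.
text_node = span[0].tail
predicate_value = substring_after(substring_before(text_node, " by "), " on ")
kept_node = text_node if predicate_value else None

# Function form: operate on the string value of the whole <span>.
full_text = "".join(span.itertext())
date = normalize_space(substring_after(substring_before(full_text, "by"), "on"))

print(repr(kept_node))
print(repr(date))
```

The predicate value itself is the date, but a predicate never replaces the node it tests, which is why the query has to wrap the path in the functions instead.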
