Extract text from HTML with Javascript regex

I am trying to parse a webpage and get the reference number that follows <li>YM#. For example, I need to get 1234-234234 into a variable from HTML that contains:

<li>YM# 1234-234234 </li>

Many thanks for your help!

Rich

Try this:
(<li>[^#<>]*?# *)([\d\-]+)\b
and get the result in $2.
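
A minimal sketch of how that pattern might be applied in JavaScript, where the second capture group is read from the match array rather than from $2. The helper name extractReference and the sample string are hypothetical, used only for illustration.

```javascript
// Hypothetical helper: returns the reference number after "<li>...#", or null.
function extractReference(html) {
  // Group 1: "<li>" plus everything up to and including "#" and any spaces.
  // Group 2: the run of digits and dashes that follows.
  var pattern = /(<li>[^#<>]*?# *)([\d\-]+)\b/;
  var match = pattern.exec(html);
  return match ? match[2] : null;
}

var sample = "<ul><li>YM# 1234-234234 </li></ul>";
console.log(extractReference(sample)); // prints 1234-234234
```

Returning null when nothing matches keeps the caller from indexing into a failed match.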

(?!<li>YM#\s)([\d-]+)

http://regexr.com?30ng5

This will match the numbers.
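
One caveat worth checking: a negative lookahead does not tie the match to the YM# prefix, so other digit runs on the page may match too. The sketch below compares it with a lookbehind variant, which is an assumed alternative (not from the answer) and needs an engine that supports lookbehind. The sample string is made up for illustration.

```javascript
var html = "<li>YM# 1234-234234 </li><li>Qty 99</li>";

// The pattern from the answer: the negative lookahead only rejects positions
// where "<li>YM# " starts, so every run of digits and dashes still matches.
var lookahead = /(?!<li>YM#\s)([\d-]+)/g;
var loose = html.match(lookahead);

// Assumed alternative: a lookbehind requires "<li>YM#" (plus optional
// whitespace) immediately before the number.
var lookbehind = /(?<=<li>YM#\s*)[\d-]+/g;
var anchored = html.match(lookbehind);

console.log(loose);    // contains both 1234-234234 and 99
console.log(anchored); // contains only 1234-234234
```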

Currently, your regex only matches if there is a single digit before the dash and a single digit after it. This will let you match one or more digits in each place instead:

/YM#[0-9]+-[0-9]+/g

Then you also need to capture it, so we use a capture group:

/YM#([0-9]+-[0-9]+)/g

Then we need to read the capture group, so we use the following code instead of String.match (with the g flag, String.match returns only the full matches and omits the groups):

var regex = /YM#([0-9]+-[0-9]+)/g;
var match = regex.exec(text);
// exec returns null when nothing matches, so check before indexing
var id = match ? match[1] : null;
// match[0]: the match of the entire regex
// match[1] onward: each capture group, in order
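
If the page holds several such items, the same capture group can be read for every match. The following is a sketch only: the helper name extractAllIds is hypothetical, and the \s* after YM# is an assumption added because the sample in the question has a space there, which the pattern above would not match.

```javascript
// Hypothetical helper: collects every "YM#" reference number in a string.
function extractAllIds(text) {
  // \s* tolerates optional whitespace after "YM#" (an assumption based on
  // the sample in the question); matchAll requires the g flag.
  var regex = /YM#\s*([0-9]+-[0-9]+)/g;
  var ids = [];
  for (var m of text.matchAll(regex)) {
    ids.push(m[1]); // m[0] is the whole match, m[1] the first capture group
  }
  return ids;
}

var html = "<li>YM# 1234-234234 </li><li>YM#5678-111</li>";
console.log(extractAllIds(html)); // logs both IDs: 1234-234234 and 5678-111
```

String.prototype.matchAll yields one match array per occurrence, so the capture groups stay available even with the g flag.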
