Regex PCRE expression
I have a piece of HTML code like the following:
<td width="24%"><b>Something</b></td>
<td width="1%"></td>
<td width="46%" align="center">
<p><b>
needed
value</b></p>
</td>
<td width="28%" align="center">
</td>
</tr>
What is a good regex pattern to extract the first text node (the text inside, not the tags) after the word Something? I mean I want to extract
needed
value
and nothing more.
I can't figure out a working regex pattern in PHP.
EDIT: I am not parsing a whole HTML document, only a few lines of it, so I want to do this using regex and no HTML parsers.
Ignoring the potential issues of parsing HTML with regex, the following pattern should match your example code:
Something(?:(?:<[^>]+>)|\s)*([\w\s*]+)
This will match Something, followed by any sequence of HTML tags (or whitespace), and then capture the very next block of text made of \w characters (including whitespace).
You can use this with PHP's preg_match() function like this:
if (preg_match('/Something(?:(?:<[^>]+>)|\s)*([\w\s*]+)/', $inputString, $match)) {
    $matchedValue = $match[1];
    // do whatever you need
}
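To check the pattern end to end, here is a minimal sketch that runs it against the sample HTML from the question. The heredoc label and the trim() call are additions for illustration. trim() is only a precaution here, since the group that skips tags also consumes the whitespace before the text.

```php
<?php
// Sample input copied from the question. $inputString is the variable name
// used in the answer; the HTML label is just a nowdoc marker.
$inputString = <<<'HTML'
<td width="24%"><b>Something</b></td>
<td width="1%"></td>
<td width="46%" align="center">
<p><b>
needed
value</b></p>
</td>
<td width="28%" align="center">
</td>
</tr>
HTML;

$matchedValue = null;
if (preg_match('/Something(?:(?:<[^>]+>)|\s)*([\w\s*]+)/', $inputString, $match)) {
    // Group 1 holds the first run of text after the skipped tags and whitespace.
    $matchedValue = trim($match[1]);
}

var_dump($matchedValue);
```

With this input, the captured value should be the two words "needed" and "value", still separated by the line break from the source. Note that the * inside [\w\s*] is a literal asterisk, so the capture would also run through any asterisks in the text.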
Regex Explained:
Something          # has to start with 'Something'
(?:                # non-capturing group
  (?:              # non-capturing group
    <[^>]+>        # any HTML tag, <...>
  )
  | \s             # OR whitespace
)*                 # this group can match 0+ times
(                  # capturing group 1
  [\w\s*]+         # any non-HTML words (with/without whitespace)
)
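The annotated form can also be kept directly in PHP by using the x (extended) modifier, which ignores whitespace outside character classes and allows # comments inside the pattern. The sketch below assumes a "~" delimiter, so that a "/" in a comment cannot end the pattern early. The names $pattern, $html and $firstText, and the shortened input, are made up for illustration.

```php
<?php
// Same pattern as in the answer, written with the x modifier so the
// explanation can live inside the regex itself.
$pattern = <<<'REGEX'
~
Something            # literal text to anchor on
(?:                  # non-capturing group
    (?: <[^>]+> )    #   one HTML tag, <...>
    | \s             #   OR one whitespace character
)*                   # repeated zero or more times
( [\w\s*]+ )         # capture group 1: word characters, whitespace, or a literal *
~x
REGEX;

// Shortened, hypothetical input with the same shape as the question's HTML.
$html = "<td><b>Something</b></td>\n<td><p><b>\nneeded\nvalue</b></p></td>";

$firstText = null;
if (preg_match($pattern, $html, $match)) {
    $firstText = trim($match[1]);
}

var_dump($firstText);
```

Keeping the comments inside the pattern means the explanation and the regex cannot drift apart, at the cost of a slightly longer string.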