
Regex PCRE expression

I have a piece of HTML code like the following:

<td width="24%"><b>Something</b></td>
          <td width="1%"></td>
          <td width="46%" align="center">
           <p><b>
    needed
  value</b></p>
          </td>
          <td width="28%" align="center">
            &nbsp;</td>
        </tr>

What is a good regex pattern to extract the first text node (the text inside, not the tags) after the word Something? That is, I want to extract

     needed
  value

and nothing more.

I can't figure out a working regex pattern in PHP.

EDIT: I am not parsing a whole HTML document, only a few lines of it, so I want to do this with a regex and without an HTML parser.

Ignoring the potential issues with parsing HTML using regex, the following pattern should match your example code:

Something(?:(?:<[^>]+>)|\s)*([\w\s*]+)

This will match Something, followed by any sequence of HTML tags or whitespace, and then capture the very next block of text: word characters (\w) and whitespace. Note that inside the character class the * is a literal asterisk, so asterisks are captured as well.

You can use this with PHP's preg_match() function like this:

if (preg_match('/Something(?:(?:<[^>]+>)|\s)*([\w\s*]+)/', $inputString, $match)) {
    $matchedValue = $match[1];
    // do whatever you need
}
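For reference, here is a self-contained sketch that runs the pattern against the sample from the question. The whitespace clean-up step (collapsing the line break into a single space) and the variable name `$normalized` are assumptions added for illustration; the question does not say whether the line break should be kept.

```php
<?php
// Sample input copied from the question.
$inputString = <<<'HTML'
<td width="24%"><b>Something</b></td>
          <td width="1%"></td>
          <td width="46%" align="center">
           <p><b>
    needed
  value</b></p>
          </td>
          <td width="28%" align="center">
            &nbsp;</td>
        </tr>
HTML;

$pattern = '/Something(?:(?:<[^>]+>)|\s)*([\w\s*]+)/';

$matchedValue = null;
$normalized = null;

if (preg_match($pattern, $inputString, $match)) {
    // Group 1 holds the text node; the line break between the words is preserved.
    $matchedValue = $match[1];

    // Assumption: collapse every run of whitespace into one space and trim the ends.
    $normalized = trim(preg_replace('/\s+/', ' ', $matchedValue));
}

var_dump($normalized);
```

If the original line break and indentation matter, use `$matchedValue` directly and skip the normalization step.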

Regex explained:

Something         # has to start with 'Something'
(?:               # non-capturing group
    (?:           # non-capturing group
        <[^>]+>   # any HTML tags, <...>
    )
    | \s          # OR whitespace
)*                # this group can match 0+ times
(
    [\w\s*]+      # word characters, whitespace, or a literal '*'
)
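The commented layout above can also be kept in the code itself by using PCRE's `x` (extended) modifier, which ignores whitespace outside character classes and treats `#` as the start of a comment. The sketch below is one possible way to write it, not part of the original answer; it uses `~` as the delimiter so that a stray `/` in a comment cannot end the pattern early, and the test string is a shortened, made-up variant of the question's HTML.

```php
<?php
// The same pattern in extended ("x") mode, with comments inside the regex.
$pattern = <<<'REGEX'
~
Something         # literal text 'Something'
(?:               # non-capturing group
    <[^>]+>       #   an HTML tag, <...>
    | \s          #   OR a single whitespace character
)*                # repeated zero or more times
(                 # capture group 1
    [\w\s*]+      #   word characters, whitespace, or a literal '*'
)
~x
REGEX;

// Hypothetical shortened input, for illustration only.
$inputString = "<b>Something</b> <p>needed\n  value</p>";

$matchedValue = null;
if (preg_match($pattern, $inputString, $match)) {
    $matchedValue = $match[1];
}

var_dump($matchedValue);
```

The delimiter character must not appear anywhere in the pattern, including inside the comments, because PHP locates the closing delimiter before the pattern is compiled.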

