Regular expression to match HTML tag with specific contents

I am trying to write a regular expression to capture this string:

<td style="white-space:nowrap;">###.##</td>

I can't even match it if I include the string exactly as it is in the regex pattern! I am using preg_match_all(), but I am not finding the correct pattern. I suspect that "white-space:nowrap;" is throwing off the matching in some way. Any ideas? Thanks ...

Why not try using DOMDocument instead? Then you do not have to worry about whether the HTML is formatted properly. Using the DOM classes will also improve readability and should perform well, since they are part of the PHP core rather than living in user space.
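A minimal sketch of the DOMDocument approach, assuming the ###.## in the question stands for a number with two decimals (the question does not say so explicitly). The sample HTML, the variable names, and the numeric check are all made up for illustration:

```php
<?php
// Hypothetical sample input; the real page markup is not shown in the question.
$html = '<table><tr>'
      . '<td style="white-space:nowrap;">123.45</td>'
      . '<td>other</td>'
      . '<td style="white-space:nowrap;">n/a</td>'
      . '</tr></table>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select only the cells whose style attribute is exactly the one from the question.
$xpath = new DOMXPath($doc);
$values = [];
foreach ($xpath->query('//td[@style="white-space:nowrap;"]') as $cell) {
    $text = trim($cell->textContent);
    // Assumption: keep only cells that look like digits, a dot, and two digits.
    if (preg_match('/^\d+\.\d{2}$/', $text)) {
        $values[] = $text;
    }
}

print_r($values);
```

The XPath expression does the job of the tag-and-attribute part of the regex, so the remaining pattern only has to describe the cell text.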

When I'm having problems with regular expressions, I like to test them in real time with one of the following websites:

Did you see any warnings? You have to escape some bits of that, namely the / in the closing td tag. This seemed to work for me:

php > $string='cow cow cow    <td style="white-space:nowrap;">###.##</td> cat cat cat cat';
php > preg_match_all('/<td style="white-space:nowrap;">###\.##<\/td>/',$string,$result);
php > var_dump($result);
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(43) "<td style="white-space:nowrap;">###.##</td>"
  }
}

Are you aware that the regex argument to any of PHP's preg_ functions has to be double-delimited? For example:

preg_match_all('/foo/', $target, $results)

'...' are the string delimiters, /.../ are the regex delimiters, and the actual regex is foo. The regex delimiters don't have to be slashes, they just have to match; some popular choices are #...#, %...% and ~...~. They can also be balanced pairs of bracketing characters, like {...}, (...), [...], and <...>; those are much less popular, and for good reason.
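As a quick check of the delimiter choices listed above, the following sketch runs the same regex, foo, through each delimiter style. The target string and variable names are invented for the example:

```php
<?php
// Hypothetical target string containing "foo" twice.
$target = 'foo bar foo';

// The same regex (foo) wrapped in each delimiter style mentioned above.
$patterns = ['/foo/', '#foo#', '%foo%', '~foo~', '{foo}', '(foo)', '[foo]', '<foo>'];

$counts = [];
foreach ($patterns as $pattern) {
    // preg_match_all() returns the number of full-pattern matches.
    $counts[$pattern] = preg_match_all($pattern, $target, $m);
}

print_r($counts);
```

Every entry should report two matches. Note that in the bracketed forms the brackets are consumed as delimiters, so '(foo)' is not a capture group and '[foo]' is not a character class.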

If you leave out the regex delimiters, the regex-compilation phase will probably fail and the error message will probably make no sense. For example, this code:

preg_match_all('<td style="white-space:nowrap;">###.##</td>', $s, $m)

...would generate this message:

 Unknown modifier '#'
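To see that failure without it being lost in page output, one option is to capture the warning with a temporary error handler. This is only a sketch; the handler and the variable names are not from the original answer:

```php
<?php
$s = 'cow <td style="white-space:nowrap;">###.##</td> cat';

// Collect warning text instead of letting PHP print it.
$messages = [];
set_error_handler(function (int $errno, string $errstr) use (&$messages): bool {
    $messages[] = $errstr;
    return true; // mark the warning as handled
});

// No real regex delimiters: the leading < and the first > are taken as the delimiter pair.
$result = preg_match_all('<td style="white-space:nowrap;">###.##</td>', $s, $m);

restore_error_handler();

var_dump($result);   // false signals that the pattern failed to compile
print_r($messages);  // the captured warning text
```

Checking the return value against false with === matters here, because a valid pattern that finds nothing returns 0, not false.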

PHP tries to use the first pair of angle brackets as the regex delimiters, and whatever follows the > as the regex modifiers (e.g., i for case-insensitive, m for multiline). To fix that, you would add real regex delimiters, like so:

preg_match_all('%<td style="white-space:nowrap;">###\.##</td>%i', $s, $m)

The choice of delimiter is a matter of personal preference and convenience. If I had used # or /, I would have had to escape those characters in the actual regex. I escaped the . because it's a regex metacharacter. Finally, I added the i modifier to demonstrate the use of modifiers and because HTML isn't case sensitive.
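The effect of the escaped . and the i modifier can be checked with a few sample inputs. The samples and labels below are made up for illustration:

```php
<?php
// The corrected pattern: % delimiters, escaped dot, case-insensitive.
$pattern = '%<td style="white-space:nowrap;">###\.##</td>%i';

// Hypothetical inputs to exercise the pattern.
$samples = [
    'exact'         => '<td style="white-space:nowrap;">###.##</td>',
    'uppercase tag' => '<TD STYLE="white-space:nowrap;">###.##</TD>',
    'no dot'        => '<td style="white-space:nowrap;">###x##</td>',
];

$hits = [];
foreach ($samples as $label => $sample) {
    $hits[$label] = preg_match_all($pattern, $sample, $m);
}

print_r($hits);
```

The first two samples should each match once, the second only because of the i modifier. The third should not match, because the escaped dot accepts only a literal period.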
