Trouble with regexp in PHP

Funny that my last question was on the same topic, but alas:

I'm running the following code:

preg_match('/<th.*>.*Organizer.*title=\".*\">(.*)<\/a>/mi', $file_string, $organizer);

On the following content:

<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>

And I can't for the life of me figure out why it's not working. I can get it to match Organizer: with the regexp '/.*Organizer', but it seems that as soon as there's a newline it stops working, despite the /m option. Any ideas?

Okay, so the issue is the newline character. However, this regex will get the text of the a element:

<th.*|\n>.*|\nOrganizer.*|\n*title=".*">(.*)<\/a>

Note the *|\n parts of the expression.

Here is a Regex101 demo to prove it.
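For comparison, here is a minimal sketch of the explicit-newline idea in PHP. One caveat: an ungrouped | splits the whole pattern into separate alternatives, so this sketch wraps each "any character or newline" unit in a non-capturing group (?:.|\n). That grouping is my adjustment, not the exact pattern above; $file_string holds the question's sample content.

```php
<?php
// Sample content from the question.
$file_string = <<<HTML
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
HTML;

// (?:.|\n) means "any non-newline character OR a newline", applied per character,
// so the greedy repetitions can cross line breaks without the s modifier.
$pattern = '/<th(?:.|\n)*>(?:.|\n)*Organizer(?:.|\n)*title="(?:.|\n)*">(.*)<\/a>/i';

if (preg_match($pattern, $file_string, $organizer)) {
    echo $organizer[1], "\n"; // TaKe
}
```

In practice this behaves like the s-modifier version, but it is harder to read.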


As Niet stated, you could just use the s modifier. The regex would then be:

<th.*>.*Organizer.*title=".*">(.*)<\/a>

but you would pass the additional s modifier. Here is a Regex101 demo to prove it.
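A minimal sketch of the s-modifier version run against the question's sample content. It assumes the markup is stored in $file_string as in the question; the i flag is kept from the original, and m is dropped because the pattern has no ^ or $ anchors.

```php
<?php
// Sample content from the question.
$file_string = <<<HTML
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
HTML;

// Same pattern as the question, with s so that ".*" can span the line breaks.
if (preg_match('/<th.*>.*Organizer.*title=".*">(.*)<\/a>/si', $file_string, $organizer)) {
    echo $organizer[1], "\n"; // TaKe
}
```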

By default, the dot metacharacter does not match newlines. If you also want . to match newlines, you need the s modifier. The m modifier only changes how ^ and $ match at line boundaries, which is why it did not help here.

From the PHP manual:

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded.
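A quick way to see the difference between m and s, using a made-up two-line string (the $text value is illustrative only):

```php
<?php
$text = "Organizer:\nTaKe";

// Without s, "." cannot cross the newline, so this fails.
var_dump(preg_match('/Organizer.*TaKe/', $text));   // int(0)
// m does not change that: it only lets ^ and $ match at line boundaries.
var_dump(preg_match('/Organizer.*TaKe/m', $text));  // int(0)
// s lets "." match the newline, so the pattern now matches.
var_dump(preg_match('/Organizer.*TaKe/s', $text));  // int(1)

// What m actually affects: ^ anchoring at the start of the second line.
var_dump(preg_match('/^TaKe/', $text));   // int(0)
var_dump(preg_match('/^TaKe/m', $text));  // int(1)
```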

However, it's generally a bad idea to use regex to parse HTML. I suggest you use a DOM parser instead.
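A minimal sketch of the DOM approach using PHP's built-in DOMDocument and DOMXPath. Assumptions: the row is wrapped in a <table> for parsing, and the XPath expression (the first <td> after the <th> that contains "Organizer", then the <a> inside it) is written for this particular markup; real pages may need a different query.

```php
<?php
// The question's row, wrapped in a <table> so the parser sees a complete table.
$html = <<<HTML
<table>
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// <th> whose text contains "Organizer", then the <a> in the next <td> sibling.
$nodes = $xpath->query('//th[contains(., "Organizer")]/following-sibling::td[1]//a');
$organizer = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $organizer, "\n"; // TaKe
```

Unlike the regex, this does not depend on where the line breaks fall inside the markup.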

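Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.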

© 2020-2024 STACKOOM.COM