Trouble with regexp in PHP
Funny that my last question was on the same topic, but alas:
I'm running the following code:
preg_match('/<th.*>.*Organizer.*title=\".*\">(.*)<\/a>/mi', $file_string, $organizer);
On the following content:
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
And I can't for the life of me figure out why it's not working. I can get it to match Organizer: with the regexp '/.*Organizer', but as soon as there's a new line it stops working, despite the /m option. Any ideas?
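Okay, so the issue is the newline character. This regex will get the text of the a element:
<th(?:.|\n)*>(?:.|\n)*Organizer(?:.|\n)*title="(?:.|\n)*">(.*)<\/a>
Take note of the expression (?:.|\n)*, which matches any character including a newline. The alternation has to be grouped, because a bare | has the lowest precedence in a regex and would otherwise split the whole pattern into separate alternatives.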
Here is a Regex 101 to prove it.
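To make the grouping point concrete, here is a minimal sketch that runs an ungrouped and a grouped version of the alternation through preg_match. The variable names ($ungrouped, $grouped, $m1, $m2) are made up for illustration, and the sample markup is the snippet from the question.

```php
<?php
// Sample markup from the question (nowdoc, so nothing is interpolated).
$file_string = <<<'HTML'
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
HTML;

// Ungrouped: each | splits the WHOLE pattern, so this is really four
// independent alternatives, and the first one ("<th.*") wins.
$ungrouped = '/<th.*|\n>.*|\nOrganizer.*|\n*title=".*">(.*)<\/a>/mi';
preg_match($ungrouped, $file_string, $m1);
var_dump(isset($m1[1])); // bool(false): the capture group never took part

// Grouped: (?:.|\n)* means "any character, newline included".
$grouped = '/<th(?:.|\n)*>(?:.|\n)*Organizer(?:.|\n)*title="(?:.|\n)*">(.*)<\/a>/i';
preg_match($grouped, $file_string, $m2);
echo $m2[1], "\n"; // TaKe
```

The ungrouped pattern still reports a match, which is why it can look correct in a tester, but the matched text is only the th line and the capture group stays empty.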
As Niet stated, you could just use the s modifier. The regex would then be:
<th.*>.*Organizer.*title=".*">(.*)<\/a>
but you would pass an additional modifier, s. Here is a Regex 101 to prove it.
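As a rough sketch of what that looks like in the original preg_match call, the only change is swapping the m flag for s. The $pattern variable is introduced here for readability, and the sample markup is the snippet from the question.

```php
<?php
// Sample markup from the question.
$file_string = <<<'HTML'
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
HTML;

// "s" (PCRE_DOTALL) lets "." match newlines; "i" keeps the match case-insensitive.
$pattern = '/<th.*>.*Organizer.*title=".*">(.*)<\/a>/is';

if (preg_match($pattern, $file_string, $organizer) === 1) {
    echo $organizer[1], "\n"; // TaKe
} else {
    echo "no match\n";
}
```

Because every .* is greedy, a page with several links after the Organizer cell would capture the last one rather than the first. Lazy quantifiers (.*?) are one way to tighten that if it matters.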
The dot metacharacter, by default, does not match newlines. If you also want . to match newlines, you need the s modifier.
From the PHP manual:
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded.
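The difference between the m and s modifiers is easy to confuse, so here is a small illustrative check on a two-line string. The variable names are invented for this sketch. The m modifier only changes where ^ and $ may match, while s changes what the dot may match.

```php
<?php
// Two lines separated by a single newline.
$text = "Organizer:\n</th>";

// No modifier: "." refuses to match the newline.
$plain = preg_match('/Organizer:.<\/th>/', $text);      // 0

// "m" does not help: it only affects the ^ and $ anchors.
$multiline = preg_match('/Organizer:.<\/th>/m', $text); // 0

// "s" makes "." match the newline as well.
$dotall = preg_match('/Organizer:.<\/th>/s', $text);    // 1

// What "m" is actually for: ^ and $ matching at line boundaries.
$anchorsOff = preg_match('/^<\/th>$/', $text);          // 0
$anchorsOn  = preg_match('/^<\/th>$/m', $text);         // 1

var_dump($plain, $multiline, $dotall, $anchorsOff, $anchorsOn);
```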
However, it's generally a bad idea to use regex to parse HTML. I suggest you use a DOM Parser instead.
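For comparison, a DOM-based version might look roughly like the sketch below, using PHP's bundled DOMDocument and DOMXPath classes. The XPath expression and the wrapping table element are assumptions made for this example. The real page structure may need a different query.

```php
<?php
// The question's snippet, wrapped in <table> (an assumption for this
// sketch) so the row sits in a valid parent element.
$file_string = <<<'HTML'
<table>
<tr>
<th valign="top"> Organizer:
</th>
<td style="width:55%;"> <a href="/starcraft2/TaKe" title="TaKe">TaKe</a>
</td></tr>
</table>
HTML;

// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($file_string);
libxml_clear_errors();

// Find the header cell mentioning "Organizer", then the first link
// inside the data cell that follows it.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(
    '//th[contains(normalize-space(.), "Organizer")]/following-sibling::td[1]//a'
);

$organizer = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $organizer, "\n"; // TaKe
```

Whitespace and line breaks inside the markup stop being a concern here, since the query works on the parsed tree rather than the raw text.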