PHP regex html data attributes with fixed markup
I have the following fixed-pattern markup scenarios:
<div class="myclass" id="id123" data-foo="bar">content</div>
<div class="myclass" id="id123" data-foo="bar" >content</div>
<div class="myclass" id="id123" data-foo="bar" data-baz="qux">content</div>
<div class="myclass" id="id123" data-foo="bar" data-baz="qux" >content</div>
I'm trying to parse out the following values:
id123
bar
qux (if it ever exists)
I was able to figure out how to match each scenario individually, but I'm having trouble coming up with one final rule that works for all of them.
/<div class="myclass" id="(.*)" data-foo="(.*)"(data-baz="(.*)")?>/
I seem to be missing some basic regex principle. I tried boundaries, end anchors, and optional whitespace, but no luck.
$text = <<<TEXT
<div class="myclass" id="id123" data-foo="bar">content</div>
<div class="myclass" id="id123" data-foo="bar" >content</div>
<div class="myclass" id="id123" data-foo="bar" data-baz="qux">content</div>
<div class="myclass" id="id123" data-foo="bar" data-baz="qux" >content</div>
TEXT;
preg_match_all('~<div class="myclass" id="(.*?)" data-foo="(.*?)" ?(?:data-baz="(.*?)" ?)?>~', $text, $matches);
var_export(array_slice($matches, 1));
Output:
array (
0 =>
array (
0 => 'id123',
1 => 'id123',
2 => 'id123',
3 => 'id123',
),
1 =>
array (
0 => 'bar',
1 => 'bar',
2 => 'bar',
3 => 'bar',
),
2 =>
array (
0 => '',
1 => '',
2 => 'qux',
3 => 'qux',
),
)
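If one record per element is easier to work with than one array per capture group, the same pattern can be run with PREG_SET_ORDER. The following is only a sketch under that assumption: the variable names $rows and $parsed and the array keys id, foo, and baz are made up for illustration, and a missing data-baz is normalized to null.

```php
<?php
$text = <<<TEXT
<div class="myclass" id="id123" data-foo="bar">content</div>
<div class="myclass" id="id123" data-foo="bar" data-baz="qux" >content</div>
TEXT;

// Same pattern as above: optional space before ">" and an optional data-baz attribute.
$pattern = '~<div class="myclass" id="(.*?)" data-foo="(.*?)" ?(?:data-baz="(.*?)" ?)?>~';

// PREG_SET_ORDER groups the captures by match instead of by capture group.
preg_match_all($pattern, $text, $rows, PREG_SET_ORDER);

$parsed = [];
foreach ($rows as $row) {
    $parsed[] = [
        'id'  => $row[1],
        'foo' => $row[2],
        // The third group may be absent or empty when data-baz is not present.
        'baz' => (isset($row[3]) && $row[3] !== '') ? $row[3] : null,
    ];
}

var_export($parsed);
```

Each entry in $parsed then describes a single div, which avoids having to line up indexes across three parallel arrays.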
You can improve the regex efficiency by not using lazy quantifiers. If you know that the attribute values will not contain double quotes, then you can replace each .*? with this negated character class and a greedy quantifier:
[^"]*
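As a rough sketch of that suggestion, here is the pattern with every lazy group swapped for the negated character class. The helper name extractAttributes and its return shape are hypothetical, chosen only for this example, and it assumes attribute values never contain a double quote.

```php
<?php
// Hypothetical helper: returns the three values, or null if the markup does not match.
function extractAttributes(string $html): ?array
{
    // [^"]* consumes everything up to the closing quote without backtracking through the rest of the tag.
    $pattern = '~<div class="myclass" id="([^"]*)" data-foo="([^"]*)" ?(?:data-baz="([^"]*)" ?)?>~';

    if (!preg_match($pattern, $html, $m)) {
        return null;
    }

    return [
        'id'  => $m[1],
        'foo' => $m[2],
        // Group 3 is missing from $m when data-baz is not present.
        'baz' => $m[3] ?? null,
    ];
}

$withBaz    = extractAttributes('<div class="myclass" id="id123" data-foo="bar" data-baz="qux" >content</div>');
$withoutBaz = extractAttributes('<div class="myclass" id="id123" data-foo="bar">content</div>');
```

Because the character class cannot run past a quote, each group stops at the first closing quote on its own, which is what the lazy quantifier was being used for.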