PHP regex using preg_match_all

<OPTION value=a.a.>Afaceri</OPTION>
<OPTION value=a.b.>Mass Media</OPTION>
<OPTION value=a.c.>Publicitate</OPTION>
<OPTION value=b.a.>Agricultura</OPTION>

I want to extract "Afaceri,Mass Media,Publicitate,Agricultura" from this HTML code with a PHP regex. How can I do that?

HTML and regexes can be a bit slippery. An alternative solution, assuming your HTML fragment is formatted as above with a newline after each option, is to use strip_tags():

<?php
// your html fragment
$html = "<OPTION value=a.a.>Afaceri</OPTION>
<OPTION value=a.b.>Mass Media</OPTION>
<OPTION value=a.c.>Publicitate</OPTION>
<OPTION value=b.a.>Agricultura</OPTION>";

// split on any newline sequence (exploding on PHP_EOL can miss "\n" input on Windows)
$opts = preg_split('/\R/', $html);

// use strip_tags on each element
$names = array_map(function($opt) {
    return strip_tags($opt);
}, $opts);

// done
var_dump($names);

Should yield:

array (size=4)
  0 => string 'Afaceri' (length=7)
  1 => string 'Mass Media' (length=10)
  2 => string 'Publicitate' (length=11)
  3 => string 'Agricultura' (length=11)

Hope this helps.

Here is a regular expression that places no conditions on the string between the tags.

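// preg_match_all() returns the number of matches; the captures go into the third argument
preg_match_all('/<OPTION.*?>(.*?)<\/OPTION>/i', $string, $matches);
$names = $matches[1]; // text captured between the tags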
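
To make the role of the match array concrete, here is a minimal self-contained sketch that runs this pattern against the fragment from the question. The variable names `$string`, `$matches`, `$count`, and `$names` are chosen for illustration only; joining with implode() is one way to get the comma-separated form the question asks for.

```php
<?php
// Sample input copied from the question.
$string = <<<HTML
<OPTION value=a.a.>Afaceri</OPTION>
<OPTION value=a.b.>Mass Media</OPTION>
<OPTION value=a.c.>Publicitate</OPTION>
<OPTION value=b.a.>Agricultura</OPTION>
HTML;

// The return value is the number of full matches.
// $matches is filled by reference: index 0 holds the full matches,
// index 1 holds the first capture group (the text between the tags).
$count = preg_match_all('/<OPTION.*?>(.*?)<\/OPTION>/i', $string, $matches);

$names = $matches[1];

// Join the captured names into a single comma-separated string.
echo implode(',', $names), PHP_EOL;
```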

Well, we often (almost always) suggest using a DOM parser and link to the manual, but I have not seen many examples.

While a regex is capable of parsing HTML, it is not the right tool. You need a DOM parser to avoid headaches with malformed HTML. PHP provides a nice API to work with.

For example, you could do something like this with DOMDocument to get the desired output:

<?php
$html = <<<HTML
<OPTION value=a.a.>Afaceri</OPTION>
<OPTION value=a.b.>Mass Media</OPTION>
<OPTION value=a.c.>Publicitate</OPTION>
<OPTION value=b.a.>Agricultura</OPTION>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$nodes = $dom->getElementsByTagName('option');

$result = array();
foreach ($nodes as $node) {
    $result[] = $node->nodeValue;
}

var_dump($result);

Demo
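
As a variation on the DOMDocument approach, the sketch below also reads each option's value attribute and silences parser warnings for sloppy markup. It rests on assumptions that are not in the original answer: the fragment is wrapped in a <select> element, the tag case is mixed on purpose, and the names `$options` and `$previous` are illustrative.

```php
<?php
// Assumed input: the options wrapped in <select>, with mixed-case tags
// and unquoted attribute values.
$html = <<<HTML
<select>
<OPTION value=a.a.>Afaceri</OPTION>
<option value=a.b.>Mass Media</option>
</select>
HTML;

// Collect libxml parse warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Map each option's value attribute to its visible text.
$options = [];
foreach ($dom->getElementsByTagName('option') as $node) {
    $options[$node->getAttribute('value')] = trim($node->textContent);
}

echo implode(',', $options), PHP_EOL;
```

Keying the array by the value attribute keeps the code and the label together, which a regex with a single capture group would not give you without extra work.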

You can try something like this: /<OPTION [^>]+>([^<]+)<\/OPTION>/

That means: match the text "<OPTION " followed by one or more characters that are not ">", then match that ">" and capture one or more characters that are not "<", followed by "</OPTION>".

By the way, if you want to avoid escaping and make the regex cleaner, you could use a different delimiter, like this: #<OPTION [^>]+>([^<]+)</OPTION>#
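
A short sketch of the two delimiter styles side by side, assuming single-quoted PHP strings and the fragment from the question. The variable names `$html`, `$withSlash`, and `$withHash` are illustrative only. Both patterns are case-sensitive as written, so they match the uppercase tags in the sample but would need the i modifier for lowercase markup.

```php
<?php
// Sample input copied from the question.
$html = <<<HTML
<OPTION value=a.a.>Afaceri</OPTION>
<OPTION value=a.b.>Mass Media</OPTION>
<OPTION value=a.c.>Publicitate</OPTION>
<OPTION value=b.a.>Agricultura</OPTION>
HTML;

// "/" as delimiter: the slash in </OPTION> has to be escaped.
preg_match_all('/<OPTION [^>]+>([^<]+)<\/OPTION>/', $html, $withSlash);

// "#" as delimiter: no escaping needed inside the pattern.
preg_match_all('#<OPTION [^>]+>([^<]+)</OPTION>#', $html, $withHash);

// Both patterns capture the same option names in group 1.
var_dump($withSlash[1] === $withHash[1]);
echo implode(',', $withHash[1]), PHP_EOL;
```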
