
Regex to match 2 HTML tags in 1 HTML file

I have an HTML file which contains the following:

<img src="MATCH1" bla="blabla">
<something:else bla="blabla" bla="bla"><something:else2 something="something">
<something image="MATCH2" bla="abc">

Now I need a regex that matches both MATCH1 and MATCH2.

The HTML contains multiple blocks like this, so the pattern can occur once, twice, or any number of times.

When I try:

<img\s*src="(.*?)".*?<something\s*image="(.*?)"

It doesn't match anything. What am I missing here?

Thanks in advance!

Regex does not always give reliable results when parsing HTML.

I think you should use an HTML DOM parser instead, such as PHP Simple HTML DOM Parser.

For example:

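// Load the PHP Simple HTML DOM Parser library (provides file_get_html)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// OR create a DOM object from an HTML file
$html = file_get_html('test.htm');

// Find all images and print their src attribute
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links and print their href attribute
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';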

There are filters to get tags with specific attributes:

[attribute] Matches elements that have the specified attribute.

[attribute=value] Matches elements that have the specified attribute with a certain value.

[attribute!=value] Matches elements that don't have the specified attribute with a certain value.

[attribute^=value] Matches elements that have the specified attribute and it starts with a certain value.

[attribute$=value] Matches elements that have the specified attribute and it ends with a certain value.

[attribute*=value] Matches elements that have the specified attribute and it contains a certain value.
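
The same kind of attribute filter can be approximated with XPath and PHP's bundled DOM extension, which avoids the third-party library. The following is only a minimal sketch under a few assumptions: the helper name `extractAttributes` and the `$sample` string are made up for illustration, and the markup is a shortened version of the one in the question.

```php
<?php
// Hypothetical helper: collects every <img src="..."> and every
// <something image="..."> value using the bundled DOM extension.
function extractAttributes(string $html): array
{
    $doc = new DOMDocument();
    // Non-standard tags such as <something> make libxml emit warnings;
    // collect them internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = ['src' => [], 'image' => []];

    // //img[@src] plays the role of the img[src] filter above.
    foreach ($xpath->query('//img[@src]') as $img) {
        $result['src'][] = $img->getAttribute('src');
    }
    // //something[@image] plays the role of something[image].
    foreach ($xpath->query('//something[@image]') as $node) {
        $result['image'][] = $node->getAttribute('image');
    }
    return $result;
}

// Illustrative input, not taken from a real page.
$sample = '<img src="MATCH1" bla="blabla">' . "\n"
        . '<something image="MATCH2" bla="abc">';
$found = extractAttributes($sample);
```

In XPath 1.0, `[attribute^=value]` and `[attribute*=value]` roughly correspond to `starts-with(@attribute, 'value')` and `contains(@attribute, 'value')`; there is no built-in `ends-with()`.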

More Options


Other tools for parsing HTML are described in this answer.

Hmm, I had better elaborate before more anti-regex posters come around. In your case a regular expression is actually applicable. However, I'd like to point out that you should carefully weigh the pros and cons.

It's mostly simpler to use phpQuery or QueryPath for such tasks:

qp($html)->find("img")->attr("src");

But a regex is possible too, if you don't overlook the gritty details:

preg_match('#<img[^>]+src="([^">]*)".+?<something\s[^>]*image="([^">]*)"#ims', $html, $m);

If extraction depends on the presence of both tags, then the regex might be the better option here.
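
Two details are worth spelling out. In the question's pattern, `.` does not match newlines unless the `s` modifier is set, so nothing matches when the two tags sit on separate lines. Also, `preg_match` returns only the first occurrence, while the question mentions several blocks. A minimal sketch with `preg_match_all`, assuming the sample markup below (the `$html` content and the `$pairs` variable are made up for illustration):

```php
<?php
// Illustrative input with two img/something blocks on separate lines.
$html = <<<'HTML'
<img src="a.png" bla="blabla">
<something:else bla="blabla"><something:else2 something="something">
<something image="b.png" bla="abc">
<img src="c.png">
<something image="d.png">
HTML;

// i: case-insensitive; s: let "." match newlines so one match can span lines.
// The m modifier is left out because the pattern uses no ^ or $ anchors.
$pattern = '#<img[^>]+src="([^">]*)".+?<something\s[^>]*image="([^">]*)"#is';

$pairs = [];
// PREG_SET_ORDER groups the captures per match: $m[1] is src, $m[2] is image.
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $pairs[] = [$m[1], $m[2]];
    }
}
```

Because `\s` is required right after `<something`, tags such as `<something:else>` are skipped. Note that an `<img>` without a following `<something image="...">` would be paired with a later block, which is one of the details a DOM-based approach avoids.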
