
PHP Regex match all HTML tags

I am reading the contents of an HTML page to extract some details. I'm searching for every occurrence of a string that appears within a tag, and I want to read only that string.

Example:

<a href="http://www.example.com/search?la=en&q=javascript">javascript</a>
<a href="http://www.example.com/search?la=en&q=PHP">PHP</a>

I just want to read the text of every <a> tag whose href attribute contains this prefix ( http://www.example.com/search?la=en&q= ).

Any idea?

SimpleHtmlDom example (isn't it pretty?):

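// Load the Simple HTML DOM library (the path is an assumption; adjust it to your setup)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
    echo $element->plaintext; // the link text: this is what you want
}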

If the HTML page you're reading is very regular (for instance, machine-generated according to predictable patterns), something like this would work:

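// preg_match_all collects every occurrence; $matches[1] holds the captured link texts
preg_match_all('|<a\s+href="http://www\.example\.com/search\?la=en&q=(\w+)"\s*>\1</a>|', $page, $matches);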

But if it gets any more complicated than that, regular expressions probably won't be enough for the job - you'd be better off using a full HTML parser to extract the links and check them one-by-one to find the text you want.
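As a rough illustration of the parser-based approach, here is a minimal sketch using PHP's built-in DOMDocument instead of a regular expression. The function name linkTextsByHrefPrefix and the sample $page markup are hypothetical and exist only for this example. The sketch assumes the prefix should match the start of the href, and that entities such as &amp; in the source are decoded by the parser before the comparison.

```php
<?php
// Hypothetical helper: returns the text of every <a> whose href starts with $prefix.
function linkTextsByHrefPrefix(string $html, string $prefix): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // getAttribute returns the decoded value, so "&amp;" in the source becomes "&".
        $href = $a->getAttribute('href');
        if (strncmp($href, $prefix, strlen($prefix)) === 0) {
            $texts[] = trim($a->textContent);
        }
    }
    return $texts;
}

// Sample markup, made up for illustration.
$page = '<p><a href="http://www.example.com/search?la=en&amp;q=javascript">javascript</a> '
      . '<a href="http://www.example.com/search?la=en&amp;q=PHP">PHP</a> '
      . '<a href="http://www.example.com/about">About</a></p>';

$texts = linkTextsByHrefPrefix($page, 'http://www.example.com/search?la=en&q=');
print_r($texts);
```

Unlike the regular expression, this does not depend on attribute order, quoting style, or whitespace inside the tag, and it does not require the link text to equal the q parameter.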
