I'm using PHP Simple HTML DOM Parser to extract a list of URLs from a page as follows:
<?php
include('simple_html_dom.php');
$url = 'http://www.domain.com/';
$html = file_get_html($url);
foreach ($html->find('table[width=370]') as $table) {
    foreach ($table->find('a') as $item) {
        echo $item->outertext . '<br><hr>';
    }
}
$html->clear();
?>
It works fine in that it extracts the required information. However, some of the a tags (on domain.com) are formatted like this:
<a href="http://www.domain.com"><font size="2">Anchor text</font></a>
In others, the font size is set on the p tag that contains the a tag, so the a tag appears as:
<a href="http://www.domain.com">Anchor text</a>
Is there any way to strip out the font tag from those a tags that have it? It's probably very simple, but I've been 'running around in rings' for ages trying to do it :(
Thanks for any ideas or suggestions you might have.
Tom.
strip_tags() maybe?
If you only want to keep the a tag, pass it as the allowed tag, written with its angle brackets:
echo strip_tags($item->outertext, '<a>');
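To see the effect without fetching a page, here is a minimal sketch that runs strip_tags() on the two anchor formats from the question. The helper name keep_anchor_only and the $samples array are made up for illustration; the only library call is PHP's built-in strip_tags(), whose second argument lists the tags to keep, written with angle brackets.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): keep the <a> element
// and its attributes, and drop <font> and any other tags inside it.
function keep_anchor_only(string $html): string
{
    // Allowed tags are given with angle brackets, e.g. '<a>'.
    return strip_tags($html, '<a>');
}

// The two anchor formats described in the question.
$samples = [
    '<a href="http://www.domain.com"><font size="2">Anchor text</font></a>',
    '<a href="http://www.domain.com">Anchor text</a>',
];

foreach ($samples as $sample) {
    // Both samples print: <a href="http://www.domain.com">Anchor text</a>
    echo keep_anchor_only($sample), "\n";
}
```

In the original loop, the same call would wrap $item->outertext, so anchors that already lack a font tag pass through unchanged.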