
Scraping with simple_html_dom

I am trying to scrape this:

<a id="pa1">Site1</a>
<font size="-1">Text1</font><br />
<font size="-1" color="green">Text2</font><br />

I can get to pa1 easily, but I want to reach the two font elements that come after it. So I tried this:

$html = new simple_html_dom();
$html->load($document);

foreach ($html->find('#pa1>font') as $e) {
    $this->check_line_two = $this->process_array_elements($e->innertext);
}

foreach ($html->find('#pa1>font>font') as $e) {
    $this->check_line_three = $this->process_array_elements($e->innertext);
}

Neither worked. How can I get the next element with simple_html_dom?

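There is no descendant font tag within #pa1. The font elements come after the closing </a>, so they are siblings of the anchor rather than children, and the child combinator > matches nothing.

What you are probably looking for is the adjacent sibling selector +, as in #pa1 + font. But I don't know whether it is supported by the library you are using.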

Please read their documentation: http://simplehtmldom.sourceforge.net/manual.htm

Like feeela said, those font elements are not descendants of the anchor. Try something like this:

foreach ($html->find('#pa1') as $e) {
    $firstFontElement = $e->next_sibling();
}
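To reach both font elements rather than only the first, the same next_sibling() call can be repeated until enough font tags have been seen. The following is a minimal sketch, not the library's own API: following_font_texts() is a hypothetical helper, and it assumes that simple_html_dom.php has already been loaded and that tag names are lowercased (the library's default). The inline markup is the sample from the question.

```php
<?php
// Assumes simple_html_dom.php has been loaded elsewhere, e.g.
// require_once 'simple_html_dom.php';

/**
 * Hypothetical helper (not part of simple_html_dom): walk the element
 * siblings that follow $node and return the innertext of each <font>,
 * stopping after $limit matches.
 */
function following_font_texts($node, $limit = 2)
{
    $texts = array();
    // next_sibling() returns null once there are no more siblings.
    for ($s = $node->next_sibling();
         $s !== null && count($texts) < $limit;
         $s = $s->next_sibling()) {
        if ($s->tag === 'font') {   // skip the <br /> elements in between
            $texts[] = $s->innertext;
        }
    }
    return $texts;
}

// Usage sketch; runs only when the library's str_get_html() is available.
if (function_exists('str_get_html')) {
    $html = str_get_html('<a id="pa1">Site1</a>'
        . '<font size="-1">Text1</font><br />'
        . '<font size="-1" color="green">Text2</font><br />');
    $anchor = $html->find('#pa1', 0);   // index 0 returns the first match, or null
    if ($anchor !== null) {
        // Pad to two entries so list() is safe when fewer fonts are found.
        list($lineTwo, $lineThree) = array_pad(following_font_texts($anchor), 2, null);
    }
}
```

Walking siblings from the anchor keeps the match tied to #pa1, so font tags elsewhere in the document are not picked up by accident.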

If that is all you are trying to scrape, why don't you just select the font tags directly?

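$fonts = $html->find('font'); // every font element in the document, in order
if (count($fonts) >= 2) {
    $this->check_line_two   = $this->process_array_elements($fonts[0]->innertext);
    $this->check_line_three = $this->process_array_elements($fonts[1]->innertext);
}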

Or is there a possibility that more font tags are present in the document?

