Scraping with simple_html_dom

Question

I am trying to scrape this:

<a id="pa1">Site1</a>
<font size="-1">Text1</font><br />
<font size="-1" color="green">Text2</font><br />

I can get to pa1 easily, but I want to get to the two font elements that come after it, so I used this:

$html = new simple_html_dom();
$html->load($document);

foreach ($html->find('#pa1>font') as $e) {
    $this->check_line_two = $this->process_array_elements($e->innertext);
}

foreach ($html->find('#pa1>font>font') as $e) {
    $this->check_line_three = $this->process_array_elements($e->innertext);
}

Neither worked. How can I get the next element with simple_html_dom?

Answer 1

There is no descendant font tag within #pa1. The > combinator only matches children, and the font tags are siblings of the anchor, not children.

What you are looking for is the adjacent sibling selector +, as in #pa1 + font. However, I don't know whether the library you are using supports it.

Please read their documentation: http://simplehtmldom.sourceforge.net/manual.htm
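If the + selector turns out not to be supported, here is a minimal sketch of the same sibling lookup using PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of simple_html_dom. This swaps in a different library: XPath's following-sibling axis stands in for the CSS sibling combinators. The helper name fontsAfterAnchor is hypothetical.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not simple_html_dom.
// fontsAfterAnchor() is a hypothetical helper name.

function fontsAfterAnchor(string $html, string $anchorId): array
{
    $doc = new DOMDocument();

    // loadHTML() warns about fragmentary or sloppy markup; collect those warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // following-sibling::font selects every <font> that shares the anchor's parent
    // and comes after it, roughly what the CSS "~" combinator would match.
    $query = sprintf("//a[@id='%s']/following-sibling::font", $anchorId);

    $texts = [];
    foreach ($xpath->query($query) as $font) {
        $texts[] = $font->textContent;
    }
    return $texts;
}

$document = '<a id="pa1">Site1</a>'
    . '<font size="-1">Text1</font><br />'
    . '<font size="-1" color="green">Text2</font><br />';

print_r(fontsAfterAnchor($document, 'pa1'));
```

If the page repeats this pattern for other anchors (pa2, pa3, ...), following-sibling::font also picks up fonts belonging to the later anchors. following-sibling::font[1] and following-sibling::font[2] restrict it to the nearest two, and following-sibling::*[1][self::font] is the closest XPath analogue of #pa1 + font.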

Answer 2

Like feeela said, those font elements are not descendants of the anchor. Try something like this:

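foreach ($html->find('#pa1') as $e) {
    $firstFontElement = $e->next_sibling(); // <font size="-1">Text1</font>
    // The <br /> is an element too, so step past it to reach the second font.
    $secondFontElement = $firstFontElement->next_sibling()->next_sibling();
}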
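The same walk-the-siblings idea, sketched with PHP's built-in DOMDocument so it runs without simple_html_dom. Its nextSibling property is the plain-DOM counterpart of next_sibling(), except that it also returns text nodes, so the loop skips those. The helper name collectFontSiblings and the rule "stop at the first element that is neither font nor br" are assumptions for illustration; the rule keeps the fonts of one anchor from bleeding into the next anchor's group.

```php
<?php
// Sketch only: built-in DOM extension, not simple_html_dom.
// collectFontSiblings() is a hypothetical helper name.

function collectFontSiblings(string $html, string $anchorId): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence fragment warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Find the <a> whose id attribute matches.
    $anchor = null;
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->getAttribute('id') === $anchorId) {
            $anchor = $a;
            break;
        }
    }
    if ($anchor === null) {
        return [];
    }

    $texts = [];
    // Walk forward through the anchor's following siblings.
    for ($node = $anchor->nextSibling; $node !== null; $node = $node->nextSibling) {
        if (!($node instanceof DOMElement)) {
            continue; // skip text (e.g. whitespace) and comment nodes
        }
        $tag = strtolower($node->nodeName);
        if ($tag === 'br') {
            continue; // line breaks sit between the fonts; step over them
        }
        if ($tag !== 'font') {
            break; // any other element, such as the next <a>, ends this group
        }
        $texts[] = $node->textContent;
    }
    return $texts;
}

$document = '<a id="pa1">Site1</a>'
    . '<font size="-1">Text1</font><br />'
    . '<font size="-1" color="green">Text2</font><br />'
    . '<a id="pa2">Site2</a>'
    . '<font size="-1">Text3</font><br />';

print_r(collectFontSiblings($document, 'pa1'));
```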

Answer 3

If that is all you are trying to scrape, why not just select the font tags?

$fonts = $html->find('font');
$this->check_line_two   = $this->process_array_elements($fonts[0]->innertext);
$this->check_line_three = $this->process_array_elements($fonts[1]->innertext);

Or is there a possibility that more font tags are present in the document?

3 answers

Answer 1: score 2, posted 2012-08-29 10:10:32

Answer 2: score 2, accepted, posted 2012-08-29 10:16:07

Answer 3: score 0, posted 2012-08-29 10:30:18