How to scrape specific data from a page with Simple HTML DOM Parser

Question

I am trying to scrape data from a webpage, but I need to get all the data behind this link.

include 'simple_html_dom.php';
$html1 = file_get_html('http://www.aktive-buergerschaft.de/buergerstiftungen/unsere_leistungen/buergerstiftungsfinder');

$info1 = $html1->find('b[class=???]', 0); // what should the selector be here?

I need to get all the data out of this site:

Bürgerstiftung Lebensraum Aachen
    rechtsfähige Stiftung des bürgerlichen Rechts
    Ansprechpartner: Hubert Schramm
    Alexanderstr. 69/ 71
    52062 Aachen
    Telefon: 0241 - 4500130
    Telefax: 0241 - 4500131
    Email: info@buergerstiftung-aachen.de
    www.buergerstiftung-aachen.de
    >> Weitere Details zu dieser Stiftung

Bürgerstiftung Achim
    rechtsfähige Stiftung des bürgerlichen Rechts
    Ansprechpartner: Helga Kühn
    Rotkehlchenstr. 72
    28832 Achim
    Telefon: 04202-84981
    Telefax: 04202-955210
    Email: info@buergerstiftung-achim.de
    www.buergerstiftung-achim.de
    >> Weitere Details zu dieser Stiftung

I need the data that is "behind" the link. Is there any way to do this with an easy, understandable parser, one that a newbie can understand and write?

Answer 1

The link you provided is down. I suggest using PHP's native DOM extension instead of Simple HTML DOM Parser; it will be much faster and easier. I had a look at the page using the Google cache, and you can use something like:

$doc = new DOMDocument;
@$doc->loadHTMLFile('...URL....'); // Using the @ operator to hide parse errors
$contents = $doc->getElementById('content')->nodeValue; // Text contents of #content
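One thing the snippet above does not cover is the case where #content is missing, in which getElementById returns null and ->nodeValue fails. A minimal sketch of a guarded version follows; the helper name textOfId and the inline sample markup are hypothetical, chosen only so the example runs without the (currently unreachable) page.

```php
<?php
// Hypothetical helper: returns the trimmed text of the element with the
// given id, or null when no such element exists.
function textOfId(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Collect parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    return $node === null ? null : trim($node->textContent);
}

// Assumed sample markup standing in for the real page.
$sampleHtml = '<html><body><div id="content"><p>52062 Aachen</p></div></body></html>';

$found   = textOfId($sampleHtml, 'content');
$missing = textOfId($sampleHtml, 'sidebar');
var_dump($found, $missing);
```

Using libxml_use_internal_errors keeps real errors visible elsewhere in the script, which the @ operator would hide.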

Answer 2

This seems to be covered in the documentation:

$html1->find('b[class=info]',0)->innertext;

Answer 3

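From a quick glance, you need to loop through the <dl> tags in #content, then through the dt and dd elements inside each one.

foreach ($html1->find('#content dl') as $item) {
    // All <dd> elements inside this <dl>
    $info = $item->find('dd');
    foreach ($info as $info_item) {
        // process each <dd> here, e.g. read $info_item->plaintext
    }
}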

Using the simple_html_dom library
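For readers without simple_html_dom installed, the same dl/dt/dd walk can be sketched with PHP's built-in DOM extension. The markup below is an assumption based on this answer (each <dt> holds a foundation name and the following <dd> elements hold its details); the function name extractEntries is hypothetical, and the sketch walks every <dl> in the document rather than only those inside #content.

```php
<?php
// Assumed sample markup; the real page may be structured differently.
$sampleHtml = <<<HTML
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>
<div id="content">
  <dl>
    <dt>Bürgerstiftung Lebensraum Aachen</dt>
    <dd>52062 Aachen</dd>
    <dd>Telefon: 0241 - 4500130</dd>
    <dt>Bürgerstiftung Achim</dt>
    <dd>28832 Achim</dd>
    <dd>Telefon: 04202-84981</dd>
  </dl>
</div>
</body></html>
HTML;

// Hypothetical helper: maps each <dt> text to the list of <dd> texts after it.
function extractEntries(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $entries = [];
    foreach ($doc->getElementsByTagName('dl') as $dl) {
        $current = null; // name of the <dt> currently being filled
        foreach ($dl->childNodes as $child) {
            if ($child->nodeType !== XML_ELEMENT_NODE) {
                continue; // skip whitespace text nodes
            }
            if ($child->nodeName === 'dt') {
                $current = trim($child->textContent);
                $entries[$current] = [];
            } elseif ($child->nodeName === 'dd' && $current !== null) {
                $entries[$current][] = trim($child->textContent);
            }
        }
    }
    return $entries;
}

$entries = extractEntries($sampleHtml);
print_r($entries);
```

Keying the result by the <dt> text keeps each foundation's details together, which is convenient if the rows are later written to a CSV file or database.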

Answer 4

XPath makes scraping ridiculously easy, and a well-chosen query keeps working even when other parts of the HTML document change. For example, to pull out the names, you'd use a query that looks like:

//div[@id='content']/dl/dt

A simple Google search will give you plenty of tutorials.
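As a rough illustration of how such a query can be run from PHP, here is a minimal sketch using the built-in DOMXPath class. The sample markup is assumed (a div with id "content" containing a <dl> whose <dt> elements hold the names), and the function name extractNames is hypothetical.

```php
<?php
// Assumed sample markup standing in for the real page.
$sampleHtml = <<<HTML
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>
<div id="content">
  <dl>
    <dt>Bürgerstiftung Lebensraum Aachen</dt>
    <dd>52062 Aachen</dd>
    <dt>Bürgerstiftung Achim</dt>
    <dd>28832 Achim</dd>
  </dl>
</div>
</body></html>
HTML;

// Hypothetical helper: returns the text of every <dt> matched by the query.
function extractNames(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $names = [];
    foreach ($xpath->query("//div[@id='content']/dl/dt") as $node) {
        $names[] = trim($node->textContent);
    }
    return $names;
}

$names = extractNames($sampleHtml);
print_r($names);
```

Note the @ in front of id: in XPath, attributes are addressed with @, so [id='content'] would look for a child element named id instead.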

Answer 5

@zero: there is a good site for trying out scraping with both PHP and Python, which I found pretty helpful: http://scraperwiki.com/

Answer 6

I'd use WWW::Mechanize:

http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm

6 answers

solution1
7 2011-05-28 06:30:27

solution2
2 ACCEPTED 2011-05-24 17:29:32

solution3
2 2011-05-26 20:32:25

solution4
1 2011-06-02 16:14:11

solution5
1 2011-06-02 17:49:16

solution6
-1 2011-06-02 01:26:34
