Getting data from an external webpage

What's the best way to get content from an external website via PHP?

Using PHP, how do I request a web page (e.g. http://store.domain.com/1/) and scan its HTML for the text found between specific tags (here, the letters C and E)? Which PHP method should I use?

<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>
<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>

Then save "C" (the found string) to $pname1 and "E" to $pname2, and store both in the session:

$_SESSION['pname1'] = $pname1;
$_SESSION['pname2'] = $pname2;

You need to use a web scraping technique. It can be done simply with an HTML DOM library, or with technologies like Node.js and jQuery. You can find some useful tutorials regarding this here and here.

You may also see this thread regarding implementing scraping using PHP
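As a rough illustration of the DOM-library route, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than any third-party library. The helper name extractSpanText() is made up for this example, and the sample markup is copied from the question instead of being fetched from the real page.

```php
<?php
// Minimal sketch: read the text of a <span> by its id with PHP's built-in DOM extension.
// extractSpanText() is a hypothetical helper name, not part of any library.
function extractSpanText(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $id contains no double quotes (true for the ids shown in the question).
    $nodes = $xpath->query('//span[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no such element on the page
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup copied from the question; in practice $html would be the fetched page.
$html = '<html><body>'
      . '<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>'
      . '<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>'
      . '</body></html>';

$pname1 = extractSpanText($html, 'ctl00_ContentPlaceHolder1_phstats1_pname');
$pname2 = extractSpanText($html, 'ctl00_ContentPlaceHolder1_phstats2_pname');
```

The two results can then be stored with $_SESSION['pname1'] = $pname1; and so on, after session_start() has been called.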

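One efficient approach is plain string searching:

$content = file_get_contents('http://www.domain.com/whatever.html');

// Collect every id attribute value that starts with "c".
$list = [];
$on = 0;
$pos = strpos($content, 'id="c');
while ($pos !== false)
 {
 // Skip past 'id="' (4 characters) so $content starts at the id value.
 $content = substr($content, $pos + 4);
 // The value ends at the closing quote.
 $pos = strpos($content, '"');
 $list[$on] = substr($content, 0, $pos);
 $on++;
 $pos = strpos($content, 'id="c');
 }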

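All the matched values will then be in the $list array, and $on holds their count. Note that the loop collects the id attribute values themselves; to capture the text between the tags instead, read from the next ">" up to the following "<".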

You could also do it in one line with one of the preg functions, but I like the old-school method; it's a nanosecond faster.
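For comparison, the preg variant might look roughly like this. It is only a sketch: the helper name extractPnames() is invented here, and the "_pname" suffix in the pattern is taken from the sample markup in the question. A regex like this is brittle if the spans ever contain nested tags.

```php
<?php
// Sketch: collect the text of every <span> whose id ends in "_pname".
// extractPnames() is a hypothetical helper; the suffix comes from the question's markup.
function extractPnames(string $html): array
{
    // Capture group 1 is the text between the opening and closing span tags.
    $pattern = '/<span\b[^>]*\bid="[^"]*_pname"[^>]*>(.*?)<\/span>/is';
    preg_match_all($pattern, $html, $matches);
    return array_map('trim', $matches[1]);
}

// Sample markup copied from the question.
$html = '<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>' . "\n"
      . '<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>';

[$pname1, $pname2] = extractPnames($html);
```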

I think you can simply use file_get_contents("http://store.domain.com/1/"); to make the HTTP request.

As for parsing it, depending on how big your project is and how much effort you're willing to put in, you can use an HTML DOM parser such as http://simplehtmldom.sourceforge.net/, or simply search for id="ctl00_ContentPlaceHolder1_phstats1_pname" and take the string apart piece by piece (not the recommended way of doing things).
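The request step can be sketched as below. file_get_contents() returns false on failure, so it is worth checking before parsing. The function name fetchPage(), the 10-second timeout, and the user-agent string are illustrative assumptions, not anything required by PHP.

```php
<?php
// Sketch: fetch a page with file_get_contents() and fail loudly instead of parsing `false`.
// fetchPage(), the timeout, and the user agent are illustrative choices.
function fetchPage(string $url, int $timeoutSeconds = 10): string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'example-scraper/1.0', // hypothetical identifier
        ],
    ]);
    // @ suppresses the PHP warning; the failure is reported via the exception below.
    $content = @file_get_contents($url, false, $context);
    if ($content === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $content;
}

// Usage (http:// URLs require allow_url_fopen to be enabled):
// $html = fetchPage('http://store.domain.com/1/');
```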

It can be done with cURL, but you can also just include the Simple HTML DOM Parser in your project. It's very easy to use and will serve your purpose.

The documentation is here: http://simplehtmldom.sourceforge.net/
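If you do go the cURL route for the download, a minimal sketch might look like this. It assumes the cURL extension is installed; fetchWithCurl() is a made-up helper name, and the timeout is arbitrary. The returned HTML can then be handed to whichever parser you prefer.

```php
<?php
// Sketch: download a page with the cURL extension (ext-curl must be installed).
// fetchWithCurl() is a hypothetical helper name.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    if ($status >= 400) {
        throw new RuntimeException("Server responded with HTTP $status");
    }
    return $body;
}

// Usage:
// $html = fetchWithCurl('http://store.domain.com/1/');
```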
