
Getting data from an external webpage

What's the best way to get content from an external website with PHP?

Using PHP, how do I request a web page (e.g. http://store.domain.com/1/) and scan its HTML for the data found between the tags below (here, the letters C and E)? Which PHP function should I use?

<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>
<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>

Then save "C" (the found string) to $pname1, and likewise "E" to $pname2:

$_SESSION['pname1'] = $pname1;
$_SESSION['pname2'] = $pname2;

You need to use a web scraping technique. It can be done simply with an HTML DOM library, or with technologies like Node.js and jQuery. Useful tutorials on both approaches are easy to find online.

There is also an existing thread on implementing scraping in PHP.
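As a rough illustration of the DOM-library route, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes (this assumes the dom extension is enabled). The helper name textById is made up for this example, and the sample markup is copied from the question rather than fetched from a real page.

```php
<?php
// Hypothetical helper: returns the trimmed text of the element with the
// given id, or null when no such element exists.
// Assumes $html is non-empty and $id contains no double quote.
function textById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select any element whose id attribute equals $id.
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup taken from the question.
$html = '<html><body>'
      . '<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>'
      . '<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>'
      . '</body></html>';

$pname1 = textById($html, 'ctl00_ContentPlaceHolder1_phstats1_pname');
$pname2 = textById($html, 'ctl00_ContentPlaceHolder1_phstats2_pname');
```

In a real script, $html would come from the remote page, and the two results could then be stored in $_SESSION after session_start().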

An efficient method is plain string searching:

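$content = file_get_contents('http://www.domain.com/whatever.html');

$list = array();
$on = 0;
$pos = strpos($content, 'id="c');
while ($pos !== false)
{
    // Jump past the '>' that closes the opening tag, so $content starts at the element text.
    $content = substr($content, strpos($content, '>', $pos) + 1);
    // The text runs up to the next '<'.
    $pos = strpos($content, '<');
    $list[$on] = substr($content, 0, $pos);
    $on++;
    // Look for the next element whose id starts with "c".
    $pos = strpos($content, 'id="c');
}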

Then all your values will be in the $list array, the count of which is $on.

You could also do it in one line with one of the preg functions, but I like the old-school method; it's a nanosecond faster.
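For comparison, the preg variant mentioned above might look roughly like the sketch below. The helper name spanTextsByIdPrefix and the pattern are assumptions for illustration, not something from the original answer, and a regular expression like this only copes with simple, non-nested spans.

```php
<?php
// Hypothetical helper: collects the text of every <span> whose id starts
// with $prefix. Only suitable for simple markup without nested spans.
function spanTextsByIdPrefix(string $html, string $prefix): array
{
    $pattern = '/<span\s[^>]*\bid="' . preg_quote($prefix, '/')
             . '[^"]*"[^>]*>(.*?)<\/span>/is';
    preg_match_all($pattern, $html, $matches);
    // $matches[1] holds the captured text of each matching span.
    return array_map('trim', $matches[1]);
}

// Sample markup taken from the question.
$sample = '<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>' . "\n"
        . '<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>';

$names = spanTextsByIdPrefix($sample, 'ctl00');
```

With the sample markup, $names ends up holding the two letters in document order.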

I think you can actually use file_get_contents("http://store.domain.com/1/"); to do an HTTP request.
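A slightly more defensive version of that call could look like the sketch below. It assumes allow_url_fopen is enabled in php.ini; the helper name fetchPage, the timeout, and the user-agent string are made up for this example.

```php
<?php
// Hypothetical helper: returns the body at $url, or null on failure.
function fetchPage(string $url, float $timeoutSeconds = 10.0): ?string
{
    // Options for the http:// stream wrapper (ignored for local paths).
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleScraper/1.0', // assumed value
        ],
    ]);
    // file_get_contents() returns false and emits a warning on failure;
    // the warning is suppressed here and turned into null instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage (not executed here because it needs network access):
// $html = fetchPage('http://store.domain.com/1/');
// if ($html === null) { /* handle the failed request */ }
```

Checking for null before parsing avoids passing false into the string or DOM functions when the request fails.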

As for parsing it, depending on how big your project is and how much effort you're willing to put in, you can use an HTML DOM parser such as http://simplehtmldom.sourceforge.net/, or simply search for id="ctl00_ContentPlaceHolder1_phstats1_pname" and take the string apart piece by piece (not the recommended way of doing things).

It can be done with cURL. But you can just include the Simple HTML DOM Parser in your project. It's very easy to use and will serve your purpose.

The documentation is here: http://simplehtmldom.sourceforge.net/
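For the cURL route, a minimal sketch might look like this. It assumes the curl extension is installed; the helper name fetchWithCurl and the timeout values are made up for this example.

```php
<?php
// Hypothetical helper: fetches $url with cURL and returns the body,
// or null on a transport error or a non-2xx HTTP status.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit for establishing the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit for the whole request
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}

// Usage (not executed here because it needs network access):
// $html = fetchWithCurl('http://store.domain.com/1/');
```

The returned string can then be handed to whichever parser you prefer.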
