How to extract contents from URLs?
I am having a problem. This is what I have to do, and the code is taking extremely long to run:
There is 1 website I need to collect data from, and to do so I need my algorithm to visit over 15,000 subsections of this website (ie www.website.com/item.php?rid=$_id), where $_id will be the current iteration of a for loop.
Here are the problems:
1. The way I am currently fetching the source of each page is via file_get_contents, and, as you can imagine, it takes super long to file_get_contents of 15,000+ pages.
2. Some pages do not exist (ie www.website.com/item.php?rid=2 exists but www.website.com/item.php?rid=3 does not), so I need a method of quickly skipping over these pages before the algorithm tries to fetch their contents and wastes a bunch of time.
In short, I need a method of extracting a small portion of the page from 15,000 webpages in as quick and efficient a manner as possible.
Here is my current code.
for ($_id = 0; $_id < 15392; $_id++){
    //****************************************************** Locating page
    $_location = "http://www.website.com/item.php?rid=".$_id;
    $_headers = @get_headers($_location);
    if ($_headers === FALSE || strpos($_headers[0], "200") === FALSE) {
        continue; // page does not exist, skip it
    } // end if
    $_source = file_get_contents($_location);
    //****************************************************** Extracting price
    $_needle_initial = "<td align=\"center\" colspan=\"4\" style=\"font-weight: bold\">Current Price:";
    $_needle_terminal = "</td>";
    $_position_initial = stripos($_source, $_needle_initial);
    if ($_position_initial === FALSE) {
        continue; // needle not found on this page
    } // end if
    $_position_initial += strlen($_needle_initial);
    // search for the closing tag *after* the needle, not from the start of the page
    $_position_terminal = stripos($_source, $_needle_terminal, $_position_initial);
    $_length = $_position_terminal - $_position_initial;
    $_current_price = strip_tags(trim(substr($_source, $_position_initial, $_length)));
} // end for
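The extraction step can be pulled into a small helper, which makes the offset detail easy to verify against a sample snippet. The needles are the ones from the code above; the sample HTML below is made up purely for illustration:

```php
<?php
// Extract the text between the "Current Price:" cell and its closing </td>.
// Returns NULL when the needle is not present in the page source.
function extract_price($source) {
    $needle_initial  = "<td align=\"center\" colspan=\"4\" style=\"font-weight: bold\">Current Price:";
    $needle_terminal = "</td>";

    $start = stripos($source, $needle_initial);
    if ($start === FALSE) {
        return NULL; // page layout differs, or the page has no price
    }
    $start += strlen($needle_initial);

    // Important: search for </td> starting *after* the needle; otherwise an
    // earlier </td> in the page yields a negative length and an empty result.
    $end = stripos($source, $needle_terminal, $start);
    if ($end === FALSE) {
        return NULL;
    }
    return strip_tags(trim(substr($source, $start, $end - $start)));
}

// Hypothetical sample snippet for a quick sanity check:
$sample = "<table><tr><td>x</td>"
        . "<td align=\"center\" colspan=\"4\" style=\"font-weight: bold\">Current Price:"
        . " <b>\$12.99</b></td></tr></table>";
echo extract_price($sample); // $12.99
?>
```

Note that the sample contains an earlier `</td>` (after the `x` cell); without the third argument to stripos, that is the one the original loop would have found.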
Any help at all is greatly appreciated since I really need a solution to this!
Thank you in advance for your help!
The short of it: don't.

Longer: if you want to do this much work, you shouldn't do it on demand. Do it in the background! You can use the code you have here, or any other method you're comfortable with, but instead of showing the results to a user, save them in a database or a local file. Call that script from a cron job every x minutes (depending on the interval you need), and just show the latest content from your local cache (be it a database or a file).
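A minimal sketch of the background-scraper-plus-cache idea, assuming a local JSON file as the cache (the file name and the cron schedule are assumptions for illustration, not from the original post):

```php
<?php
// Background scraper + cache sketch.
// Assumption: the scraper runs from cron, e.g. every 15 minutes:
//   */15 * * * * php /path/to/scraper.php

const CACHE_FILE = __DIR__ . '/prices.json';

// Persist the scraped prices; LOCK_EX guards against a reader
// seeing a half-written file while cron is mid-update.
function save_cache(array $prices) {
    file_put_contents(CACHE_FILE, json_encode($prices), LOCK_EX);
}

// The user-facing page only reads the cache; it never hits the remote site.
function load_cache() {
    if (!is_file(CACHE_FILE)) {
        return array(); // cron has not produced a cache yet
    }
    $data = json_decode(file_get_contents(CACHE_FILE), TRUE);
    return is_array($data) ? $data : array();
}

// In the cron script: inside the for loop, collect
//     $prices[$_id] = $_current_price;
// and call save_cache($prices); once after the loop finishes.
?>
```

With this split, the 15,000-request loop can take as long as it needs in the background, while the page a visitor sees always responds instantly from the cache.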