How can I make an HTTP GET request from Perl?
I'm trying to write my first Perl program. If you think that Perl is a bad language for the task at hand, tell me what language would solve it better.
The program tests connectivity between a given machine and a remote Apache server. First the program requests the directory listing from the Apache server, then it parses the list and downloads all the files one by one. Should there be a problem with a file (the connection resets before reaching the specified Content-Length), this should be logged and the next file retrieved. There is no need to save the files or even check their integrity; I only need to log the time it takes to complete and all cases where the connection resets.
To retrieve the list of links from an Apache-generated directory index I plan to use a regexp similar to
/href=\"([^\"]+)\"/
The regexp is not debugged yet, admittedly.
What is the "reference" way to do an HTTP request from Perl? I googled and found examples using many different libraries, some of them commercial. I need something that can detect disconnections (timeout or TCP reset) and handle them.
Another question: how do I store everything captured by my regexp when searching globally, as a list of strings, with minimal coding effort?
As far as the whole problem description goes, I would use WWW::Mechanize. Mechanize is a subclass of LWP::UserAgent that adds stateful behavior and HTML parsing. With mech, you can just do $mech->get($url_of_index_page), and then use $mech->find_all_links(criteria) to select the links to follow.
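A minimal sketch of that flow, assuming a hypothetical index URL and that any file-like link should be fetched (the `url_regex` criterion here is an assumption about what the index links look like):

```perl
use strict;
use warnings;
use WWW::Mechanize;
use Time::HiRes qw(time);

# autocheck => 0 so failed requests don't die; we want to log and continue.
my $mech = WWW::Mechanize->new( autocheck => 0, timeout => 30 );

my $index_url = 'http://example.com/files/';    # hypothetical

$mech->get($index_url);
die "Could not fetch index: ", $mech->status, "\n" unless $mech->success;

# Keep only links that look like files (assumption about the index layout).
my @links = $mech->find_all_links( url_regex => qr/\.[a-z0-9]+$/i );

for my $link (@links) {
    my $start = time;
    my $res   = $mech->get( $link->url_abs );
    my $took  = time - $start;
    if ( $res->is_success ) {
        printf "OK  %s (%.2fs)\n", $link->url_abs, $took;
    }
    else {
        # Timeouts and TCP resets surface here as error responses.
        printf "ERR %s (%s)\n", $link->url_abs, $res->status_line;
    }
}
```

Passing `timeout` to the constructor is inherited from LWP::UserAgent, which covers the disconnection-detection requirement without extra code.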
You have many questions in one. The answer to the question in the title of your post is to use LWP::Simple.
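The title question in its simplest form (the URL is a placeholder):

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# get() returns the body on success, undef on any failure.
my $content = get('http://example.com/');    # hypothetical URL
defined $content or die "GET failed\n";
print length($content), " bytes fetched\n";
```

LWP::Simple is deliberately terse; once you need timeouts, status lines, or per-request control, step up to LWP::UserAgent or WWW::Mechanize.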
Most of your other questions are answered in perlfaq9, with appropriate pointers to further information.
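One of those other questions, storing everything a global match captures, is the standard list-context idiom (sketched on toy data):

```perl
use strict;
use warnings;

my $html = '<a href="a.txt">a</a> <a href="b.txt">b</a>';

# In list context, a match with /g returns every capture from every match.
my @hrefs = $html =~ /href="([^"]+)"/g;

print "@hrefs\n";    # a.txt b.txt
```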
As for the parsing-markup-with-regular-expressions part of your question: DON'T! http://htmlparsing.icenine.ca explains some of the reasons why you shouldn't do this. Although what you're attempting to parse looks simple, use a proper parser.
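For comparison, a proper-parser sketch using HTML::LinkExtor (from the HTML-Parser distribution), again on toy input:

```perl
use strict;
use warnings;
use HTML::LinkExtor;

my $html = '<a href="a.txt">a</a> <img src="logo.png">';

my $parser = HTML::LinkExtor->new;
$parser->parse($html);
$parser->eof;

# links() returns one [tag, attr => url, ...] arrayref per link-bearing tag.
for my $link ( $parser->links ) {
    my ( $tag, %attrs ) = @$link;
    print "$tag: $_\n" for values %attrs;
}
```

Unlike the href regexp, this also survives single-quoted or unquoted attributes and picks up links in tags other than `<a>`.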
The page linked above no longer exists; see http://www.cwhitener.com/htmlparsing instead.
As a more general answer, Perl is a perfectly fine language for doing HTTP requests, as are a host of other languages. If you're familiar with Perl, don't hesitate; there are many excellent libraries available to do what you need.