
How can I fetch the same URL with different query strings with Perl's LWP::UserAgent?

I looked up articles about using LWP, however I am still lost! On this site we find a list of many schools; see the overview page, follow some of the links, and get some result pages:

I want to fetch the sites using LWP::UserAgent, and for the parsing I want to use either HTML::TreeBuilder::XPath or HTML::TokeParser.

At the moment I am musing about choosing the right GET request! I have some issues with LWP::UserAgent. The subsites of the overview can be reached via direct links, but note: each site has content, e.g. the following URLs of the above-mentioned result pages.

As a novice here I cannot post the full URLs, but here you can see how they end:

id=21&extern_eid=709
id=21&extern_eid=789
id=21&extern_eid=1297
id=21&extern_eid=761

There are many different URLs that differ only at the end. The question is: how do I run LWP::UserAgent? I want to fetch and parse all of the ~1000 sites.

Question: does LWP do the job automatically!? Or do I have to set up LWP::UserAgent so that it looks up the different URLs automatically...

Solution: perhaps we have to count up from zero to 10000 with the

extern_eid=709 (counting from zero to 100000) here:

www-db.sn.schule.de/index.php?id=21&extern_eid=709

BTW: here is the relevant documentation for LWP::UserAgent:

REQUEST METHODS: The methods described in this section are used to dispatch requests via the user agent. The following request methods are provided:

$ua->get( $url )
$ua->get( $url, $field_name => $value, ... )

This method will dispatch a GET request on the given $url. Further arguments can be given to initialize the headers of the request. These are given as separate name/value pairs. The return value is a response object. See HTTP::Response for a description of the interface it provides. There will still be a response object returned when LWP can't connect to the server specified in the URL or when other failures in protocol handlers occur.
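As a quick illustration of the quoted method, here is a minimal sketch using one of the result-page URLs from the question (the Accept-Language header is only an example of the optional name/value pairs):

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 10 );

# Dispatch a GET request; extra name/value pairs become request headers.
my $response = $ua->get(
    'http://www-db.sn.schule.de/index.php?id=21&extern_eid=709',
    'Accept-Language' => 'de',
);

# A response object comes back even when the connection fails,
# so check the status before using the content.
if ( $response->is_success ) {
    print $response->decoded_content;
}
else {
    warn "GET failed: ", $response->status_line, "\n";
}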

The question is: how do I use LWP::UserAgent on the above-mentioned site the right way, effectively!?

I look forward to any and all help!

If I understand your question correctly, you are trying to use LWP::UserAgent on the same URL with different query arguments, and you are wondering if LWP::UserAgent provides a way for you to loop through the query arguments?

I don't think LWP::UserAgent has a method for you to do that. However, you can have a loop constructing the URLs and use LWP::UserAgent repeatedly:

for my $id (0 .. 100000)
{
    my $response = $ua->get( $url . "?id=21&extern_eid=" . (709 - $id) );
    # rest of the code
}
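If each page is then handed to HTML::TreeBuilder::XPath, as the question suggests, the loop could look roughly like the sketch below. The base URL and the extern_eid values are taken from the question; the XPath expression is only a placeholder and depends on the actual page markup.

use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

my $ua  = LWP::UserAgent->new( timeout => 10 );
my $url = 'http://www-db.sn.schule.de/index.php';

for my $eid ( 709, 789, 1297, 761 ) {          # or a counting loop as above
    my $response = $ua->get("$url?id=21&extern_eid=$eid");
    next unless $response->is_success;

    my $tree = HTML::TreeBuilder::XPath->new_from_content(
        $response->decoded_content
    );

    # '//h1' is a placeholder; replace it with an XPath expression
    # matching the data you actually want from each school page.
    print $_->as_text, "\n" for $tree->findnodes('//h1');

    $tree->delete;    # free the parse tree
}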

Alternatively you can add a request_prepare handler that computes and adds the query arguments before the request is sent out.
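A rough sketch of that handler approach, assuming a simple counter and the URL from the question; request_prepare handlers run just before each request goes out, so the query arguments can be computed there instead of in the loop:

use strict;
use warnings;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $eid = 709;    # hypothetical counter; advance it however you need

$ua->add_handler(
    request_prepare => sub {
        my ( $request, $ua, $handler ) = @_;
        # Rewrite the query string of the outgoing request.
        $request->uri->query_form( id => 21, extern_eid => $eid++ );
    }
);

# Each call now goes out with a different extern_eid.
$ua->get('http://www-db.sn.schule.de/index.php') for 1 .. 3;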

You describe following links for the purpose of web scraping. The LWP subclass WWW::Mechanize does this more easily than your current attempt.
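A minimal sketch of that route; the overview URL and the link regex are assumptions based on the query strings shown in the question:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

# Fetch the overview page, then follow every link whose URL
# carries an extern_eid query argument.
$mech->get('http://www-db.sn.schule.de/index.php?id=21');

for my $link ( $mech->find_all_links( url_regex => qr/extern_eid=\d+/ ) ) {
    my $page = $mech->get( $link->url_abs );
    # Hand $page->decoded_content to HTML::TreeBuilder::XPath here.
    print $link->url_abs, ": ", length( $page->decoded_content ), " bytes\n";
}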
