
Getting more search results per page via URL

I've been writing a program that extracts data from web searches. To get more data, I'd ideally like to extract more results per query through a script (say, 100 or so).

My question is: is there a way to modify the URL for Google, Yahoo, or Bing (in that order of preference) so that I can get more than 10 results per query?

For Google, appending &num=99 used to work at one point but no longer does :( I saw a similar suggestion to append &count=50, but that didn't work on any of the search engines either.

The reason num=99 doesn't work for Google is that the num parameter's value isn't used directly; it is instead compared against a list of allowed values.

The allowed values are 10, 20, 30, 40, 50, and 100. Any other value for this field is ignored.
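As a rough illustration of working with that list, here is a minimal sketch that rounds a requested count up to the nearest allowed value before building the URL. The helper names snapNum and googleSearchUrl are made up for this example, and the allowed list is simply the one given above; it may change at any time.

```javascript
// Values listed above as accepted by Google's num parameter.
const ALLOWED_NUM = [10, 20, 30, 40, 50, 100];

// Hypothetical helper: pick the smallest allowed value >= requested,
// falling back to the largest allowed value for oversized requests.
function snapNum(requested) {
  const match = ALLOWED_NUM.find((n) => n >= requested);
  return match === undefined ? ALLOWED_NUM[ALLOWED_NUM.length - 1] : match;
}

// Hypothetical helper: build a search URL whose num value is on the list.
function googleSearchUrl(query, requested) {
  const url = new URL('https://www.google.com/search');
  url.searchParams.set('q', query);
  url.searchParams.set('num', String(snapNum(requested)));
  return url.toString();
}

console.log(googleSearchUrl('who is google', 99));
```

With this, a request for 99 results is sent as num=100 rather than being silently ignored.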

For Bing, the parameter is count=## where ## can be anything from 1-100.

For Yahoo, the parameter is n=## where ## can be anything from 1-100.
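To put the three parameter names side by side, here is a small sketch that builds a search URL for each engine. The count parameter names (num, count, n) come from this answer; the base URLs, the query parameter names (q and p), and the helper name searchUrl are assumptions made for illustration.

```javascript
// Per-engine settings. The count parameter names come from the answer;
// the base URLs and query parameter names are assumptions.
const ENGINES = {
  google: { base: 'https://www.google.com/search', query: 'q', count: 'num' },
  bing: { base: 'https://www.bing.com/search', query: 'q', count: 'count' },
  yahoo: { base: 'https://search.yahoo.com/search', query: 'p', count: 'n' },
};

// Hypothetical helper: build a search URL asking for `count` results.
function searchUrl(engine, query, count) {
  const cfg = ENGINES[engine];
  if (!cfg) throw new Error('unknown engine: ' + engine);
  // Bing and Yahoo are described as accepting 1-100, so clamp to that range.
  // Google only honours specific values; this sketch does not handle that.
  const clamped = Math.min(100, Math.max(1, Math.floor(count)));
  const url = new URL(cfg.base);
  url.searchParams.set(cfg.query, query);
  url.searchParams.set(cfg.count, String(clamped));
  return url.toString();
}

console.log(searchUrl('bing', 'test', 50));
console.log(searchUrl('yahoo', 'test', 500));
```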

In most cases, the URL parameter will only work if the user hasn't specified the number of search results to show in the search engine's search settings. Otherwise, the setting stored in the cookie takes precedence.

I don't know what programming language you're using, but the general idea is to load the Google search page with the proper cookie settings (that is how the preferences are stored at the time of this writing).

You can set and then view cookies in Google Chrome. To avoid unnecessary cookies, start by opening a new incognito window (Ctrl + Shift + N), and navigating to the search settings (https://www.google.com/preferences).

At the time of writing, you will want to check "Never show instant results", and then adjust the "Results per page" slider to whatever value you want. After hitting "Save" at the bottom, you can view your cookies by opening the developer console (Ctrl + Shift + J) and navigating to the resource tab.

Again, at the time of writing, Google sets two variables, NID and PREF. PREF is the one we need in order to change the number of search results. An example of what it may look like:

ID=8155cce71859f7d0:U=fe6e69e174148b7b:FF=0:LD=en:NR=40:TM=1379366492:LM=1379366586:SG=2:S=FoybwBhek8noyp0t

(This key fetches 40 results, as indicated by NR=40)
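To make the layout of that value concrete, here is a minimal sketch that splits it into its colon-separated KEY=VALUE fields and reads or replaces NR. The helper names parsePref and withResultsPerPage are made up for this example, and whether Google accepts a hand-edited value is an open assumption; saving the setting in the browser and copying the resulting cookie is the approach described here.

```javascript
// Hypothetical helper: split a PREF cookie value into KEY=VALUE fields.
function parsePref(value) {
  const fields = new Map();
  for (const part of value.split(':')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip anything that is not KEY=VALUE
    fields.set(part.slice(0, eq), part.slice(eq + 1));
  }
  return fields;
}

// Hypothetical helper: return the value with NR replaced (appended if absent).
// A Map keeps insertion order, so the other fields stay where they were.
function withResultsPerPage(value, count) {
  const fields = parsePref(value);
  fields.set('NR', String(count));
  return Array.from(fields, ([k, v]) => `${k}=${v}`).join(':');
}

const pref = 'ID=8155cce71859f7d0:U=fe6e69e174148b7b:FF=0:LD=en:NR=40:TM=1379366492:LM=1379366586:SG=2:S=FoybwBhek8noyp0t';
console.log(parsePref(pref).get('NR')); // the NR field of the example value
console.log(withResultsPerPage(pref, 100));
```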

With this key (PREF) and its value (as seen above), you can send the cookie along when requesting a page. In my most recent project related to this, I was using the requests library.

Here is a snippet showing how you might fetch a Google page with 40 results (modified example from the requests documentation):

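var request = require('request');

// Put the PREF cookie (NR=40 asks for 40 results per page) into a cookie jar.
var j = request.jar();
var cookie = request.cookie('PREF=ID=8155cce71859f7d0:U=fe6e69e174148b7b:FF=0:LD=en:NR=40:TM=1379366492:LM=1379366586:SG=2:S=FoybwBhek8noyp0t');
// Newer releases of the library expose j.setCookie(cookie, url) instead of j.add(cookie).
j.add(cookie);
// Send the request with the jar attached so the cookie goes along with it.
request({url: 'https://www.google.com/search', jar: j},
function(error, response, body) {
    // do something with the body (html) of the page!
});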

Or take a look at the man pages for wget / curl. I know that wget specifies a --load-cookies flag that you can use.

You can apply this to any other cookie-based website that you need content from. Yahoo! uses cookie-based settings; I'm not sure what Bing uses.

Add &num=100 to the link to get a page with 100 results:

https://www.google.com/search?q=who+is+google&num=100

You can still use the num parameter in the URL to set the number of results to fetch per page.


 