Running the Perl hack "Yahoo! Directory Mindshare in Google"

Question

Has anyone run the Perl script given at http://oreilly.com/pub/h/974#code ?

This is a well-known script for getting URLs from the Yahoo! directory, and many people have used it successfully.

I was trying to get URLs. I created my own Google API key and put it in the code in place of the placeholder. Apart from that I did not make any changes.

The script produces neither an error nor any URLs.

#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use HTML::LinkExtor;
use SOAP::Lite;

my $google_key  = "your API key goes here";
my $google_wdsl = "GoogleSearch.wsdl";
my $yahoo_dir   = shift || "/Computers_and_Internet/Data_Formats/XML_  _".
              "eXtensible_Markup_Language_/RSS/News_Aggregators/";

# download the Yahoo! directory.
my $data = get("http://dir.yahoo.com" . $yahoo_dir) or die $!;

# create our Google object.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
my %urls; # where we keep our counts and titles.

# extract all the links and parse 'em.
HTML::LinkExtor->new(\&mindshare)->parse($data);

sub mindshare { # for each link we find...
  my ($tag, %attr) = @_;
  print "$tag\n";

  # continue on only if the tag was a link,
  # and the URL matches Yahoo!'s redirectory.
  return if $tag ne 'a';
  return unless $attr{href} =~ /srd.yahoo/;
  return unless $attr{href} =~ /\*http/;

  # now get our real URL.
  $attr{href} =~ /\*(http.*)/; my $url = $1;
  print "hi";

  # and process each URL through Google.
  my $results = $google_search->doGoogleSearch(
                      $google_key,"link:$url", 0, 1,
                      "true", "", "false", "", "", ""
                ); # wheee, that was easy, guvner.
  $urls{$url} = $results->{estimatedTotalResultsCount};
  print "1\n";
}

# now sort and display.
my @sorted_urls = sort { $urls{$b} <=> $urls{$a} } keys %urls;
foreach my $url (@sorted_urls) { print "$urls{$url}: $url\n"; }

The program enters the loop and, on the first iteration, jumps out to "my @sorted_urls = sort { $urls{$b} <=> $urls{$a} } keys %urls;".

I don't have any understanding of Perl, but this task should have been trivial.

Surely I am missing something very obvious, because this script has been used successfully by many people.

Thanks in advance.

Answer 1

Are you supplying a directory to the script? Because if you are not, and this line in your script

"/Computers_and_Internet/Data_Formats/XML_  _".
              "eXtensible_Markup_Language_/RSS/News_Aggregators/"

is not a formatting artefact, then you're trying to scrape a non-existent page.
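One way to narrow this down without touching the network is to test, in isolation, the two things the script silently depends on: that the directory path contains no stray whitespace, and that a link's href passes the same filter the mindshare callback applies. The sketch below is only an illustration under those assumptions: the helper names path_looks_clean and redirect_target are made up for this example, and the sample hrefs are invented rather than taken from a real Yahoo! page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the directory path has no embedded whitespace.
sub path_looks_clean {
    my ($path) = @_;
    return $path !~ /\s/;
}

# Hypothetical helper mirroring the script's filter: returns the real URL
# hidden behind a redirect link, or undef if the link would be skipped.
sub redirect_target {
    my ($tag, %attr) = @_;
    return if $tag ne 'a';
    return unless defined $attr{href};
    return unless $attr{href} =~ /srd\.yahoo/;
    return unless $attr{href} =~ /\*(http.*)/;
    return $1;
}

# The default path exactly as it appears in the question.
my $default = "/Computers_and_Internet/Data_Formats/XML_  _"
            . "eXtensible_Markup_Language_/RSS/News_Aggregators/";
print "default path contains whitespace\n" unless path_looks_clean($default);

# Invented hrefs, for illustration only.
for my $href ('http://srd.yahoo.com/drst/1/*http://example.com/',
              'http://dir.yahoo.com/Computers_and_Internet/') {
    my $url = redirect_target('a', href => $href);
    print defined $url ? "kept: $url\n" : "skipped: $href\n";
}
```

If no link on the downloaded page passes this filter, %urls stays empty and the final foreach prints nothing, which would match the "no error, no URLs" symptom described in the question.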

Solution 1
1 Accepted 2012-02-08 12:18:33
