Perl WWW::Mechanize: comparing the Content-Length response header of different URLs

Question

I have a problem that I hope you can help with.

I have two text files with the following contents:

FILE1.TXT

http://www.dog.com/
http://www.cat.com/
http://www.antelope.com/

FILE2.TXT

1
2
Barry

The output I have managed to produce correctly is:

http://www.dog.com/1
http://www.dog.com/2
http://www.dog.com/Barry 
http://www.cat.com/1
http://www.cat.com/2
http://www.cat.com/Barry
http://www.antelope.com/1 
http://www.antelope.com/2
http://www.antelope.com/Barry

The code that does the above:

    open my $animalUrls, '<', 'FILE1.txt' or die "Can't open: $!";
    open my $directory, '<', 'FILE2.txt' or die "Can't open: $!";

    my @directory = <$directory>;   # each line of the file into an array
    close $directory or die "Can't close: $!";

    my @newListOfUrls;
    while (my $line = <$animalUrls>) {
        chomp $line;
        print $line.$_ foreach (@directory);
        push (@newListOfUrls, $line.$_) foreach (@directory);  # put each new url into array
    }

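The problem I am now facing:

I need to get the Content-Length of each original URL (from FILE1.TXT) and compare the Content-Length of every new URL with that of its corresponding original URL, to see whether they are the same or different. For example:

I need to compare the Content-Length of http://www.dog.com/1 with the Content-Length of the original URL http://www.dog.com/ to see whether there is a difference.
Then I need to compare the Content-Length of http://www.dog.com/2 with the Content-Length of the original URL http://www.dog.com/ to see whether there is a difference.
Then I need to compare the Content-Length of http://www.dog.com/Barry with the Content-Length of the original URL http://www.dog.com/ to see whether there is a difference.
Then I need to compare the Content-Length of http://www.cat.com/1 with the Content-Length of the original URL http://www.cat.com/ to see whether there is a difference.
Then I need to compare the Content-Length of http://www.cat.com/2 with the Content-Length of the original URL http://www.cat.com/ to see whether there is a difference.
And so on...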

The code to get the Content-Length:

print $mech->response->header('Content-Length');  #returns the content length

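What I am stuck on is how to compare each new URL with the correct corresponding original URL (that is, without accidentally comparing the Content-Length of http://www.cat.com/Barry with the Content-Length of http://www.dog.com/). Maybe I should use a hash? How would I do that?

Many thanks for your help with this.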

Answer 1

You should use a hash. I would change your input code to build a more complex data structure, because that makes the task easier.

open my $animalUrls, '<', 'FILE1.txt' or die "Can't open: $!";
open my $directory, '<', 'FILE2.txt' or die "Can't open: $!";

my @directory = <$directory>;   #each line of the file into an array
close $directory or die "Can't close: $!";
my $newURLs;

while ( my $baseURL = <$animalUrls> ) {
  chomp $baseURL;

  SUBDIR: foreach my $subdir (@directory) {
    chomp $subdir;
    next SUBDIR if $subdir eq "";
    # put each new url into arrayref
    push( @{ $newURLs->{$baseURL} }, $baseURL . $subdir );
  }
}

We can now take advantage of this. Assuming we have already set up Mechanize:

foreach my $url ( keys %{$newURLs} ) {
  # first get the base URL and save its content length
  $mech->get($url);
  my $content_length = $mech->response->header('Content-Length');

  # now iterate all the 'child' URLs
  foreach my $child_url ( @{ $newURLs->{$url} } ) {
    # get the content
    $mech->get($child_url);

    # compare
    if ( $mech->response->header('Content-Length') != $content_length ) {
      print "$child_url: different content length: $content_length vs "
        . $mech->response->header('Content-Length') . "!\n";
    }
  }
}

You could even do this without the second set of foreach loops, by placing the code where the data structure is built.
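A minimal sketch of that single-pass variant follows. It is not the answerer's code: the sub name `report_differences` and the `$fetch_length` callback are hypothetical names introduced for illustration, and the lengths in `%fake_length` are made up so that the sketch runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): walks the base URLs
# once, fetching each base and then each of its child URLs right away.
# $fetch_length is a code ref that returns the Content-Length for a URL,
# or undef when the server sends no such header.
sub report_differences {
    my ($base_urls, $subdirs, $fetch_length) = @_;
    my @different;

    for my $base_url (@$base_urls) {
        my $base_length = $fetch_length->($base_url);

        for my $subdir (@$subdirs) {
            next if $subdir eq '';
            my $child_url    = $base_url . $subdir;
            my $child_length = $fetch_length->($child_url);

            # Compare as strings so a missing header does not trigger
            # "uninitialized value" warnings under numeric comparison.
            my $base_str  = defined $base_length  ? $base_length  : 'undef';
            my $child_str = defined $child_length ? $child_length : 'undef';
            push @different, $child_url if $base_str ne $child_str;
        }
    }
    return @different;
}

# Stand-in for a real fetch, using made-up lengths (assumption).
my %fake_length = (
    'http://www.dog.com/'      => 100,
    'http://www.dog.com/1'     => 100,
    'http://www.dog.com/Barry' => 250,
);
my @changed = report_differences(
    ['http://www.dog.com/'],
    ['1', 'Barry'],
    sub { $fake_length{ $_[0] } },
);
print "$_: different content length\n" for @changed;
```

With Mechanize set up, the callback could instead be something like `sub { $mech->get($_[0]); $mech->response->header('Content-Length') }`, which reuses the two calls already shown in this answer.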

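If you are not familiar with references, have a look at perlreftut. What we are doing here is creating a hash with one key per base URL and storing under each key an array of all the child URLs generated from it. If you dump the final $newURLs with Data::Dumper, it looks like this: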

$VAR1 = {
  'http://www.dog.com/' => [
    'http://www.dog.com/1',
    'http://www.dog.com/2',
   ],
  'http://www.cat.com/' => [
    'http://www.cat.com/1',
    'http://www.cat.com/2',
   ],
};
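For readers new to references, here is a small sketch of how such a structure can be dumped and read back. The hash is written out literally rather than built from files, and the variable names `@dog_children` and `$second_cat` are illustrative only.

```perl
use strict;
use warnings;
use Data::Dumper;

# Same shape as the structure built above: base URL => array ref of child URLs.
my $newURLs = {
    'http://www.dog.com/' => [ 'http://www.dog.com/1', 'http://www.dog.com/2' ],
    'http://www.cat.com/' => [ 'http://www.cat.com/1', 'http://www.cat.com/2' ],
};

# Sort the hash keys so the dump is stable between runs.
$Data::Dumper::Sortkeys = 1;
print Dumper($newURLs);

# Dereferencing: all children of one base, then a single element.
my @dog_children = @{ $newURLs->{'http://www.dog.com/'} };
my $second_cat   = $newURLs->{'http://www.cat.com/'}[1];
print scalar(@dog_children), " child URLs for dog.com\n";
print "second cat.com child: $second_cat\n";
```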

Edit: I updated the code. I used these files to test it:

URLS:

http://www.stackoverflow.com/ 
http://www.superuser.com/

DIRS:

faq
questions
/

Answer 2

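This code seems to do what you need. It stores all the URLs in @urls and prints the content length as each URL is fetched. I don't know what you need the length data for afterwards, but I have stored the length of each response in the hash %lengths to associate it with its URL.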

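use 5.010;
use strict;
use warnings;

use IO::Handle;   # provides STDOUT->autoflush on perls older than 5.14
use LWP::UserAgent;

STDOUT->autoflush;

my @urls;

open my $base_fh, '<', 'FILE1.txt' or die $!;
while (my $base = <$base_fh>) {
  chomp $base;
  push @urls, $base;
  # re-read the list of paths for every base URL
  open my $path_fh, '<', 'FILE2.txt' or die $!;
  while (my $path = <$path_fh>) {
    chomp $path;
    push @urls, $base.$path;
  }
}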

my $ua = LWP::UserAgent->new;

my %lengths;

for my $url (@urls) {
  my $resp = $ua->get($url);
  my $length = $resp->header('Content-Length');
  $lengths{$url} = $length;

  printf "%s  --  %s\n", $url, $length // 'undef';
}

Output

http://www.dog.com/  --  undef
http://www.dog.com/1  --  56244
http://www.dog.com/2  --  56244
http://www.dog.com/Barry  --  56249
http://www.cat.com/  --  156
http://www.cat.com/1  --  11088
http://www.cat.com/2  --  11088
http://www.cat.com/Barry  --  11088
http://www.antelope.com/  --  undef
http://www.antelope.com/1  --  undef
http://www.antelope.com/2  --  undef
http://www.antelope.com/Barry  --  undef
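One way the stored lengths could then be compared against each base URL is sketched below. This goes beyond what the answer shows: the `%base_of` hash and the `differing_urls` sub are hypothetical additions, and the sample values are copied from the output above.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: returns the child URLs whose length differs from the
# length recorded for their base URL. %$base_of maps child URL => base URL.
sub differing_urls {
    my ($lengths, $base_of) = @_;
    my @different;
    for my $url (sort keys %$base_of) {
        # A missing Content-Length header is stored as undef; show it as text.
        my $child_len = $lengths->{$url} // 'undef';
        my $base_len  = $lengths->{ $base_of->{$url} } // 'undef';
        push @different, $url if $child_len ne $base_len;
    }
    return @different;
}

# Sample values taken from the output above.
my %lengths = (
    'http://www.cat.com/'       => 156,
    'http://www.cat.com/1'      => 11088,
    'http://www.antelope.com/'  => undef,
    'http://www.antelope.com/1' => undef,
);
my %base_of = (
    'http://www.cat.com/1'      => 'http://www.cat.com/',
    'http://www.antelope.com/1' => 'http://www.antelope.com/',
);

my @different = differing_urls(\%lengths, \%base_of);
say "$_ differs from $base_of{$_}" for @different;
```

In the reading loop of the answer, `%base_of` could be filled at the point where each combined URL is pushed onto `@urls`, by recording the base URL under the combined URL as key.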

2 answers

Answer 1
3 accepted 2013-02-07 11:47:19

Answer 2
1 2013-02-07 12:50:03
