
How can I get the host name from a URL in Perl?

I have a URL like "www.google.com/aabc/xyz". How can I get the host name from it? I used this code:

 my $referer = URI->new('www.google.com/aabc/xyz');
 my $host    = $referer->host; # compiler error

I'm getting an error on the second line.

use URI;
use URI::Heuristic qw(uf_uristr);

my $referrer = URI->new( uf_uristr('www.google.com/aabc/xyz') );
print $referrer->host;

The question changed significantly since my first answer, which I've deleted. With high enough rep you can see it.

Your code has this (it's better to post complete programs):

my $referer = URI->new('www.google.com/aabc/xyz');
my $host    = $referer->host; # compiler error

You say that you're getting a compiler error, but it's really a runtime error:

Can't locate object method "host" via package "URI::_generic"

When you made the new object, you gave URI a string. From that, it's going to guess what sort of URI it is. Since there's no scheme, such as http://, in front of it, it doesn't guess that it's that sort of URI. Instead, it falls back to a "generic" class, URI::_generic. By the underscore in its name and the fact there's no documentation for it, you may surmise it's not meant for you to know about.

But, here it is complaining. It thinks the URI is a path (and some other things). The part you recognize as the host it parses as a path:

use v5.10;

use URI;

my $referer = URI->new('www.google.com/aabc/xyz');
my $path    = $referer->path;

say "path is $path";

Now you see what it did:

 path is www.google.com/aabc/xyz

The generic URI doesn't know anything about a host, so when you call host on its object, it blows up. It would be nicer for it to return undef, perhaps, but that's not what it does.
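To see which class URI picks, you can print the class of each object. This is only a small sketch, assuming the URI distribution from CPAN is installed; the second string, with an explicit http:// scheme, is a variation I added for comparison and is not from the question:

```perl
use strict;
use warnings;
use URI;

# The question's string first, then a variant with an explicit scheme
# (the second string is an assumption added for comparison).
for my $string ( 'www.google.com/aabc/xyz', 'http://www.google.com/aabc/xyz' ) {
    my $uri = URI->new( $string );

    # ref() reports the class URI chose for this string
    print ref( $uri ), " for $string\n";
}
```

The first line should report URI::_generic and the second URI::http, and only the latter has a host method.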

oanders already has an interesting answer that guesses the scheme for you when it looks like one is missing, but there's another thing you can do. Before you call host, check that the object can respond to it:

use v5.10;

use URI;

my $url = 'www.google.com/aabc/xyz';
my $referer = URI->new( $url );

if( $referer->can( 'host' ) ) {
    say "Host is " . $referer->host;
    }
else {
    say "Weird hostless URL: $referer";
    }

Now your program shouldn't blow up for the same reason and you can look at the output to discover strings that you couldn't process.
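If you'd rather get a host out of scheme-less strings than just report them, one possible sketch combines the can check with a default scheme. The helper name host_for is hypothetical (it is not part of URI), and it assumes that any string without a leading "scheme://" is meant to be an http URL, which won't hold for every input:

```perl
use strict;
use warnings;
use URI;

# host_for() is a hypothetical helper, not something URI provides.
# Assumption: a string without "scheme://" is meant to be an http URL.
sub host_for {
    my( $string ) = @_;

    # Prepend a default scheme only when none is present
    $string = "http://$string" unless $string =~ m{\A[a-z][a-z0-9+.-]*://}i;

    my $uri = URI->new( $string );

    # Still guard the call, since some schemes have no host
    return $uri->can( 'host' ) ? $uri->host : undef;
}

for my $string ( 'www.google.com/aabc/xyz', 'https://google.com' ) {
    my $host = host_for( $string );
    print defined $host ? "Host is $host\n" : "Weird hostless URL: $string\n";
}
```

With these two inputs it should print www.google.com and google.com as the hosts.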

$ echo -e "http://www.google.www.com/abc/xyz\nhttps://google.com\nwww.google.www.com"
http://www.google.www.com/abc/xyz
https://google.com
www.google.www.com

$ echo -e "http://www.google.www.com/abc/xyz\nhttps://google.com\nwww.google.www.com" | perl -pe "s/^(http(s)?:\/\/)?(www\.)?//"
google.www.com/abc/xyz
google.com
google.www.com

You can do this much more simply than the above.

CODE

use strict;
use warnings;

while (<DATA>) {
     $_ =~ s/^(https?:\/\/)?(www\.)?\b//;
     print $_ ;
}

__DATA__
http://www.google.com/abc/xyz
https://google.com
www.google.com

Results

google.com/abc/xyz
google.com
google.com
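Note that this substitution leaves the path in place (google.com/abc/xyz), so it doesn't yet give you just the host. A rough regex-only sketch that captures only the host part might look like the following. The name bare_host is hypothetical, and the pattern assumes a plain web address with an optional scheme and no user name or password in it; for anything less regular, the URI module is the safer choice:

```perl
use strict;
use warnings;

# bare_host() is a hypothetical helper for illustration.
# Assumption: optional "scheme://" prefix, no userinfo, host ends at / : ? or #
sub bare_host {
    my( $string ) = @_;
    my( $host ) = $string =~ m{\A(?:[a-z][a-z0-9+.-]*://)?([^/:?#]+)}i;
    return $host;
}

for my $string ( 'http://www.google.com/abc/xyz', 'https://google.com', 'www.google.com' ) {
    print bare_host( $string ), "\n";
}
```

Unlike the substitution above, this keeps a leading www. because it is part of the host name.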
