How to extract a domain name from a URL?

Question

How can I use bash to extract the domain name from a URL? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.

Answer 1

You can extract the domain name with a simple AWK one-liner:

echo http://example.com/index.php | awk -F'[/:]' '{print $4}'

Output: example.com

:-)
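
To see why field 4 is the host, it can help to print every field that the `[/:]` separator produces. This is only an illustrative sketch; the helper name `show_fields` is made up here and is not part of the answer.

```shell
# Hypothetical helper: print each awk field of a URL, numbered,
# using the same field separator as the one-liner above.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[/:]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields 'http://example.com/index.php'
```

Fields 2 and 3 are the empty strings between the ":" and the two slashes, which is why the host lands in field 4. A URL with credentials such as http://user:pw@example.com/ shifts things: field 4 would then be user, not the host.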

Answer 2

$ URI="http://user:pw@example.com:80/"
$ echo "$URI" | sed -e 's/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/'
example.com

See http://en.wikipedia.org/wiki/URI_scheme

Answer 3

basename "http://example.com"

Of course, this won't work for a URI like http://www.example.com/index.html, but you can do the following:

basename "$(dirname "http://www.example.com/index.html")"

Or, for more complex URIs:

echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3

-d means "delimiter" and -f means "field"; in the example above, the third field delimited by the forward slash "/" is www.example.com.
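
The field number depends on whether the input has a scheme: with "http://" in front the host is field 3, without it the host is field 1. A minimal sketch of that choice follows; the function name `host_with_cut` is an assumption for illustration, and any port stays attached to the result.

```shell
# Hypothetical helper: pick the cut field based on whether a scheme is present.
host_with_cut() {
    case "$1" in
        *://*) printf '%s\n' "$1" | cut -d'/' -f3 ;;  # scheme present: host is field 3
        *)     printf '%s\n' "$1" | cut -d'/' -f1 ;;  # no scheme: host is field 1
    esac
}

host_with_cut 'http://www.example.com/somedir/someotherdir/index.html'
host_with_cut 'example.org/index.html'
```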

Answer 4

echo "$URL" | cut -d'/' -f3 | cut -d':' -f1

Works for URLs such as:

http://host.example.com
http://host.example.com/hi/there
http://host.example.com:2345/hi/there
http://host.example.com:2345
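
One case the list above does not cover is a URL with credentials (user:pw@host): the second cut would then return the user name. A possible variation, shown only as a sketch with a made-up function name `host_of`, strips everything up to the "@" before cutting off the port. It does not attempt to handle bracketed IPv6 hosts.

```shell
# Hypothetical variation of the pipeline above that also drops "user:pw@".
host_of() {
    printf '%s\n' "$1" |
        cut -d'/' -f3 |     # authority part: [user[:pw]@]host[:port]
        sed 's/^.*@//' |    # drop credentials, if any
        cut -d':' -f1       # drop the port, if any
}

host_of 'http://user:pw@host.example.com:2345/hi/there'
```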

Answer 5

sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_'

For example:

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'https://example.com'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment'
example.com

$ sed -E -e 's_.*://([^/@]*@)?([^/:]+).*_\2_' <<< 'http://user:pass@example.com:1234/some/path#fragment?params=true'
example.com

Answer 6

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];

if($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/g) {
  print $2;
}

Usage:

./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output

If you only want the domain rather than the full host + domain, use this instead:

#!/usr/bin/perl -w
use strict;

my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/g) {
  print $3;
}

Answer 7

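Instead of using a regular expression, you can use Python's urlparse (urllib.parse in Python 3):

URL=http://www.example.com

python3 -c "from urllib.parse import urlparse
url = urlparse('$URL')
print(url.netloc)"

You can use it like this or put it in a small script. However, this still requires a valid scheme identifier, and judging by your comments, your input doesn't necessarily provide one. You can specify a default scheme, but urlparse expects the netloc to start with '//':

url = urlparse('//www.example.com/index.html','http')

So you have to add those manually, i.e.:

python3 -c "from urllib.parse import urlparse
if '://' not in '$URL':
    url = urlparse('//$URL', 'http')
else:
    url = urlparse('$URL')
print(url.netloc)"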

Answer 8

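There is too little information about how you get these URLs... please show more next time. Are there parameters in the URL, and so on? In the meantime, here is some simple string manipulation for your sample URL.

For example:

$ s="http://example.com/index.php"
$ echo "${s%/*}"  # get rid of the last "/" onwards
http://example.com
$ s=${s%/*}
$ echo "${s/#http:\/\//}"  # get rid of http://
example.com

Another way, using sed (GNU):

$ echo "$s" | sed 's/http:\/\///;s|\/.*||'
example.com

Using awk:

$ echo "$s" | awk '{gsub("http://|/.*","")}1'
example.com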
example.com

Answer 9

The following will output "example.com":

URI="http://user@example.com/foo/bar/baz/?lala=foo"
ruby -ruri -e "p URI.parse('$URI').host"

For more on what you can do with Ruby's URI class, consult the documentation.

Answer 10

A solution that covers more cases can be based on sed regular expressions:

echo http://example.com/index.php | sed -E -e 's#^https?://##' -e 's#:.*##' -e 's#/.*##'

This works for the following URLs: http://example.com/index.php, http://example.com:4040/index.php, https://example.com/index.php

Answer 11

With Ruby, you can use the Domainatrix library/gem:

http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html

require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain
=> "kku"

Great tool! :-)

Answer 12

Here is the node.js way; it works with or without a port and with deep paths:

//get-hostname.js
'use strict';

const url = require('url');
const parts = url.parse(process.argv[2]);

console.log(parts.hostname);

It can be called like this:

node get-hostname.js http://foo.example.com:8080/test/1/2/3.html
//foo.example.com

Documentation: https://nodejs.org/api/url.html

Answer 13

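A pure Bash implementation without any subshells or subprocesses:

# Extract host from a URL
#   $1: URL
shopt -s extglob  # the +(...) pattern below requires extglob
function extractHost {
    local s="$1"
    s="${s/#*:\/\/}" # Parameter expansion: strip everything up to and including "://"
    echo -n "${s/%+(:*|\/*)}" # strip the port and/or path from the end
}

For example, extractHost "docker://1.2.3.4:1234/a/v/c" will output 1.2.3.4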
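
If relying on extglob is not desirable, roughly the same result can be reached with plain prefix/suffix removal, still without subshells or subprocesses. The following is a sketch under that assumption, not the answer's own code; the name `extract_host_plain` is made up, and it additionally drops a "user:pw@" part, which the function above does not.

```shell
# Hypothetical extglob-free variant using only ${var#...} / ${var%...} forms.
extract_host_plain() {
    local h="$1"
    h=${h#*://}       # drop the scheme, if any
    h=${h%%/*}        # drop the path
    h=${h%%[?#]*}     # drop a query or fragment that follows the host directly
    h=${h##*@}        # drop credentials, if any
    h=${h%%:*}        # drop the port, if any
    printf '%s\n' "$h"
}

extract_host_plain 'docker://1.2.3.4:1234/a/v/c'
```

Unlike the original, this prints a trailing newline, which is usually what callers using command substitution or pipelines expect.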

Answer 14

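Note that extracting just the domain name from a URL is a bit tricky, because where the domain name sits within the host name depends on the country (or, more generally, the TLD) in use.

For example, for Argentina: in www.personal.com.ar the domain name is personal.com.ar, not com.ar, because this TLD uses sub-zones to indicate the type of organization.

The tool I found that handles these cases well is tldextract.

So, starting from the FQDN (the host part of the URL), you can reliably get the domain this way:

tldextract personal.com.ar | cut -d " " -f 2,3 | sed 's/ /./'

(The other answers above for getting the FQDN from a URL are good and should be used.)

Hope this helps with the edge cases.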
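
To make the idea concrete without installing anything, here is a toy sketch of the same logic. It assumes a tiny hard-coded list of multi-label suffixes (com.ar, co.uk, ac.th) chosen purely for illustration; a real tool such as tldextract relies on a full suffix list instead. The function name `registrable_domain` is hypothetical.

```shell
# Toy illustration only: the suffix list below is a made-up sample, not complete.
registrable_domain() {
    local host="$1" suffix rest
    for suffix in com.ar co.uk ac.th; do
        case "$host" in
            *."$suffix")
                rest=${host%."$suffix"}                   # e.g. www.personal
                printf '%s.%s\n' "${rest##*.}" "$suffix"  # last label + suffix
                return 0 ;;
        esac
    done
    # Fallback for single-label TLDs: keep the last two labels.
    rest=${host%.*}
    printf '%s.%s\n' "${rest##*.}" "${host##*.}"
}

registrable_domain 'www.personal.com.ar'
```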

Answer 15

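Remarks on the question:

The question asks for a regex, but the goal is just to split a string on the / character. Using a regular expression for this kind of job is overkill!

**Using bash `read` to get the URL parts**

Since this question is tagged bash and no answer addresses read, here is a short and quick solution: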

URL="http://example.com/some/path/to/page.html"

IFS=/ read -r prot _ domain link <<<"$URL"

That's all. Since read is a builtin, this is the fastest way!!

From there you can do:

printf '%-8s: %s\n' Protocol "${prot%:}" Domain "$domain" Link "/$link"
Protocol: http
Domain  : example.com
Link    : /some/path/to/page.html

You can even check for the port:

URL="http://example.com:8000/some/path/to/page.html"
IFS=/ read -r prot _ domain link <<<"$URL"
IFS=: read -r domain port <<<"$domain"

printf '%-8s: %s\n' Protocol "${prot%:}" Domain "$domain" Port "$port" Link "/$link"
Protocol: http
Domain  : example.com
Port    : 8000
Link    : /some/path/to/page.html

Full parsing with default ports:

URL="https://stackoverflow.com/questions/2497215/how-to-extract-domain-name-from-url"
declare -A DEFPORTS='([http]=80 [https]=443 [ipp]=631 [ftp]=21)'
IFS=/ read -r prot _ domain link <<<"$URL"
IFS=: read -r domain port <<<"$domain"

printf '%-8s: %s\n' Protocol "${prot%:}" Domain "$domain" \
    Port  "${port:-${DEFPORTS[${prot%:}]}}" Link "/$link"
Protocol: https
Domain  : stackoverflow.com
Port    : 443
Link    : /questions/2497215/how-to-extract-domain-name-from-url
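
Associative arrays (declare -A) need bash 4 or later. As a rough sketch of the same idea for older shells, the default-port lookup can be written as a case statement inside a small function. The name `host_port` is made up for illustration, and a here-document stands in for the `<<<` here-string so the snippet does not depend on bash-only syntax.

```shell
# Hypothetical helper: print "host port", using a per-scheme default port
# when the URL does not name one.
host_port() {
    local prot empty hostport rest host port
    # Split on "/": scheme, the empty field between the slashes, host[:port], path.
    IFS=/ read -r prot empty hostport rest <<EOF
$1
EOF
    prot=${prot%:}
    host=${hostport%%:*}
    case "$hostport" in
        *:*) port=${hostport##*:} ;;
        *)   port= ;;
    esac
    if [ -z "$port" ]; then
        case "$prot" in
            http)  port=80 ;;
            https) port=443 ;;
            ipp)   port=631 ;;
            ftp)   port=21 ;;
        esac
    fi
    printf '%s %s\n' "$host" "$port"
}

host_port 'http://example.com:8000/some/path/to/page.html'
```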
