Using REGEXP_EXTRACT to get the domain and subdomain

Question

So far I have only managed to extract the TLD from my list of websites, using:

REGEXP_EXTRACT(Domain_name, r'(\.[^.:]*)\.?:?[0-9]*$') AS web_tld

Example:

I have:

www.example1.abc.com
www.example2.efg.123.net

I want the result:

Subdomain

example1
efg

Domain

abc
123

TLD

.com
.net

EDIT: My query fails with the error 'Exactly one capturing group must be specified' when I use (\.?([^.:]+)\.([^.:]+)\.([^.:]+):?[0-9]*$) as the regex:

SELECT
REGEXP_EXTRACT(Domain, r'(\.?([^.:]+)\.([^.:]+)\.([^.:]+):?[0-9]*$)'),
FROM [weblist.domain]
ORDER BY 1
LIMIT 250;
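
The error follows from how capturing groups are counted: every unescaped `(...)` pair counts, including the outer one. As a rough illustration, the sketch below counts groups using Python's `re` module as a stand-in for the regex engine BigQuery uses; the variable names and the `count_groups` helper are made up for this example, and the second pattern is just one possible single-group rewrite.

```python
import re

# The pattern from the failing query: one outer group wrapping three inner groups.
failing = r'(\.?([^.:]+)\.([^.:]+)\.([^.:]+):?[0-9]*$)'

# A variant that captures only the subdomain label; the remaining
# labels are matched but not wrapped in parentheses.
single = r'([^.:]+)\.[^.:]+\.[^.:]+:?[0-9]*$'


def count_groups(pattern: str) -> int:
    """Return the number of capturing groups in a pattern (hypothetical helper)."""
    return re.compile(pattern).groups


failing_count = count_groups(failing)
single_count = count_groups(single)
print(failing_count, single_count)
```

The first pattern reports four groups and the second reports one, which is why only a pattern shaped like the second is accepted by REGEXP_EXTRACT.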

Answer 1

Since REGEXP_EXTRACT accepts only one capturing group, you can use three separate regular expressions, one for each value you want:

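SELECT
  -- TLD: the last label, ignoring an optional :port suffix
  REGEXP_EXTRACT(Domain, r'([^.:]+):?[0-9]*$') AS web_tld,
  -- Domain: the label just before the TLD
  REGEXP_EXTRACT(Domain, r'([^.:]+)\.[^.:]+:?[0-9]*$') AS web_domain,
  -- Subdomain: the label just before the domain
  REGEXP_EXTRACT(Domain, r'([^.:]+)\.[^.:]+\.[^.:]+:?[0-9]*$') AS web_subdomain
FROM [weblist.domain]
ORDER BY 1
LIMIT 250;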
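
To sanity-check the one-pattern-per-label idea outside BigQuery, here is a minimal Python sketch that runs three such patterns over the sample hostnames from the question. It assumes the separating dots are escaped so they match only a literal `.`, and it uses Python's `re` rather than BigQuery's engine; `PATTERNS` and `split_host` are names invented for this illustration.

```python
import re

# One pattern per label, each with exactly one capturing group.
# Dots between labels are escaped so they only match a literal '.'.
PATTERNS = {
    "tld": r'([^.:]+):?[0-9]*$',
    "domain": r'([^.:]+)\.[^.:]+:?[0-9]*$',
    "subdomain": r'([^.:]+)\.[^.:]+\.[^.:]+:?[0-9]*$',
}


def split_host(host: str) -> dict:
    """Return the captured label for each pattern, or None when it does not match."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, host)
        result[name] = match.group(1) if match else None
    return result


hosts = ["www.example1.abc.com", "www.example2.efg.123.net"]
rows = [split_host(h) for h in hosts]
for host, row in zip(hosts, rows):
    print(host, row)
```

Note that the TLD comes back without its leading dot (`com`, not `.com`), and a host with fewer than three labels, such as `abc.com:8080`, yields `None` for the subdomain.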

Answer 2

Note that you may be better off using the built-in HOST, DOMAIN, and TLD functions instead of a custom regular expression.

2 answers

Answer 1: score 9, accepted, 2014-01-22 03:09:08
Answer 2: score 5, 2014-01-22 16:12:16
