Python regex to extract domains from text

Question

I have the following regex:

r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}'

When I apply this to a text string such as "this is www.website1.com and this is website2.com", I get:

['www.website1.com']

['website2.com']

How can I modify the regex to exclude the 'www.', so that I get 'website1.com' and 'website2.com'? I'm missing something pretty basic...

Answer 1

Try this one (thanks @SunDeep for the update):

\s(?:www\.)?(\w+\.com)

Explanation

\s matches any whitespace character

(?:www\.)? is a non-capturing group that matches www. zero or one time

(\w+\.com) matches one or more word characters followed by .com, with the literal dots escaped as \.
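
One detail worth illustrating: in a regex, an unescaped . matches any character, so a literal dot has to be written as \. to avoid false positives. The snippet below is only a small demonstration; the test word 'welcome' is my own example, not something from the question.

```python
import re

# Unescaped: "." matches any character, so the "l" in "welcome" is accepted
loose = re.findall(r'\w+.com', 'welcome')

# Escaped: "\." only matches a literal dot, so nothing is found
strict = re.findall(r'\w+\.com', 'welcome')

print(loose)   # ['welcom']
print(strict)  # []
```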

And in action:

import re

s = 'this is www.website1.com and this is website2.com'

matches = re.findall(r'\s(?:www\.)?(\w+\.com)', s)
print(matches)

Output:

['website1.com', 'website2.com']

A couple of notes about this. First of all, matching all valid domain names is very difficult to do, so while I chose to use \w+ to capture for this example, I could have chosen something like: [a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}

This answer has a lot of helpful info about matching domains: What is a regular expression which will match a valid domain name without a subdomain?
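
As a rough sketch, the stricter label pattern mentioned above could be combined with the optional www. prefix like this. The name DOMAIN is hypothetical, and using \b instead of a leading \s is my own assumption so that a domain at the very start of the text is also found.

```python
import re

# Hypothetical combination: optional "www." prefix plus the stricter label pattern.
# \b replaces the leading \s (an assumption) so a domain at the start of the text also matches.
DOMAIN = re.compile(
    r'\b(?:www\.)?'
    r'([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,})'
)

s = 'this is www.website1.com and this is website2.com'
print(DOMAIN.findall(s))  # ['website1.com', 'website2.com']
```

Note that this pattern only accepts labels of 3 to 63 characters and does not handle subdomains other than www, so treat it as a starting point rather than a complete domain matcher.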

Next, I only look for .com domains. To match whichever types of domains you are looking for, you could adjust my regular expression to something like:

\s(?:www\.)?(\w+\.(?:com|org|net))
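
One thing to watch for when listing several TLDs: re.findall returns tuples as soon as the pattern contains more than one capturing group, so it may be more convenient to make the alternation non-capturing with (?:...). A minimal sketch, using a sample string of my own:

```python
import re

s = 'see www.website1.com, website2.org and website3.net'

# Two capturing groups: findall returns one tuple per match
with_tuples = re.findall(r'\s(?:www\.)?(\w+\.(com|org|net))', s)

# Non-capturing alternation: findall returns a flat list of domains
flat = re.findall(r'\s(?:www\.)?(\w+\.(?:com|org|net))', s)

print(with_tuples)  # [('website1.com', 'com'), ('website2.org', 'org'), ('website3.net', 'net')]
print(flat)         # ['website1.com', 'website2.org', 'website3.net']
```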

Answer 2

Here's an attempt:

import re
s = "www.website1.com"
k = re.findall(r'(www\.)?(.*?)$', s, re.DOTALL)[0][1]
print(k)

Output:

'website1.com'

If s = "website1.com", the output is the same:

'website1.com'
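
If the input is already a single hostname, as in this answer, a simpler alternative might be to strip the prefix with re.sub rather than indexing into findall results. This is just a sketch, and the helper name strip_www is hypothetical:

```python
import re

def strip_www(host):
    # Remove a single leading "www." if present; leave everything else untouched
    return re.sub(r'^www\.', '', host)

print(strip_www('www.website1.com'))  # website1.com
print(strip_www('website1.com'))      # website1.com
```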
