Regex to check if a given string is a relative URL

Question

First, I have read this question about how to check if a string is an absolute or relative URL. My problem is that I need a regex to check whether a given string is a relative URL, i.e. a regex that checks that the string does not start with any protocol or with a double slash //.

Actually, I am doing web scraping with Beautiful Soup and I want to retrieve all relative links. Beautiful Soup uses this syntax:

soup.findAll(href=re.compile(REGEX_TO_MATCH_RELATIVE_URL))

So, that's why I need this.

Test cases are:

about.html
tutorial1/
tutorial1/2.html
/
/experts/   
../ 
../experts/ 
../../../   
./  
./about.html

Thank you so much.

Answer 1

Since you found it helpful, I am posting my suggestion.

The regular expression can be:

^(?!www\.|(?:http|ftp)s?://|[A-Za-z]:\\|//).*

See demo

Note that the pattern becomes more and more unreadable as you add exclusions or alternatives. Thus, perhaps, use VERBOSE mode (declared with re.X):

import re
p = re.compile(r"""^                    # At the start of the string, ...
                   (?!                  # check that the next characters are not...
                      www\.             # URLs starting with www.
                     |
                      (?:http|ftp)s?:// # URLs starting with http, https, ftp, ftps
                     |
                      [A-Za-z]:\\       # Local full paths starting with [drive_letter]:\ (Windows)
                     |
                      //                # Protocol-relative URLs or UNC-style locations starting with //
                   )                    # End of look-ahead check
                   .*                   # Match up to the end of the string""", re.X)
print(p.search("./about.html"))          # => There is a match
print(p.search("//dub-server1/mynode"))  # => No match

See IDEONE demo
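As a quick sanity check, the one-line pattern can be run against the test cases from the question. The sketch below is only illustrative: the helper name `is_relative` and the non-relative sample strings are assumptions added here, not part of the original answer.

```python
import re

# One-line form of the pattern suggested in this answer.
RELATIVE_URL = re.compile(r"^(?!www\.|(?:http|ftp)s?://|[A-Za-z]:\\|//).*")

# The ten test cases listed in the question.
relative_samples = [
    "about.html", "tutorial1/", "tutorial1/2.html", "/", "/experts/",
    "../", "../experts/", "../../../", "./", "./about.html",
]

# Hypothetical non-relative inputs, added only for contrast.
other_samples = [
    "http://example.com/", "https://example.com/a", "ftp://example.com/f",
    "//example.com/lib.js", "www.example.com", "C:\\temp\\file.txt",
]


def is_relative(url):
    """Return True if the look-ahead pattern accepts the string."""
    return RELATIVE_URL.match(url) is not None


for sample in relative_samples + other_samples:
    print(f"{sample!r:28} relative={is_relative(sample)}")
```

All ten strings from the question are accepted and the contrast strings are rejected. A scheme that the look-ahead does not list, such as mailto:, is still accepted, so the alternation may need extending for real pages.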

The other regexes, by Washington Guedes

^([a-z0-9]*:|.{0})\/\/.*$ - matches:
- ^ - beginning of the string
- ([a-z0-9]*:|.{0}) - 2 alternatives:
  - [a-z0-9]*: - 0 or more letters or digits followed by :
  - .{0} - an empty string
- \/\/.* - // and 0 or more characters other than a newline (note that you do not need to escape / in Python)
- $ - end of string

So, you can rewrite it as ^(?:[a-z0-9]*:)?//.*$ . The i flag should be used with this regex.
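To illustrate that the rewrite keeps the same behaviour, here is a small sketch comparing both forms on a few strings. The sample strings are assumptions chosen for illustration, and the comparison is a spot check rather than a proof of equivalence.

```python
import re

# Original form (with the unnecessary \/ escapes) and the rewritten form.
original = re.compile(r"^([a-z0-9]*:|.{0})\/\/.*$", re.I)
rewritten = re.compile(r"^(?:[a-z0-9]*:)?//.*$", re.I)

# Hypothetical inputs: the first four start with an optional scheme plus //.
absolute_like = ["http://example.com", "HTTPS://EXAMPLE.COM/a",
                 "//cdn.example.com/x.js", "://odd"]
not_absolute = ["about.html", "/experts/", "../", "mailto:someone@example.com"]

for s in absolute_like + not_absolute:
    a = bool(original.match(s))
    b = bool(rewritten.match(s))
    print(f"{s!r:32} original={a} rewritten={b}")

# Without the i flag, an upper-case scheme is not recognised.
case_sensitive = re.compile(r"^(?:[a-z0-9]*:)?//.*$")
print(bool(case_sensitive.match("HTTPS://EXAMPLE.COM/a")))
```

The last line shows why the i flag matters here: the character class only lists lower-case letters.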

^[^\/]+\/[^\/].*$|^\/[^\/].*$ - is not optimal and has 2 alternatives:

Alternative 1:

- ^ - start of string
- [^\/]+ - 1 or more characters other than /
- \/ - a literal /
- [^\/].*$ - a character other than / followed by 0 or more characters other than a newline, up to the end of the string

Alternative 2:

- ^ - start of string
- \/ - a literal /
- [^\/].*$ - a character other than / followed by 0 or more characters other than a newline, up to the end of the string

It is clear that the whole regex can be shortened to ^[^/]*/[^/].*$ . The i option can safely be removed from the regex flags, since the pattern contains no letters.
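The shortening can be spot-checked in Python. The sketch below compares the two-alternative form with the shortened one on the question's test cases plus two assumed absolute-style strings. It also lists which of the question's cases the pattern accepts.

```python
import re

two_alternatives = re.compile(r"^[^/]+/[^/].*$|^/[^/].*$")
shortened = re.compile(r"^[^/]*/[^/].*$")

# The ten test cases from the question.
question_cases = [
    "about.html", "tutorial1/", "tutorial1/2.html", "/", "/experts/",
    "../", "../experts/", "../../../", "./", "./about.html",
]
# Hypothetical extra inputs, added for contrast.
extra_cases = ["http://example.com", "//example.com"]

# Both forms should agree on every sample.
agree = all(
    bool(two_alternatives.match(s)) == bool(shortened.match(s))
    for s in question_cases + extra_cases
)
print("forms agree:", agree)

matched = [s for s in question_cases if shortened.match(s)]
unmatched = [s for s in question_cases if not shortened.match(s)]
print("matched:  ", matched)
print("unmatched:", unmatched)
```

On these inputs the two forms agree, but the pattern requires a slash followed by a non-slash character. Strings such as about.html, /, or ../ are therefore not matched, which is worth keeping in mind when picking a pattern for the question's test cases.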

Answer 2

To match absolute URLs:

/^([a-z0-9]*:|.{0})\/\/.*$/gmi

Live testing here.

And to match relative URLs:

/^[^\/]+\/[^\/].*$|^\/[^\/].*$/gmi

Live testing here.
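The patterns above are written as JavaScript-style literals with the g, m, and i flags. A rough Python counterpart for the absolute pattern might look like the sketch below, where re.M and re.I stand in for m and i, and finditer plays the role of g. The multi-line sample text is an assumption made up for illustration.

```python
import re

# Absolute-URL pattern from this answer, with MULTILINE and IGNORECASE.
ABSOLUTE = re.compile(r"^([a-z0-9]*:|.{0})\/\/.*$", re.M | re.I)

# Hypothetical input: one candidate URL per line.
text = "\n".join([
    "http://example.com/",
    "about.html",
    "//cdn.example.com/lib.js",
    "/experts/",
    "FTP://example.com/file",
])

# group(0) is the whole match; the pattern's capturing group is ignored here.
absolute_lines = [m.group(0) for m in ABSOLUTE.finditer(text)]
print(absolute_lines)
```

One caveat for the relative pattern: the class [^\/] also matches a newline, so with the m flag a single match can span several lines. Checking each line separately avoids that.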

Answer 3

I prefer this one; it captures more edge cases:

Source: https://www.regextester.com/94254

Answer 1
10, accepted, 2015-07-15 13:37:48

Answer 2
2

Answer 3
1, 2020-05-13 02:27:51