
Regular expression: match a URL without "http://" and any other "/"

I looked around for a while, but I probably can't find the right keywords to search for, so I'm asking here. I need to match a URL while stripping off the protocol and everything from the first / onward.

Target: match the first substring between http:// and the first / (that / may not exist), or up to the end of the string. And here comes the problem:

I wrote this regex:

(?<=//)(.*?)(?=/)

but it only matches URLs that have at least one '/' after the protocol.

Here are some URLs to be matched:

  • http://www.google.com/ (matched by my regex)
  • http://www.google.com
  • https://www.google
  • xxx://www.google.com/hello/bleh blah....../
  • xxx://google.com
  • google.com/blah/hello.php?x=11_x.hi
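
^(?:\w+://)?([\w.-]+)/?.*$

(with the backslashes doubled for Java) seems to work for all of the examples, including a plain www.google.com. The host ends up in the first capturing group.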
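
As a quick check of the pattern above, here is a minimal sketch that runs it over the sample URLs from the question. It uses Python's `re` module purely for illustration (so no doubled backslashes are needed); the names `HOST_RE` and `extract_host` are made up for this example.

```python
import re

# Pattern from the answer above; group 1 captures the host part.
HOST_RE = re.compile(r"^(?:\w+://)?([\w.-]+)/?.*$")

# Sample URLs from the question, plus a bare host name.
urls = [
    "http://www.google.com/",
    "http://www.google.com",
    "https://www.google",
    "xxx://www.google.com/hello/bleh blah....../",
    "xxx://google.com",
    "google.com/blah/hello.php?x=11_x.hi",
    "www.google.com",
]


def extract_host(url):
    """Return the host captured by group 1, or None if nothing matches."""
    m = HOST_RE.match(url)
    return m.group(1) if m else None


for url in urls:
    print(f"{url!r} -> {extract_host(url)!r}")
```

Because the protocol group is optional, the last two inputs (no protocol at all) are handled the same way as the others.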

Something like...

^(https?:\/\/)?((?:[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,6})\/

I saw this in a book I had. That should account for a variable http/https, disallow whitespace, and probably stop at the first slash.

Comment if I did this wrong.

This works for all of your examples except the last:

(?<=//)[^/\s]+

[^/\s] is a negated character class that matches any character except / and \s (whitespace, e.g. a space, tab, or newline).

See it here on Regexr

What will not work is the last row. How do you want to decide what is a link? If I make the first part optional, the pattern will match any run of characters other than / and whitespace.
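
To make that trade-off concrete, here is a small sketch using Python's `re` module (chosen only for illustration). The helper names `host_after_slashes` and `first_token` are hypothetical; `first_token` stands in for the version with the `//` requirement dropped entirely.

```python
import re

# Lookbehind version: the match must be preceded by "//".
LOOKBEHIND_RE = re.compile(r"(?<=//)[^/\s]+")

# The same character class with the "//" requirement dropped.
BARE_RE = re.compile(r"[^/\s]+")


def host_after_slashes(url):
    """First run of non-slash, non-whitespace characters after '//'."""
    m = LOOKBEHIND_RE.search(url)
    return m.group(0) if m else None


def first_token(url):
    """First run of non-slash, non-whitespace characters anywhere."""
    m = BARE_RE.search(url)
    return m.group(0) if m else None


print(host_after_slashes("http://www.google.com/"))               # www.google.com
print(host_after_slashes("xxx://google.com"))                     # google.com
print(host_after_slashes("google.com/blah/hello.php?x=11_x.hi"))  # None

# Without the lookbehind, the last row works but the protocol leaks in elsewhere.
print(first_token("google.com/blah/hello.php?x=11_x.hi"))         # google.com
print(first_token("http://www.google.com/"))                      # http:
```

Without the lookbehind there is nothing left to tell a host from any other token, which is why the protocol-less row needs a separate rule for what counts as a link.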

It seems like you have the right answer, but you're missing the possibility of not having a trailing "/". Try this:

(?<=//)(.*?)(?=/|$)
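
A short sketch comparing the question's original pattern with this one, again using Python's `re` module for illustration only (the `find_host` helper is a made-up name):

```python
import re

# Pattern from the question: requires a "/" after the host.
ORIGINAL_RE = re.compile(r"(?<=//)(.*?)(?=/)")

# Pattern from this answer: accepts either a "/" or the end of the string.
FIXED_RE = re.compile(r"(?<=//)(.*?)(?=/|$)")


def find_host(pattern, url):
    """Return group 1 of the first match, or None if there is no match."""
    m = pattern.search(url)
    return m.group(1) if m else None


print(find_host(ORIGINAL_RE, "http://www.google.com/"))  # www.google.com
print(find_host(ORIGINAL_RE, "http://www.google.com"))   # None
print(find_host(FIXED_RE, "http://www.google.com/"))     # www.google.com
print(find_host(FIXED_RE, "http://www.google.com"))      # www.google.com
```

Both patterns still depend on the `//` lookbehind, so an input with no protocol, such as the last example in the question, is not matched by either.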
