Regex to find valid URL + text, Python only
I have a regex:
((http\://|https\://|ftp\://)|(www.)|([a-zA-Z0-9\.-]))+(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))(/[a-zA-Z0-9%:/-_\?\.'~#-]*)?
which picks valid URLs perfectly.
I have a scenario where there can be
REQ:
What I want is for the regex to first check for a valid URL; if the URL is valid, it should ignore that URL and search for TEXT only outside it.
Issues:
I have tried many regexes, but they also pick up the valid URL, which I don't want. I only want to search for the text outside the URL when the URL is valid.
No functions, please. I am trying to fix this using regex.
Perhaps you want this:
(.*?)((?:(?:http\:\/\/|https\:\/\/|ftp\:\/\/)|(?:www.)|(?:[a-zA-Z0-9\.-]))+(?:(?:[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))(?:\/[a-zA-Z0-9%:\/-_\?\.'~#-]*)?)(.*)
You will get three groups. You can use named groups to capture the beforeUrl text, the Url, and the afterUrl text, which will look something like this:
(?P<beforeUrl>.*?)(?P<Url>(?:(?:http\:\/\/|https\:\/\/|ftp\:\/\/)|(?:www.)|(?:[a-zA-Z0-9\.-]))+(?:(?:[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))(?:\/[a-zA-Z0-9%:\/-_\?\.'~#-]*)?)(?P<afterUrl>.*)
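Since the question is about Python, here is a minimal sketch of how the three groups could be read with the standard `re` module. Python spells named groups as `(?P<name>...)`, so the sketch uses that form. The sample string and the variable names (`URL_SPLIT`, `sample`, `before`, `url`, `after`) are assumptions made for illustration, not part of the original post.

```python
import re

# Same pattern as above, split over several lines for readability.
# Named groups use Python's (?P<name>...) syntax.
URL_SPLIT = re.compile(
    r"(?P<beforeUrl>.*?)"
    r"(?P<Url>(?:(?:http\:\/\/|https\:\/\/|ftp\:\/\/)|(?:www.)|(?:[a-zA-Z0-9\.-]))+"
    r"(?:(?:[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))"
    r"(?:\/[a-zA-Z0-9%:\/-_\?\.'~#-]*)?)"
    r"(?P<afterUrl>.*)"
)

# Hypothetical input, chosen only to show the three groups.
sample = "see http://example.com/page for details"

match = URL_SPLIT.match(sample)
if match is not None:
    # group() with several names returns a tuple in the same order.
    before, url, after = match.group("beforeUrl", "Url", "afterUrl")
    print(repr(before))  # 'see '
    print(repr(url))     # 'http://example.com/page'
    print(repr(after))   # ' for details'
```

If the input contains no URL, `match` is `None`, so the text-only search would only run on `before` and `after` when a valid URL was actually found. Note that the pattern captures only the first URL on a line; anything in `afterUrl` would need another pass if it may contain further URLs.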