
Regex to find only the text around a valid URL (Python)

I have a regex:

((http\://|https\://|ftp\://)|(www.)|([a-zA-Z0-9\.-]))+(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))(/[a-zA-Z0-9%:/-_\?\.'~#-]*)? 

which picks out valid URLs perfectly.

I have a scenario where the input can be either:

  1. Valid URL + text (www.abc.com testing the regex)
  2. Text + valid URL (Testing the regex www.abc.com)

Requirement:

I want the regex to first check for a valid URL. If the URL is valid, the regex should ignore it and match only the text outside the URL.

Issue:

I have tried many regexes, but they all pick up the valid URL as well, which I don't want. When the URL is valid, I only want to match the text outside it.

No functions, please. I am trying to solve this with a regex alone.

Perhaps you want this:

(.*?)((?:(?:http\:\/\/|https\:\/\/|ftp\:\/\/)|(?:www.)|(?:[a-zA-Z0-9\.-]))+(?:(?:[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))(?:\/[a-zA-Z0-9%:\/-_\?\.'~#-]*)?)(.*)

See demo here.
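To make the three groups concrete, here is a minimal Python sketch that runs the pattern above, copied unchanged, against the two inputs from the question. The names `PATTERN` and `samples` are illustrative only and are not part of the original answer.

```python
import re

# The pattern from the answer, split over several lines for readability.
# Group 1 = text before the URL, group 2 = the URL, group 3 = text after it.
PATTERN = re.compile(
    r"(.*?)"
    r"((?:(?:http\:\/\/|https\:\/\/|ftp\:\/\/)|(?:www.)|(?:[a-zA-Z0-9\.-]))+"
    r"(?:(?:[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))"
    r"(?:\/[a-zA-Z0-9%:\/-_\?\.'~#-]*)?)"
    r"(.*)"
)

# The two scenarios described in the question.
samples = [
    "www.abc.com testing the regex",
    "Testing the regex www.abc.com",
]

for text in samples:
    match = PATTERN.search(text)
    if match:
        before, url, after = match.groups()
        print(repr(before), repr(url), repr(after))
```

For the first input the text lands in group 3, and for the second it lands in group 1; group 2 holds the URL in both cases. When the input contains no URL, `search` returns `None`, so the result should be checked before the groups are read.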

You will get three groups. You can use named groups to capture the beforeUrl text, the Url, and the afterUrl text, which will look something like this:

(?P<beforeUrl>.*?)(?P<Url>(?:(?:http\:\/\/|https\:\/\/|ftp\:\/\/)|(?:www.)|(?:[a-zA-Z0-9\.-]))+(?:(?:[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))(?:\/[a-zA-Z0-9%:\/-_\?\.'~#-]*)?)(?P<afterUrl>.*)

See demo here.
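In Python's `re` module, named groups are written `(?P<name>...)`. The sketch below assumes that spelling and shows one way to keep only the text outside the URL by joining the beforeUrl and afterUrl groups. The names `NAMED` and `results` are hypothetical and exist only for this illustration.

```python
import re

# Same pattern as above, with Python-style named groups.
NAMED = re.compile(
    r"(?P<beforeUrl>.*?)"
    r"(?P<Url>(?:(?:http\:\/\/|https\:\/\/|ftp\:\/\/)|(?:www.)|(?:[a-zA-Z0-9\.-]))+"
    r"(?:(?:[a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4}))"
    r"(?:\/[a-zA-Z0-9%:\/-_\?\.'~#-]*)?)"
    r"(?P<afterUrl>.*)"
)

results = {}
for line in ("www.abc.com testing the regex", "Testing the regex www.abc.com"):
    m = NAMED.search(line)
    if m:
        # Join the text on either side of the URL and drop the URL itself.
        outside = (m.group("beforeUrl") + m.group("afterUrl")).strip()
        results[line] = (m.group("Url"), outside)
        print(repr(m.group("Url")), "->", repr(outside))
```

Because `.` does not match a newline unless `re.DOTALL` is set, this sketch is meant to be applied one line at a time.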
