
Python Regex: Replace all urls in string with HTML links excluding .png, .gif, .jpg, .jpeg

I have a big multiline string that may contain many different URLs anywhere in a line, for example:

La-la-la https://example.com/ https://example.com/ https://example.com/ la-la-la https://example.com/ la-la-la https://example.com/ la-la-la

I need to replace each of them with an HTML link such as <a href="https://example.com/">https://example.com/</a>

Conditions:

  • A URL is everything that starts with https?:// (that is, http:// or https://) and continues up to the next whitespace, line break, or the end of the entire string.

  • However, URLs that end exactly with .png, .gif, .jpg or .jpeg must never be matched (later they will be replaced with an image tag).
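Both conditions can also be folded into a single regular expression. The sketch below does not come from either answer. The pattern, the case-insensitive matching of the extensions, and the helper name linkify are all assumptions made for illustration.

A negative lookahead placed right after the scheme rejects any URL whose whitespace-delimited token ends in an image extension. The rest of the URL is then consumed by \S+, up to the next whitespace or the end of the string.

```python
import re

# Hypothetical pattern: http(s) URLs whose token does not end in an image extension
URL_RE = re.compile(
    r'https?://'                          # scheme: http:// or https://
    r'(?!\S*\.(?:png|gif|jpe?g)(?!\S))'   # reject if the token ends in .png/.gif/.jpg/.jpeg
    r'\S+',                               # rest of the URL, up to whitespace or end of string
    re.IGNORECASE,                        # assumption: extensions are matched in any letter case
)


def linkify(text: str) -> str:
    """Hypothetical helper: wrap every non-image URL in an <a> tag, leave image URLs as-is."""
    return URL_RE.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)


sample = "see https://example.com/ and https://example.com/pic.PNG\nhttp://example.com/page"
print(linkify(sample))
```

Because the extension check is anchored with (?!\S), a URL that merely contains an extension, such as https://e.com/a.png.html, is still turned into a link.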

You can use re.sub:

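```python
import re

IMAGE_EXTENSIONS = ('.png', '.gif', '.jpg', '.jpeg')

def href(d, skip=IMAGE_EXTENSIONS):
    # Image URLs are returned unchanged; endswith() checks only the literal suffix
    if d.lower().endswith(skip):
        return d
    return f'<a href="{d}">{d}</a>'

s = """
La-la-la https://example.com/ https://example.com/
https://example.com/ la-la-la https://example.com/
la-la-la https://example.com/ la-la-la
"""
# https?:// matches http:// or https://; \S+ runs to the next whitespace or the end of the string
new_s = re.sub(r'https?://\S+', lambda m: href(m.group()), s)
print(new_s)
```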

Output:

```
La-la-la <a href="https://example.com/">https://example.com/</a> <a href="https://example.com/">https://example.com/</a>
<a href="https://example.com/">https://example.com/</a> la-la-la <a href="https://example.com/">https://example.com/</a>
la-la-la <a href="https://example.com/">https://example.com/</a> la-la-la
```
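Two details are easy to get wrong when writing this kind of helper, and a quick check in the interpreter shows both.

First, joining the extensions with | builds a regex in which . is a wildcard, and re.findall can match anywhere in the URL, not only at its end.

Second, $ inside a character class is a literal dollar sign rather than an end-of-string anchor. A lookahead such as (?=[\s$]) therefore misses a URL at the very end of the string.

The sample URL and text below are made up for illustration.

```python
import re

skip = ['.png', '.gif', '.jpg', '.jpeg']
url = "https://example.com/docs.png/readme"  # made-up URL that contains, but does not end with, .png

# Regex built from the list: matches '.png' in the middle of the URL
print(bool(re.findall('|'.join(skip), url)))  # True
# str.endswith with a tuple compares the literal suffix only
print(url.lower().endswith(tuple(skip)))      # False

text = "last url https://example.com/"
# Inside [...] '$' is a literal character, so nothing matches at the end of the string
print(re.findall(r'https?://.*?(?=[\s$])', text))  # []
# In an alternation, '$' is a real end-of-string anchor
print(re.findall(r'https?://.*?(?=\s|$)', text))   # ['https://example.com/']
```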
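An alternative approach without regular expressions:

```python
badtags = ('.png', '.gif', '.jpg', '.jpeg')
goodurls = ('https://', 'http://')

def replace_urls(text):
    new_lines = []
    for line in text.splitlines():        # iterate over lines, not characters
        words = line.split()
        for i, word in enumerate(words):  # the index lets us replace the word in place
            if word.startswith(goodurls) and not word.lower().endswith(badtags):
                words[i] = f'<a href="{word}">{word}</a>'
        # note: split()/join() collapses runs of whitespace into single spaces
        new_lines.append(' '.join(words))
    return '\n'.join(new_lines)
```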

This is a fairly simple way to do it. Because each replacement has to be written back into the list of words, you need to loop with an index (a regular indexed for loop or enumerate) rather than iterating over the words alone.
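Hand-counted slices are the fragile part of a loop-based approach:

- 'https://' has 8 characters and 'http://' has 7, so the slice lengths must match exactly.
- A 4-character suffix is '.jpg' for a .jpg URL, but only 'jpeg' (without the dot) for a .jpeg URL.

str.startswith and str.endswith accept a tuple of strings, which avoids the counting altogether. The sample strings below are made up for illustration:

```python
word = "https://example.com/"

print(word[0:7])                # prints https:/ (only 7 of the 8 characters of 'https://')
print(word[0:7] == 'https://')  # False: a 7-character slice can never equal an 8-character prefix
print(word.startswith(('https://', 'http://')))  # True, with no manual length counting

img = "https://example.com/photo.jpeg"
print(img[-4:])                 # prints jpeg (the last 4 characters drop the dot)
print(img.lower().endswith(('.png', '.gif', '.jpg', '.jpeg')))  # True
```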
