
Python Regex: Replace all urls in string with <img> and <a> tags

I have a string with many URLs pointing to pages and images:

La-la-la https://example.com/ la-la-la https://example.com/example.PNG

And I need to convert it to:

La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG">

Image formats are unpredictable: they can be .png, .JPEG, etc., and any link can appear multiple times per string.

I understand that there are some strange JavaScript examples here, but I can't work out how to convert them to Python.

But I found this as a starting point:

url_regex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig
img_regex = /^ftp|http|https?:\/\/(?:[a-z\-]+\.)+[a-z]{2,6}(?:\/[^\/#?]+)+\.(?:jpe?g|gif|png)$/ig

Big thanks for any help.

You can do this without regex, if you want.

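stng = 'La-la-la https://example.com/ la-la-la https://example.com/example.PNG'
# Template with placeholders for the two text fragments and the two URLs
sentence = '{f_txt} <a href="{f_url}">{f_url}</a> {s_txt} <img src="{s_url}">'
# Assumes exactly four whitespace-separated tokens: text, URL, text, image URL
f_txt, f_url, s_txt, s_url = stng.split()
print(sentence.format(f_txt=f_txt, f_url=f_url, s_txt=s_txt, s_url=s_url))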

Output

La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG"> 
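
The snippet above only works when the string has exactly four tokens in a fixed order. As a rough sketch of how the same no-regex idea might be generalised to any number of links, each space-separated word can be classified on its own. The helper name linkify and the list of image extensions are assumptions made for illustration, not part of the original answer:

```python
# Assumed set of image extensions; extend it to cover other formats.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif')


def linkify(text):
    """Wrap each space-separated URL in an <img> or <a> tag (hypothetical helper)."""
    words = []
    for word in text.split(' '):
        if word.startswith(('http://', 'https://')):
            # Lower-case before comparing so .PNG and .png are treated alike.
            if word.lower().endswith(IMAGE_EXTENSIONS):
                words.append('<img src="{}">'.format(word))
            else:
                words.append('<a href="{0}">{0}</a>'.format(word))
        else:
            # Plain text is passed through unchanged.
            words.append(word)
    return ' '.join(words)


stng = 'La-la-la https://example.com/ la-la-la https://example.com/example.PNG'
print(linkify(stng))
```

This treats only single spaces as separators, so a URL directly followed by punctuation or a newline would need extra handling.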

You may use the following regular expression:

(https?.*?\.com\/)(\s+[\w-]*\s+)(https?.*?\.com\/[\w\.]+)

  • (https?.*?\.com\/) First capture group. Captures http or https, then anything up to .com and a forward slash /.
  • (\s+[\w-]*\s+) Second capture group. Captures whitespace, alphanumeric characters and hyphens, and whitespace. You can add more characters to the character set if needed.
  • (https?.*?\.com\/[\w\.]+) Third capture group. Captures http or https, anything up to .com, a forward slash /, then alphanumeric characters and full stops . for the file name and extension. Again, you can add more characters to the character set in this capture group if you are expecting other characters.

You can test the regex live here.

Alternatively, if you are expecting variable URLs and domains you may use:

(\w*\:.*?\.\w*\/)(\s+[\w-]*\s+)(\w*\:?.*?\.\w*\/[\w\.]+)

Here the first and third capture groups match any alphanumeric characters followed by a colon :, then anything up to a full stop ., alphanumeric characters \w and a forward slash. You can test this here.
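
To see what the three groups capture with this more general pattern, a quick check along these lines might help. The sample string with the foo.org and bar.net domains is made up for illustration:

```python
import re

# The more general pattern from above, compiled once.
pattern = re.compile(r'(\w*\:.*?\.\w*\/)(\s+[\w-]*\s+)(\w*\:?.*?\.\w*\/[\w\.]+)')

# Hypothetical input using domains other than .com.
sample = 'La-la-la http://foo.org/ text https://bar.net/pic.jpeg'

match = pattern.search(sample)
groups = match.groups()
print(groups)
# ('http://foo.org/', ' text ', 'https://bar.net/pic.jpeg')
```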

You may replace the captured groups with:

<a href="\1">\1</a>\2<img src="\3">

Where \1, \2, and \3 are backreferences to captured groups one, two and three respectively.


Python snippet:

>>> import re
>>> text = "La-la-la https://example.com/ la-la-la https://example.com/example.PNG"
>>> out = re.sub(r'(https?.*?\.com\/)(\s+[\w-]*\s+)(https?.*?\.com\/[\w\.]+)',
...              r'<a href="\1">\1</a>\2<img src="\3">',
...              text)
>>> print(out)
La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG">
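
The pattern above expects one page URL followed by one image URL. Since the question mentions that links can appear many times and in any order, one possible alternative is to match each URL separately and let a replacement function choose the tag. This is only a sketch: the names URL_RE, IMAGE_RE, to_tag and replace_urls, the "any run of non-space characters" URL pattern, and the list of extensions are all assumptions for illustration.

```python
import re

# Assumed URL pattern: http(s) scheme followed by any run of non-space characters.
URL_RE = re.compile(r'https?://\S+')
# Assumed image extensions, matched case-insensitively at the end of the URL.
IMAGE_RE = re.compile(r'\.(?:png|jpe?g|gif)$', re.IGNORECASE)


def to_tag(match):
    """Return an <img> tag for image URLs and an <a> tag for everything else."""
    url = match.group(0)
    if IMAGE_RE.search(url):
        return '<img src="{}">'.format(url)
    return '<a href="{0}">{0}</a>'.format(url)


def replace_urls(text):
    # re.sub accepts a function; it is called once per match.
    return URL_RE.sub(to_tag, text)


text = "La-la-la https://example.com/ la-la-la https://example.com/example.PNG"
print(replace_urls(text))
```

Because \S+ is greedy, punctuation that directly follows a URL (for example a trailing full stop) would be treated as part of it, so real-world text may need a stricter pattern.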
