Python Regex: Replace all urls in string with <img> and <a> tags
I have a string with many urls to some pages and images:
La-la-la https://example.com/ la-la-la https://example.com/example.PNG
And I need to convert it to:
La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG">
Image formats are unpredictable: they can be .png, .JPEG, etc., and any link can appear multiple times per string.
I understand that there are some strange JavaScript examples here, but I can't work out how to convert them to Python.
But I found this as a starting point:
url_regex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig
img_regex = /^ftp|http|https?:\/\/(?:[a-z\-]+\.)+[a-z]{2,6}(?:\/[^\/#?]+)+\.(?:jpe?g|gif|png)$/ig
Big thx for help
You can do this without regex, if you want.
text = 'La-la-la https://example.com/ la-la-la https://example.com/example.PNG'
template = '{f_txt} <a href="{f_url}">{f_url}</a> {s_txt} <img src="{s_url}">'
# Works only when the string has exactly four whitespace-separated parts:
# text, page URL, text, image URL.
f_txt, f_url, s_txt, s_url = text.split()
print(template.format(f_txt=f_txt, f_url=f_url, s_txt=s_txt, s_url=s_url))
Output
La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG">
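The snippet above relies on the string having exactly four parts. A rough sketch of how the same no-regex idea could be extended to any number of links follows. The helper name linkify_tokens and the extension list IMAGE_EXTENSIONS are assumptions made for illustration, not part of the original answer.

```python
# Assumption: these are the image extensions of interest; extend as needed.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif')


def linkify_tokens(text):
    """Wrap each whitespace-separated URL token in an <a> or <img> tag."""
    out = []
    for token in text.split():
        if token.startswith(('http://', 'https://')):
            # Compare in lower case so .PNG and .JPEG are recognised too.
            if token.lower().endswith(IMAGE_EXTENSIONS):
                out.append('<img src="{0}">'.format(token))
            else:
                out.append('<a href="{0}">{0}</a>'.format(token))
        else:
            out.append(token)
    # Note: joining with a single space collapses runs of whitespace.
    return ' '.join(out)


print(linkify_tokens(
    'La-la-la https://example.com/ la-la-la https://example.com/example.PNG'))
```

Because it splits on whitespace, punctuation glued to a URL (for example a trailing comma) stays part of the link, so this is only suitable for simple input.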
You may use the following regular expression:
(https?.*?\.com\/)(\s+[\w-]*\s+)(https?.*?\.com\/[\w\.]+)
First capture group: (https?.*?\.com\/)
Captures http or https, anything up to .com, and a forward slash /.
Second capture group: (\s+[\w-]*\s+)
Captures whitespace, alphanumeric characters and hyphens, and whitespace.
Third capture group: (https?.*?\.com\/[\w\.]+)
Captures http or https, anything up to .com, a forward slash /, and alphanumeric characters and full stops . for the extension.
You can test the regex live here.
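To see what each of the three capture groups holds on the sample string from the question, a small check like the following may help. The variable names are illustrative only.

```python
import re

# The three-group pattern described above, written as a Python raw string.
pattern = re.compile(r'(https?.*?\.com\/)(\s+[\w-]*\s+)(https?.*?\.com\/[\w\.]+)')
sample = 'La-la-la https://example.com/ la-la-la https://example.com/example.PNG'

match = pattern.search(sample)
groups = match.groups()

# Print each group with repr() so the surrounding whitespace in group 2 is visible.
for number, value in enumerate(groups, start=1):
    print(number, repr(value))
# 1 'https://example.com/'
# 2 ' la-la-la '
# 3 'https://example.com/example.PNG'
```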
Alternatively, if you are expecting variable urls and domains you may use:
(\w*\:.*?\.\w*\/)(\s+[\w-]*\s+)(\w*\:?.*?\.\w*\/[\w\.]+)
Where the first and third capture groups now match any alphanumeric characters followed by a colon :, then anything up to a ., alphanumeric characters \w and a forward slash. You can test this here.
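As a quick illustration of the more permissive pattern, here is a sketch that applies it to a string with a different scheme and different top-level domains. The sample text and its host names are made up for the example.

```python
import re

# The generalised three-group pattern from above.
general = re.compile(r'(\w*\:.*?\.\w*\/)(\s+[\w-]*\s+)(\w*\:?.*?\.\w*\/[\w\.]+)')

# Hypothetical input: an ftp link to a .org host and an image on a .net host.
sample = 'La-la-la ftp://files.example.org/ la-la-la http://example.net/pic.JPEG'

result = general.sub(r'<a href="\1">\1</a>\2<img src="\3">', sample)
print(result)
# La-la-la <a href="ftp://files.example.org/">ftp://files.example.org/</a> la-la-la <img src="http://example.net/pic.JPEG">
```

Like the first pattern, this still expects a page link, one word of text, and then an image link, in that order.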
You may replace captured groups with:
<a href="\1">\1</a>\2<img src="\3">
Where \1, \2, and \3 are backreferences to captured groups one, two and three respectively.
Python snippet:
>>> import re
>>> text = "La-la-la https://example.com/ la-la-la https://example.com/example.PNG"
>>> out = re.sub(r'(https?.*?\.com\/)(\s+[\w-]*\s+)(https?.*?\.com\/[\w\.]+)',
...              r'<a href="\1">\1</a>\2<img src="\3">',
...              text)
>>> print(out)
La-la-la <a href="https://example.com/">https://example.com/</a> la-la-la <img src="https://example.com/example.PNG">
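The pattern above matches one page link followed by one image link. Since the question says links can appear many times and in any order, a different approach may fit better: match each URL on its own and let a replacement function pick the tag. This is a sketch using a different technique from the answer above. The names URL_RE, IMAGE_RE, to_tag and linkify, the definition of a URL as "scheme plus non-whitespace", and the extension list are all assumptions.

```python
import re
from html import escape

# Assumption: a URL is http, https or ftp followed by non-whitespace characters.
URL_RE = re.compile(r'\b(?:https?|ftp)://\S+', re.IGNORECASE)
# Assumption: these extensions count as images; matched case-insensitively.
IMAGE_RE = re.compile(r'\.(?:png|jpe?g|gif|bmp|webp)$', re.IGNORECASE)


def to_tag(match):
    """Return an <img> tag for image URLs and an <a> tag for everything else."""
    url = match.group(0)
    safe = escape(url, quote=True)  # escape &, <, > and quotes for HTML output
    if IMAGE_RE.search(url):
        return '<img src="{}">'.format(safe)
    return '<a href="{0}">{0}</a>'.format(safe)


def linkify(text):
    # re.sub calls to_tag once for every URL found in the text.
    return URL_RE.sub(to_tag, text)


print(linkify('La-la-la https://example.com/ la-la-la https://example.com/example.PNG'))
```

Two known limitations of this sketch: punctuation directly after a URL is treated as part of it, and an image URL with a query string (such as pic.png?size=2) is not recognised as an image.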