Linkify text file in Linux

I have parsed all rows containing URLs from a text file and appended line breaks, and I want to make the links clickable in a new file.

How do I wrap <a href> tags around only the URLs, using standard Linux tools, preferably awk? It needs to be automatable in cron.

For example,

source file chaturls.txt:

    12:30 <user> check this: https://link.to/stuff.jpg</br>
    13:47 <user4> https://another.link.lol eyyyy</br>

desired output in new file, chatlinkified.html:

    12:30 <user> check this: <a href='https://link.to/stuff.jpg'>https://link.to/stuff.jpg</a></br>
    13:47 <user4> <a href='https://another.link.lol'>https://another.link.lol</a> eyyyy</br>

I tried awk '{printf "<a href=\"%s\">%s</a><br>", $0,$0}' chaturls.txt > chatlinkified.html, but this makes the whole line an (invalid) clickable link.

    sed -E 's@(https?://[^[:space:]/$.?#].[^[:space:]<]*)@<a href="\1">\1</a>@g' chaturls.txt > chatlinkified.html

You can use sed and refer back to the matched group with \1. Note that I use @ as the delimiter instead of / (as in s/../../g); you are free to use any character, and this saves some escapes.
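As a quick check, the command can be run against the two sample lines from the question. This is only a sketch: the scratch directory and the workdir variable are assumptions made so the example does not touch real files.

```shell
# Recreate the sample input from the question in a scratch directory
# (workdir is a hypothetical name used only for this demonstration).
workdir=$(mktemp -d)
printf '%s\n' \
  '12:30 <user> check this: https://link.to/stuff.jpg</br>' \
  '13:47 <user4> https://another.link.lol eyyyy</br>' > "$workdir/chaturls.txt"

# Same substitution as above: the parentheses capture the URL,
# and \1 inserts the captured text twice in the replacement.
sed -E 's@(https?://[^[:space:]/$.?#].[^[:space:]<]*)@<a href="\1">\1</a>@g' \
  "$workdir/chaturls.txt" > "$workdir/chatlinkified.html"

cat "$workdir/chatlinkified.html"
```

Note that this produces double-quoted href attributes, whereas the desired output in the question uses single quotes; both are valid HTML.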

The URL regex does a small validation check on the first character after https?:// and then continues matching until it reaches whitespace or the opening bracket of another tag.
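To see what that first-character check does in practice, the same pattern can be fed to grep -o, which prints only the matched text. The input lines here are made up for illustration, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical input: one ordinary URL, and two strings whose first
# character after "://" is in the excluded set (. and /).
matches=$(printf '%s\n' \
  'see https://ok.example/a<br>' \
  'bad https://.hidden and https:///triple' |
  grep -oE 'https?://[^[:space:]/$.?#].[^[:space:]<]*')

# Only the first URL should be reported; the match stops before "<".
printf '%s\n' "$matches"
```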

If you prefer, you can use a simpler regex for the URL, such as the one given in one of the comments, (https?://[^ ]*), which does not include this small validation.
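One caveat with the simpler pattern, sketched below on the question's sample lines: because it only stops at a space, a tag that directly follows a URL is swallowed into the link. Adding < to the excluded characters is one possible adjustment; that variant is a suggestion here, not something taken from the comments.

```shell
sample() {
  printf '%s\n' \
    '12:30 <user> check this: https://link.to/stuff.jpg</br>' \
    '13:47 <user4> https://another.link.lol eyyyy</br>'
}

# Simpler pattern: on the first line the trailing </br> ends up inside the link.
simple_out=$(sample | sed -E 's@(https?://[^ ]*)@<a href="\1">\1</a>@g')
printf '%s\n' "$simple_out"

# Variant that also stops at "<", so the trailing tag stays outside the link.
fixed_out=$(sample | sed -E 's@(https?://[^ <]*)@<a href="\1">\1</a>@g')
printf '%s\n' "$fixed_out"
```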

You can find more extensively validated URL regexes here: https://mathiasbynens.be/demo/url-regex (but you have to convert them from PHP regex syntax to sed extended regex syntax).
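Since the question mentions a preference for awk, a rough equivalent is sketched below using gsub, where & in the replacement string stands for the matched text. It uses a simplified pattern without the first-character check, and the scratch directory (awkdir) is an assumption for the demonstration only.

```shell
# Recreate the sample input in a scratch directory (hypothetical location).
awkdir=$(mktemp -d)
printf '%s\n' \
  '12:30 <user> check this: https://link.to/stuff.jpg</br>' \
  '13:47 <user4> https://another.link.lol eyyyy</br>' > "$awkdir/chaturls.txt"

# gsub replaces every match on the line; "&" expands to the matched URL.
# The match ends at a space, a tab, or the "<" of a following tag.
awk '{ gsub(/https?:\/\/[^ \t<]+/, "<a href=\"&\">&</a>"); print }' \
  "$awkdir/chaturls.txt" > "$awkdir/chatlinkified.html"

cat "$awkdir/chatlinkified.html"
```

Neither the sed nor the awk command needs a terminal, so either should be usable from a cron job once it is given absolute paths to the input and output files.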
