
How to one-line this search?

I have a very long line in which I would like to find all the links that are followed by class="filelink".

A link could look like this:

<a href="https://example.com/@api/files/123/=2008.pdf" class="filelink"

How can this be written as a Perl one-liner?

Update

If I do

echo '<a href="https://example.com/@api/files/123/=2008.pdf" class="filelink"' > test
perl -pe 's/href="(.*)" class="filelink"/\1/g' test

then I get

<a https://example.com/@api/files/123/=2008.pdf

where I would have expected

https://example.com/@api/files/123/=2008.pdf

Solution with a robust HTML parser instead of a regex:

<input_long_line.html perl -MWeb::Query=wq -ne '
    wq($_)
    ->find("a.filelink")
    ->each(sub {
        printf "URL %s\t text %s\n", $_[1]->attr("href"), $_[1]->text
    })'

I wrapped it for readability; it runs fine as a one-liner.

perl -nE'say for m/<a\s+href="([^"]+)"\s+class="filelink"[^>]*>/g;'

An alternative approach using HTML::TreeBuilder::XPath, which I find to be quite nice:

M=HTML::TreeBuilder::XPath; \
perl -M$M -le 'print $_->attr("href") for ' \
           -e "$M->new_from_content(<STDIN>)->" \
           -e 'findnodes(q(//a[@class="filelink"]))' < input-file


