How do I search within a single line?

Question

I have a very long line in which I would like to find all links that are followed by class="filelink".

A link could look like this:

<a href="https://example.com/@api/files/123/=2008.pdf" class="filelink"

How would this be written as a Perl one-liner?

Update

If I do

echo '<a href="https://example.com/@api/files/123/=2008.pdf" class="filelink"' > test
perl -pe 's/href="(.*)" class="filelink"/\1/g' test

then I get

<a https://example.com/@api/files/123/=2008.pdf

where I would have expected

https://example.com/@api/files/123/=2008.pdf

Answer 1

A solution using a robust HTML parser instead of a regex:

<input_long_line.html perl -MWeb::Query=wq -ne '
    wq($_)
    ->find("a.filelink")
    ->each(sub {
        printf "URL %s\t text %s\n", $_[1]->attr("href"), $_[1]->text
    })'

I wrapped it for readability; it runs fine as a one-liner.

Answer 2

perl -nE'say for m/<a\s+href="([^"]+)"\s+class="filelink"[^>]*>/g;'

Answer 3

An alternative approach using HTML::TreeBuilder::XPath, which I find quite nice:

M=HTML::TreeBuilder::XPath; \
perl -M$M -le 'print $_->attr("href") for ' \
           -e "$M->new_from_content(<STDIN>)->" \
           -e 'findnodes(q(//a[@class="filelink"]))' < input-file

3 answers

Answer 1
10 2012-01-09 15:13:50

Answer 2
3 accepted 2012-01-09 15:32:31

Answer 3
2 2012-01-09 23:06:31
