How to one-line this search?

I have a very long line in which I would like to find all the links that are followed by class="filelink".

A link could look like this:

<a href="https://example.com/@api/files/123/=2008.pdf" class="filelink"

How is such a problem written as a Perl one-liner?

Update

If I do

echo '<a href="https://example.com/@api/files/123/=2008.pdf" class="filelink"' > test
perl -pe 's/href="(.*)" class="filelink"/\1/g' test

then I get

<a https://example.com/@api/files/123/=2008.pdf

where I would have expected

https://example.com/@api/files/123/=2008.pdf

A solution using a robust HTML parser instead of a regex:

<input_long_line.html perl -MWeb::Query=wq -ne '
    wq($_)
    ->find("a.filelink")
    ->each(sub {
        printf "URL %s\t text %s\n", $_[1]->attr("href"), $_[1]->text
    })'

I wrapped it for readability; it also runs fine as a one-liner.

perl -nE'say for m/<a\s+href="([^"]+)"\s+class="filelink"[^>]*>/g;'

An alternative approach using HTML::TreeBuilder::XPath, which I find to be quite nice:

M=HTML::TreeBuilder::XPath; \
perl -M$M -le 'print $_->attr("href") for ' \
           -e "$M->new_from_content(<STDIN>)->" \
           -e 'findnodes(q(//a[@class="filelink"]))' < input-file
