How to one-line this search?
I have a very long line in which I would like to find all the links that are followed by class="filelink".
A link could look like this:
<a href="https://example.com/@api/files/123/=2008.pdf" class="filelink"
How is such a problem written as a Perl one-liner?
Update
If I do
echo '<a href="https://example.com/@api/files/123/=2008.pdf" class="filelink"' > test
perl -pe 's/href="(.*)" class="filelink"/\1/g' test
then I get
<a https://example.com/@api/files/123/=2008.pdf
where I would have expected
https://example.com/@api/files/123/=2008.pdf
Solution with a robust HTML parser instead of a regex:
<input_long_line.html perl -MWeb::Query=wq -ne '
wq($_)
->find("a.filelink")
->each(sub {
printf "URL %s\t text %s\n", $_[1]->attr("href"), $_[1]->text
})'
I wrapped it for readability; it runs fine as a one-liner.
perl -nE'say for m/<a\s+href="([^"]+)"\s+class="filelink"[^>]*>/g;'
An alternative approach using HTML::TreeBuilder::XPath, which I find to be quite nice:
M=HTML::TreeBuilder::XPath; \
perl -M$M -le 'print $_->attr("href") for ' \
-e "$M->new_from_content(<STDIN>)->" \
-e 'findnodes(q(//a[@class="filelink"]))' < input-file