
How to get only part of a line using grep/sed/awk with regex?

I have an HTML file from which I need to extract only a specific part. The biggest challenge is that this HTML file has no line breaks, so my grep expression isn't working well.

Here is my HTML file:

<a href="/link1" param1="data1_1" param2="1_2"><p>Test1</p></a><a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>

Note that there are two anchors (<a>) on this line.

I want to get the second anchor, and I tried to get it with:

cat example.html | grep -o "<a.*Test2</p></a>"

Unfortunately, this command returns the whole line, but I want only:

<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>

I don't know how to do this with grep or sed; I'd really appreciate any help.

With GNU awk (for its multi-character RS support), if it's the second record you want:

$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"} NR==2' file
<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>
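POSIX leaves the behaviour of a multi-character RS unspecified, so on an awk other than GNU awk the command above may not split on `</a>`. A portable sketch of the same "second record" idea, assuming the single-line input from the question: first insert a line break after every `</a>`, then print the second resulting line. The function name `second_anchor` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the second <a>...</a> element of a one-line HTML file.
# Step 1 (awk): insert a newline after every "</a>" so each anchor ends its own line;
#   "&" in the gsub replacement stands for the matched text, i.e. "</a>".
# Step 2 (sed): print only line 2 of the result.
second_anchor() {
    awk '{ gsub(/<\/a>/, "&\n"); print }' "$1" | sed -n '2p'
}
```

Usage: `second_anchor example.html`. Only POSIX awk features (`gsub` with `&`) and `sed -n` are used, at the cost of a second process in the pipeline.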

or if it's the record labeled "Test2":

$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"} /<p>Test2<\/p>/' file
<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>
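A sketch that selects by label without relying on a multi-character RS, assuming each anchor's text sits directly inside `<p>...</p>` as in the question. It splits each input line on `</a>` with awk's `split()` and uses `index()` for a literal substring test, so a label containing regex metacharacters such as `.` needs no escaping. The function name `anchor_with_label` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every <a>...</a> whose text is exactly <p>LABEL</p>.
# Usage: anchor_with_label LABEL FILE
anchor_with_label() {
    awk -v label="$1" '{
        # A string separator longer than one character is treated as an ERE.
        n = split($0, parts, "</a>")
        # The last piece is whatever follows the final "</a>", so it is skipped.
        for (i = 1; i < n; i++)
            if (index(parts[i], "<p>" label "</p>") > 0)
                print parts[i] "</a>"
    }' "$2"
}
```

Usage: `anchor_with_label Test2 example.html`.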

or:

$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"; FS="</?p>"} $2=="Test2"' file
<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>
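To see why `$2` holds the label, here is a sketch that applies the same `FS` to one record as `RS="</a>"` produces it (the `</a>` terminator already stripped). `</?p>` is an ERE matching both `<p>` and `</p>`, so the record splits into the opening tag, the label, and an empty trailing field. Field splitting on a regex FS is portable; only the multi-character `RS` above needs GNU awk. The function name `show_fields` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each field awk sees when FS is "</?p>".
show_fields() {
    printf '%s\n' "$1" | awk 'BEGIN { FS = "</?p>" } { for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p>'
# $1=[<a href="/link2" param1="data1_1" param2="1_2">]
# $2=[Test2]
# $3=[]
```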

Using Perl:

$ perl -pe '@a = split(m~(?<=</a>)~, $_);$_ = $a[1]' file
<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>

Breakdown:

perl -pe '                                       ' # Read line by line into $_
                                                   # and print $_ at the end
                     m~(?<=</a>)~                  # Match the position after
                                                   # each </a> tag
          @a = split(            , $_);            # Split into array @a
                                       $_ = $a[1]  # Take second item

This should do it:

grep -o '<a[^>]*><p>Test2</p></a>' example.html
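Why this works where the question's attempt did not: `.*` matches any character, including `>` and even a whole `</a>`, so `grep -o` reports the leftmost match, which starts at the first `<a` and here covers the whole line. `[^>]*` cannot cross the `>` that closes an opening tag, so a match can only begin at the `<a` whose `<p>Test2</p>` follows directly. A sketch that runs both patterns on the sample line from the question:

```shell
#!/bin/sh
line='<a href="/link1" param1="data1_1" param2="1_2"><p>Test1</p></a><a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>'

# Greedy: ".*" runs across the first anchor, so the match starts at the first "<a".
greedy=$(printf '%s\n' "$line" | grep -o '<a.*Test2</p></a>')

# Bounded: "[^>]*" stops at the first ">", so only the second anchor can match.
bounded=$(printf '%s\n' "$line" | grep -o '<a[^>]*><p>Test2</p></a>')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```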
