Extract text with sed

Question

I have this text file (it's really part of an HTML page):

<tr>
              <td width="10%" valign="top"><P>Name:</P></td>
              <td colspan="2"><P>
                XXXXX
              </P></td>
            </tr>
            <tr>
              <td width="10%" valign="top"><p>City:</p></td>
              <td colspan="2"><p>
                Mycity
              </p></td>
            </tr>
            <tr>
              <td width="10%" valign="top"><p>County:</p></td>
              <td colspan="2"><p>
                YYYYYY
              </p></td>
            </tr>
            <tr>
              <td width="10%" valign="top"><p>Map:</p></td>
              <td colspan="2"><p>
                ZZZZZZZZ

I've used this sed command to extract "Mycity":

$ tr -d '\n' < file.html | sed -n 's/.*City:<\/p><\/td>.*<p>\(.*\)<\/p><\/td>.*/\1/p'

As far as I know the regular expression works, but I get

Map:

instead of Mycity.

I've tested the regex with Rubular and it works there, but not with sed. Is sed not the right tool? What am I doing wrong?

PS: I'm using Linux

Answer 1

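The problem you have right now is that regex quantifiers are greedy by default:

's/.*City:<\/p><\/td>.*<p>\(.*\)<\/p><\/td>.*/\1/p'
                     ^ // here!

So that .* matches as much as it can, up to the last <p> that is still followed by </p></td>, which is the Map: cell. In Perl-style regex engines you would make it non-greedy by adding a ?:

's/.*City:<\/p><\/td>.*?<p>\(.*\)<\/p><\/td>.*/\1/p'
                       ^

but sed's POSIX regular expressions have no non-greedy quantifiers, so this does not work in sed. There, replace the .* with bracket expressions that cannot run past the next tag:

's/.*City:<\/p><\/td>[^<]*<td[^>]*><p> *\([^<]*[^ <]\) *<\/p>.*/\1/p'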
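
To check the greedy behaviour on a trimmed copy of the sample, a minimal sketch along these lines may help. The temporary file and the variable names (html, greedy, bounded) are assumptions made for illustration, and the second expression is only one possible sed-only workaround: it uses negated bracket expressions in place of .*, since POSIX sed has no non-greedy quantifier.

```shell
#!/bin/sh
# Hypothetical sample: a trimmed copy of the rows from the question.
html=$(mktemp)
cat > "$html" <<'EOF'
<tr>
  <td width="10%" valign="top"><p>City:</p></td>
  <td colspan="2"><p>
    Mycity
  </p></td>
</tr>
<tr>
  <td width="10%" valign="top"><p>County:</p></td>
  <td colspan="2"><p>
    YYYYYY
  </p></td>
</tr>
<tr>
  <td width="10%" valign="top"><p>Map:</p></td>
  <td colspan="2"><p>
    ZZZZZZZZ
EOF

# Original expression: the middle .* is greedy, so it runs on to the
# last <p> that is still followed by </p></td>.
greedy=$(tr -d '\n' < "$html" |
    sed -n 's/.*City:<\/p><\/td>.*<p>\(.*\)<\/p><\/td>.*/\1/p')

# Bounded expression: [^<]* and [^>]* cannot cross a tag boundary,
# so the match stays inside the cell that follows City:.
bounded=$(tr -d '\n' < "$html" |
    sed -n 's/.*City:<\/p><\/td>[^<]*<td[^>]*><p> *\([^<]*[^ <]\) *<\/p>.*/\1/p')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
rm -f "$html"
```

The bounded version assumes the value cell directly follows the label cell and is padded with spaces only, as in the sample; real-world HTML may need a proper parser.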

Answer 2

sed is always the wrong tool for anything that involves processing multiple lines. Just use awk; this is what it was invented to do:

$ awk 'c&&!--c; /City:/{c=2}' file.html
                Mycity

See "Printing with sed or awk a line following a matching pattern".
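
The one-liner is a countdown: matching City: sets c to 2, and each later line decrements c until it reaches zero, at which point that line is printed. A more spelled-out sketch of the same idea follows; the temporary file, the city variable, and the whitespace trimming are additions for illustration and are not part of the original command.

```shell
#!/bin/sh
# Hypothetical sample: the City row from the question, plus one more row.
html=$(mktemp)
cat > "$html" <<'EOF'
<tr>
  <td width="10%" valign="top"><p>City:</p></td>
  <td colspan="2"><p>
    Mycity
  </p></td>
</tr>
<tr>
  <td width="10%" valign="top"><p>County:</p></td>
  <td colspan="2"><p>
    YYYYYY
  </p></td>
</tr>
EOF

city=$(awk '
    c && --c == 0 {                    # countdown just reached zero: value line
        gsub(/^[ \t]+|[ \t]+$/, "")    # trim leading and trailing blanks
        print
        exit                           # stop after the first match
    }
    /City:/ { c = 2 }                  # label found: value is 2 lines below
' "$html")

printf '%s\n' "$city"
rm -f "$html"
```

Like the original, this depends on the value sitting exactly two lines below the label, so it would need adjusting if the markup were laid out differently.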

2 answers

solution1
2 ACCEPTED 2015-05-23 13:48:31

solution2
2 2015-05-24 12:35:17
