I have a text file that looks something like this:
<tbody>
<tr>
<td>
String1
</td>
<td>
String2
</td>
<td>
String3
</td>
...
...
<td>
StringN
</td>
</tr>
</tbody>
This is the output that I want.
<tbody>
<tr>
String1;String2;String3;... ...;StringN
</tr>
</tbody>
Here is my BUGGY code.
sed '{
:a
N
$!ba
s|<td.*>\(.*\)</td>|\1|
}'
I wanted to remove all <td> and </td> tags and get all the strings delimited by some string (I can split the strings later using that as the delimiter character). I used the solution given in this URL, but the output is not what I expected.
This is the actual HTML:
<tbody>
<tr>
<td>
<a href="/120.52.72.58/80">120.52.72.58:80</a>
</td>
<td>
HTTP
</td>
<td>
<span class="text-danger">Transparent</span>
</td>
<td>
<abbr title="2016-12-15 00:07:46">12h ago</abbr>
</td>
<td class="small">
<span class="text-muted">—</span>
</td>
<td>
<img src="/flags/png/cn.png" alt="China (CN)" title="China (CN)" onerror="this.style.display='none'"> <abbr title="China">CN</abbr>
</td>
<td class="small">
Beijing
</td>
<td class="small">
Beijing
</td>
<td class="small">
China Unicom IP network
</td>
<td class="small">
<span class="text-muted">—</span>
</td>
</tr>
</tbody>
Your sed code does not work because <td.*>\(.*\)</td> matches the part of the pattern space from the first <td up to the last </td>, due to the greediness of the * quantifier. The captured group then holds only whatever follows the last > before that final </td>, so at most the tail of the last cell survives the substitution. Unfortunately, sed doesn't support a more modern regex flavor with ungreedy quantifiers; thus, some other tool would be more appropriate.
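To see the greediness in action, here is a minimal sketch that runs the same substitution on a two-cell sample. The sample text and the variable names input and greedy_result are made up for illustration, and it assumes GNU sed, where . also matches the newlines embedded in the pattern space.

```shell
# Hypothetical two-cell sample; the real input has many more cells.
input='<tr>
<td>
A
</td>
<td>
B
</td>
</tr>'

# Same script as in the question: slurp every line into the pattern
# space, then apply the greedy substitution once.
greedy_result=$(printf '%s\n' "$input" | sed ':a
N
$!ba
s|<td.*>\(.*\)</td>|\1|')

# Cell A is gone: the match ran from the first <td to the last </td>,
# and the group captured only the text after the last <td>.
printf '%s\n' "$greedy_result"
```

On this sample only B should be left between <tr> and </tr>, surrounded by the newlines that were captured with it.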
I wanted to remove all <td> and </td> tags and get all the strings delimited by some string …
If those tags are always on a line of their own (as in your examples), a simple sed command is enough:
sed '/<\/*td.*>/d'
After that, the strings are delimited by \n followed by spaces.
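If you would rather have the ;-separated row shown in the question, one possible follow-up is to pipe the result through awk. This is only a sketch under the same assumption (every tag sits on its own line); the function name join_cells is hypothetical, and any markup inside a cell (such as the <a> or <span> elements in the real data) is passed through unchanged.

```shell
# join_cells: hypothetical helper, reads the HTML on stdin.
# Assumes every <td>, </td>, <tr>, </tr>, <tbody>, </tbody> tag is on its own line.
join_cells() {
  sed '/<\/*td.*>/d' | awk '
    # Structural lines: flush the collected row first, then print the tag.
    /^[ \t]*<\/?(tbody|tr)>/ {
      if (row != "") { print row; row = "" }
      print
      next
    }
    {
      gsub(/^[ \t]+|[ \t]+$/, "")   # trim the indentation around the cell text
      if ($0 == "") next            # ignore blank lines
      row = (row == "" ? $0 : row ";" $0)
    }
    END { if (row != "") print row }
  '
}
```

Usage would be something like join_cells < file.html, where file.html stands for your input file. The ; delimiter is hard-coded here; pick another string if a cell can itself contain a semicolon.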