sed - Extract specific characters from a string

Question

So I have some unclean HTML:

"<table class="content divbackground"><tr><td class='title'>&nbsp;</td><td class='title'>From</td><td class='title'>To</td></tr><tr><td class='entry'>Monday</td><td class='entry'>09:00</td><td class='entry'>18:00</td></tr><tr><td class='entry'>Tuesday</td><td class='entry'>09:00</td><td class='entry'>18:00</td></tr><tr><td class='entry'>Wednesday</td><td class='entry'>09:00</td><td class='entry'>18:00</td></tr><tr><td class='entry'>Thursday</td><td class='entry'>09:00</td><td class='entry'>20:00</td></tr><tr><td class='entry'>Friday</td><td class='entry'>09:00</td><td class='entry'>20:00</td></tr><tr><td class='entry'>Saturday</td><td class='entry'>09:00</td><td class='entry'>18:00</td></tr><tr><td class='entry'>Sunday</td><td class='entry'>11:00</td><td class='entry'>18:00</td></tr></table></td></td>"

It's the opening hours of a pharmacy (the information is published on a public register).

Now I could parse the HTML using a parser, but I find that this is not robust to errors, and I would still have to pull out the code between <table> and </table>.

Is there some nice Unix command (sed?) that searches for all occurrences of:

XX:XX

inside <td></td> tags

where X must be a digit?

Answer 1

Handling HTML with regex is not good practice. However, if your input format is fixed, you can try this grep line:

 grep -oP '<td[^>]*>\K\d\d:\d\d' input

With your example input, it outputs:

 09:00
 18:00
 09:00
 18:00
 09:00
 18:00
 09:00
 20:00
 09:00
 20:00
 09:00
 18:00
 11:00
 18:00
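The `-P` option and `\K` used in the grep line above rely on Perl-compatible regex support, which not every grep build provides. As a hedged alternative, here is a minimal sketch that uses only `grep -oE` in two passes. The function name `extract_times` is hypothetical (it does not come from the original answer), and the sketch assumes the same fixed input format, with each time appearing immediately after its opening `<td ...>` tag.

```shell
# Hypothetical helper (not from the original answer): reads HTML on stdin and
# prints every HH:MM value that directly follows an opening <td ...> tag.
extract_times() {
  # Pass 1 keeps the opening tag together with the time, e.g. <td class='entry'>09:00
  # Pass 2 keeps only the HH:MM at the end of each of those matches.
  grep -oE "<td[^>]*>[0-9]{2}:[0-9]{2}" | grep -oE "[0-9]{2}:[0-9]{2}$"
}

# Usage with one row from the question's table:
printf '%s\n' "<tr><td class='entry'>Monday</td><td class='entry'>09:00</td><td class='entry'>18:00</td></tr>" | extract_times
# prints:
# 09:00
# 18:00
```

Like the original one-liner, this is pattern matching rather than HTML parsing, so it only holds up while the markup keeps this shape.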

2 accepted 2015-04-02 08:35:05
