Extracting data between HTML tags with sed

Question

I want to extract the data between known HTML tags. For example:

Hello, <i>I<i> am <i>very</i> glad to meet you.

Should become:

'I

very'

I have found something that nearly does this. Unfortunately, it only extracts the last entry.

sed -n -e 's/.*<i>\(.*\)<\/i>.*/\1/p'

If I first append a newline after every closing </i> tag, this works fine. But is there a way to do it with just one sed command?
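For reference, the two-pass workaround described above might look like the sketch below. The single command only returns the last entry because its leading .* is greedy and consumes everything up to the final <i>. The function name two_pass_extract is made up for illustration. The first pass assumes GNU sed, where \n in the replacement inserts a newline, and the input is the example line with the closing tag written correctly.

```shell
# Sketch of the two-pass workaround (assumes GNU sed for \n in the replacement).
# two_pass_extract is a hypothetical helper name, not part of the question.
two_pass_extract() {
  # Pass 1: break the line after every closing </i> tag.
  # Pass 2: run the original greedy command; each line now holds at most one pair.
  sed 's|</i>|&\n|g' | sed -n -e 's/.*<i>\(.*\)<\/i>.*/\1/p'
}

printf '%s\n' 'Hello, <i>I</i> am <i>very</i> glad to meet you.' | two_pass_extract
# prints:
# I
# very
```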

Answer 1

Give this a try:

sed -n 's|[^<]*<i>\([^<]*\)</i>[^<]*|\1\n|gp'

And your example is missing a "/":

Hello, <i>I</i> am <i>very</i> glad to meet you.
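A small sketch of how the command behaves on the corrected line, assuming GNU sed (other implementations may not treat \n in the replacement as a newline). Every match is replaced by the captured text plus a newline, so the output ends with one extra empty line. The second sed that drops empty lines and the function name extract_italic are additions for illustration, not part of the answer.

```shell
# Sketch only: wraps the command from the answer and removes the trailing
# empty line it produces. extract_italic is a hypothetical helper name.
extract_italic() {
  sed -n 's|[^<]*<i>\([^<]*\)</i>[^<]*|\1\n|gp' | sed '/^$/d'
}

printf '%s\n' 'Hello, <i>I</i> am <i>very</i> glad to meet you.' | extract_italic
# prints:
# I
# very
```

Because the pattern uses [^<]* rather than .*, each match stops at the next tag, which is what lets the g flag pick up every <i>...</i> pair on the line.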

Answer 2

Try this:

$ sed 's/<[^>]*>//g' file.html
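Note that on its own this command removes every tag and keeps all the surrounding text, so it does not isolate the tagged words. One way it could be combined with another tool is sketched below: grep -o (a non-POSIX option, available in GNU and BSD grep) prints each <i>...</i> match on its own line, and the sed command then strips the tags. The function names strip_tags and extract_with_grep are hypothetical.

```shell
# strip_tags is the command from the answer, reading standard input.
strip_tags() {
  sed 's/<[^>]*>//g'
}

# Hypothetical combination: isolate each <i>...</i> pair first, then strip tags.
extract_with_grep() {
  grep -o '<i>[^<]*</i>' | strip_tags
}

line='Hello, <i>I</i> am <i>very</i> glad to meet you.'

printf '%s\n' "$line" | strip_tags
# prints: Hello, I am very glad to meet you.

printf '%s\n' "$line" | extract_with_grep
# prints:
# I
# very
```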

Answer 3

$ awk -vFS="<.[^>]*>" '{for(i=2;i<=NF;i+=2)print $i}' file
I
very
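To see why printing every second field works, the sketch below dumps all fields with their index, using the same field separator. Since FS is a regular expression that matches any tag, the text between an opening and a closing tag always lands in an even-numbered field, provided the tags on the line alternate strictly between open and close. It also means the text of any other tag pair (not only <i>) would be printed. The function name show_fields is made up for illustration.

```shell
# Hypothetical helper: print each awk field with its index, splitting on tags.
show_fields() {
  awk -v FS='<.[^>]*>' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

printf '%s\n' 'Hello, <i>I</i> am <i>very</i> glad to meet you.' | show_fields
# prints:
# 1:[Hello, ]
# 2:[I]
# 3:[ am ]
# 4:[very]
# 5:[ glad to meet you.]
```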


Answer 1: 3 2010-08-28 01:56:13

Answer 2: 2 2011-10-08 07:38:22

Answer 3: 0 2010-08-28 00:56:55
