How to extract information from HTML tags with awk

Question

I want the field separator (FS) to be <......>, with the dots standing for anything. So suppose I have this line:

<td width="50%" valign="top">System Hardware</td>

I want to extract System Hardware. I've tried two things, but neither works.

awk -F "\\<([^>]+)\\>" '{print $1}' test.txt
awk -F "\\<?*\\>" '{print $1}' test.txt

In both cases I get nothing.

Answer 1

You're getting nothing because you're telling awk to print $1, which is the field BEFORE the first field separator. Since the line begins with a tag, that field is empty. You want print $2.

$ awk -F'<[^>]+>' '{print $2}' file
System Hardware
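
To see why $2 is the right field, it can help to print every field awk produces for the sample line. The following is only a minimal sketch: show_fields is a hypothetical helper name introduced here for illustration, and the input is the single line from the question rather than a real test.txt.

```shell
# Hypothetical helper: split each input line on anything that looks like
# <...> and print every resulting field with its index.
show_fields() {
  awk -F'<[^>]+>' '{ for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i }'
}

# Sample line taken from the question.
printf '%s\n' '<td width="50%" valign="top">System Hardware</td>' | show_fields
# field 1: []
# field 2: [System Hardware]
# field 3: []
```

The empty first and last fields come from the text before the opening tag and after the closing tag, which is why print $1 shows nothing. This approach assumes one tag pair per line; for anything more complex than a quick extraction, a real HTML parser is likely a safer choice.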
