
awk: How to extract information from an HTML tag

I want the FS to be <......>, the dots being ANYTHING. So if I have, let's say,

<td width="50%" valign="top">System Hardware</td>

I want to extract System Hardware. I've tried two things, but neither works.

  1. awk -F "\\<([^>]+)\\>" '{print $1}' test.txt
  2. awk -F "\\<?*\\>" '{print $1}' test.txt

In both cases I get nothing.

You're getting nothing because you're telling awk to print $1, which is the field BEFORE the first field separator. The line starts with a tag, so that field is empty. You want print $2.

$ awk -F'<[^>]+>' '{print $2}' file
System Hardware
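
To see why $2 is the right field, it can help to dump every field awk produces for the sample line. The following is only a minimal sketch: show_fields is a hypothetical helper name (not from the original post), and it assumes an awk that accepts a regular expression as FS, as POSIX awk does.

```shell
# Hypothetical helper (not from the original post): print every field awk
# produces when each HTML tag is treated as a field separator.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'<[^>]+>' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields '<td width="50%" valign="top">System Hardware</td>'
# 1=[]
# 2=[System Hardware]
# 3=[]
```

The opening tag is the first separator, so $1 is the empty string before it, $2 is the text between the two tags, and $3 is the empty string after the closing tag.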
