
Printing lines based on pattern matching in multiple fields using awk

Suppose I have an HTML input line like this:

<li>this is a html input line</li>

I want to filter all such input lines from a file, i.e. lines which begin with <li> and end with </li>. My idea was to search for the pattern <li> in the first field and the pattern </li> in the last field, using the awk command below:

awk '$1 ~ /\<li\>/ ; $NF ~ /\</li\>/ {print $0}'

but it looks like either there is no provision to match two fields at a time, or I am making a syntax mistake. Could you please help me here?

PS: I am working on a Solaris (SunOS) machine.

There's a lot going wrong in your script on Solaris:

awk '$1 ~ /\<li\>/ ; $NF ~ /\</li\>/ {print $0}'
  1. The default awk on Solaris (which we have to assume you are using, since you didn't state otherwise) is old, broken awk and must never be used. On Solaris use /usr/xpg4/bin/awk. There's also nawk, but it has fewer POSIX features (e.g. no support for character classes).
  2. \<...\> are gawk-specific word boundaries. There is no awk on Solaris that would recognize those. If you were just trying to match the literal characters < and >, there's no need to escape them, as they are not regexp metacharacters.
  3. If you want to test for condition 1 and condition 2 you put && between them, not ; which is just the statement terminator in lieu of a newline.
  4. The default action given a true condition is {print $0}, so you don't need to explicitly write that code.
  5. / is the awk regexp delimiter, so you do need to escape that in mid-regexp.
  6. The default field separator is white space, so in your posted sample input $1 and $NF will be <li>this and line</li>, not <li> and </li>.
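Points 3 and 6 are easy to check from the command line. The following is only a sketch: the two-line sample fed to the second and third commands is made up for illustration, and plain awk is used for brevity (on Solaris, substitute /usr/xpg4/bin/awk as per point 1).

```shell
# Point 6: with the default whitespace separator the tags stay glued
# to the neighbouring words, so $1 is "<li>this" and $NF is "line</li>".
fields=$(echo '<li>this is a html input line</li>' | awk '{ print $1; print $NF }')
printf '%s\n' "$fields"

# Point 3: ';' separates two independent rules. A line that satisfies both
# is printed twice, and a line that satisfies only one is still printed.
with_semicolon=$(printf '%s\n' '<li>both ends</li>' '<li>start only' |
    awk '$1 ~ /^<li>/ ; $NF ~ /<\/li>$/')
printf '%s\n' "$with_semicolon"

# With '&&' both conditions must hold for the same line.
with_and=$(printf '%s\n' '<li>both ends</li>' '<li>start only' |
    awk '$1 ~ /^<li>/ && $NF ~ /<\/li>$/')
printf '%s\n' "$with_and"
```

The first command prints <li>this and line</li> on separate lines, the second prints the first sample line twice followed by the second one, and the third prints only the first sample line.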

So if you DID for some reason need to compare multiple fields you could do:

awk '($1 ~ /^<li>.*/) && ($NF ~ /.*<\/li>$/)'
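As a rough check of the two-field test above, here it is run against a small sample. The second and third sample lines are assumptions added for illustration, not part of the original question.

```shell
# Hypothetical three-line sample: only the first line both starts with <li>
# in its first field and ends with </li> in its last field.
input='<li>this is a html input line</li>
<li>missing the closing tag
<p>a paragraph, not a list item</p>'

matched=$(printf '%s\n' "$input" |
    awk '($1 ~ /^<li>.*/) && ($NF ~ /.*<\/li>$/)')
printf '%s\n' "$matched"
```

Only the first sample line is printed, because the other two fail one of the two conditions.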

but this is probably what you really want:

awk '/^<li>.*<\/li>$/'

in which case you could just use grep:

grep '^<li>.*</li>$'
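A small side-by-side run of the whole-line approach, as a sketch on made-up input. It also shows why the trailing $ anchor matters when the requirement is that the line ends with </li>: without it, a line with text after the closing tag is accepted too.

```shell
# Hypothetical sample: the second line has text after the closing tag.
input='<li>this is a html input line</li>
<li>item</li> trailing text
<p>not a list item</p>'

# Anchored at both ends: only the first line qualifies.
with_awk=$(printf '%s\n' "$input" | awk '/^<li>.*<\/li>$/')
with_grep=$(printf '%s\n' "$input" | grep '^<li>.*</li>$')

# Without the trailing $, the second line is counted as well.
unanchored_count=$(printf '%s\n' "$input" | grep -c '^<li>.*</li>')

printf '%s\n' "$with_awk" "$with_grep" "$unanchored_count"
```

Here awk and grep select the same single line, while the unanchored pattern counts two matching lines.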

Why not just use a regex to match the start and end of the line, like:

awk '/^[[:space:]]*<li>.*<\/li>[[:space:]]*$/ {print}'
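To illustrate what the [[:space:]]* parts buy you, here is a sketch comparing a strict pattern with the whitespace-tolerant one on invented input. It assumes an awk that supports POSIX character classes (on Solaris that means /usr/xpg4/bin/awk rather than nawk).

```shell
# Hypothetical sample: one indented line, one with trailing blanks,
# and one where the tag is not at the start of the line.
sample() {
    printf '%s\n' '  <li>indented item</li>' \
                  '<li>trailing blanks</li>   ' \
                  'text before <li>item</li>'
}

# Strict anchors reject all three lines.
strict=$(sample | awk '/^<li>.*<\/li>$/ { n++ } END { print n + 0 }')

# Allowing surrounding white space accepts the first two lines.
tolerant=$(sample |
    awk '/^[[:space:]]*<li>.*<\/li>[[:space:]]*$/ { n++ } END { print n + 0 }')

echo "strict=$strict tolerant=$tolerant"
```

The strict count is 0 and the tolerant count is 2; the third line is rejected by both because other text precedes the opening tag.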

though in general, if you're trying to process HTML, you'll be better off using a tool that's really designed to handle it.
