Extract substring from grep/awk results?

I have a grep command that finds rows in a file, passes those to awk and prints out the 1st and 15th columns.

grep String1 /path/to/file.txt | grep string2 | awk -F ' ' '{print $1, $15}'

So far, so good. This results in a list like this:

2023-01-20 [text1]>
2023-01-22 [text2]>
2023-01-23 [text3]>
2023-01-25 [text4]>

Ideally, I'd like to add some regex to the awk command so that I get this:

2023-01-20 text1
2023-01-22 text2
2023-01-23 text3
2023-01-25 text4

My searches have only turned up how to use regex with awk to identify fields, not how to extract a substring from the results. Is this possible with awk or some other command?

One awk idea that combines the current code with the new requirement:

awk -v s1="String1" -v s2="string2" '                               # feed both search strings in as awk variables "s1" and "s2"
$0~s1 && $0~s2 { print $1,substr($15,2,index($15,"]")-2) }          # if s1 and s2 are both present in the current line then print 1st field and 15th field (sans the "[" "]" wrappers)
' /path/to/file.txt 

A nonsensical demo file:

$ cat file.txt
a b c d e f g h i j k l m n o p q r s t u v w x y z
a string2 c d e f g h i j k l m n [old]> p q r s t u v String1 x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z
a String1 c d e f g h i j k l m n [older]> p q r s t u v string2 x y z

Running the awk script against this file generates:

a old
a older
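
To see where the substr arguments come from, here is a minimal sketch on a single token. The helper name unwrap is hypothetical and exists only for this illustration; it assumes the token always starts with "[" and contains a "]".

```shell
# unwrap: hypothetical helper, reads one bracketed token per line
# and prints only the text between "[" and "]".
unwrap() {
  awk '{
    close_pos = index($1, "]")          # position of "]", e.g. 5 for "[old]>"
    print substr($1, 2, close_pos - 2)  # start after "[", stop just before "]"
  }'
}

echo '[old]>' | unwrap
```

Starting at position 2 skips the "[", and a length of index - 2 drops both the "[" and everything from the "]" onward.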

Another option is to remove the leading [ and trailing ]> with gsub and an alternation:

awk '/String1/ && /string2/ {
  gsub(/^\[|\]>$/, "", $15)
  print $1, $15
}' file.txt
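
Because the two alternatives are anchored with ^ and $, only a "[" at the very start of the field and a "]>" at the very end are removed. The sketch below shows this on a two-column line; the function name strip_field and the sample input are made up for illustration.

```shell
# strip_field: hypothetical helper, strips a leading "[" and a trailing "]>"
# from the 2nd field only; brackets inside the field are left alone.
strip_field() {
  awk '{
    gsub(/^\[|\]>$/, "", $2)
    print $1, $2
  }'
}

echo '2023-01-20 [te[x]t1]>' | strip_field
```

The inner "[x]" survives because neither bracket sits at the start of the field or directly before a final ">".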

In gnu-awk you could use gensub:

awk '/String1/ && /string2/ {
  print $1, gensub(/^\[|\]>$/, "", "g", $15)
}' file

Or find the occurrence of the strings using index, which matches them literally rather than as regular expressions:

awk 'index($0, "String1") && index($0, "string2") {
  gsub(/^\[|\]>$/, "", $15)
  print $1, $15
}' file
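
The practical difference between a regex match and index shows up when the search string contains regex metacharacters. A small sketch, with hypothetical helper names match_regex and match_literal and a made-up search string:

```shell
# Both helpers are hypothetical; each takes the search string as $1
# and prints "match" for every input line that matches.
match_regex()   { awk -v s="$1" '$0 ~ s       { print "match" }'; }
match_literal() { awk -v s="$1" 'index($0, s) { print "match" }'; }

echo 'abc' | match_regex   'a.c'   # "." is a wildcard here, so this prints "match"
echo 'abc' | match_literal 'a.c'   # no literal "a.c" in the line, prints nothing
```

If the search strings are plain words, as in the question, both approaches select the same lines.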

If you basically just want to delete the characters [, ] and >, you can simply use tr -d for this, something like:

... | tr -d "[]>"

Linux prompt>echo "2023-01-20 [text1]>" | tr -d "[]>"
2023-01-20 text1
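
Note that tr -d removes those characters everywhere on the line, including inside the text itself. If that matters, a sed substitution that targets only the wrapper is one possible alternative. This is a sketch that assumes the bracketed token is the last thing on the line and is preceded by a space; the function name strip_wrappers is hypothetical.

```shell
# strip_wrappers: hypothetical helper, removes only the "[" after the last
# column separator and the "]>" at the end of the line.
strip_wrappers() {
  sed 's/ \[\(.*\)]>$/ \1/'
}

echo '2023-01-20 [a>b]>' | strip_wrappers   # keeps the inner ">"
```

With tr -d "[]>" the same input would lose the inner ">" as well.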
