Find all instances of string between two other strings that are on other lines

So I feel like I should know how to do this but I can't quite get it.

I'm trying to find all instances (in all files) where a string ending in _START appears between two marker strings, @GROUP and @END_GROUP, which are normally on other lines.

So there might be some code like this:

// @GROUP GroupName OtherStuff
#define GROUPNAME_START 1
#define GROUPNAME_FOO 2
.... (more defines)
#define GROUPNAME_END 10
// @END_GROUP

#define GROUPTWO_START 1
// @GROUP GroupTwo MoreStuff
#define GROUPTWO_FOO 2
.... (some defines)
#define GROUPTWO_BAR 70
// @END_GROUP

And I would want to match the first group (really just the line with _START, but matching the whole group would be OK) but not the second group or the _START line that is outside of the @GROUP comments.

I figure using grep for this would be the best way to search through all the files, but I can't quite get the regex needed. Thanks for the help.

edit: My bad for not making it clear that I want to be able to search through files in multiple directories at the same time, the same way grep -r "foo" * does. Answers have been good, I just didn't make that clear.

edit2: Multiple great answers each solved it in a slightly different way and I really don't know which one would be best. I marked the one who responded first, but anyone looking at this should be sure to check out all the answers; one of them might be a better fit for your problem.

grep only sees one line at a time, so it doesn't know whether that line is between the group comments or not. sed can use addresses, though:

sed '/@GROUP/,/@END_GROUP/!d' input_file | grep '_START'

! negates the address range and d deletes a line, i.e. we're telling sed to remove lines that are not between the group comments. grep then operates only on the "interesting" lines.
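
As a quick check of the address-range idea, here is a small self-contained sketch that recreates the question's example in a temporary file and runs the same pipeline on it. The file name group.data and the variable names are just placeholders:

```shell
# Sample data mirroring the question's example (hypothetical file name).
tmpdir=$(mktemp -d)
cat > "$tmpdir/group.data" <<'EOF'
// @GROUP GroupName OtherStuff
#define GROUPNAME_START 1
#define GROUPNAME_FOO 2
#define GROUPNAME_END 10
// @END_GROUP

#define GROUPTWO_START 1
// @GROUP GroupTwo MoreStuff
#define GROUPTWO_FOO 2
#define GROUPTWO_BAR 70
// @END_GROUP
EOF

# Delete every line outside @GROUP..@END_GROUP, then keep the _START lines.
matches=$(sed '/@GROUP/,/@END_GROUP/!d' "$tmpdir/group.data" | grep '_START')
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Only the _START line from the first group should come out; GROUPTWO_START sits before its @GROUP comment, so sed deletes it before grep ever sees it.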

To make it work for subdirectories, too, add find to the toolbox:

find /path/to/dir -type f -exec sed '/@GROUP/,/@END_GROUP/!d' {} + | grep '_START'

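Or, if a @GROUP comment could appear without the corresponding @END_GROUP, the range would carry over into the next file, because sed treats all the files on its command line as one continuous stream. Running one sed per file is slower but safer: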

find /path/to/dir -type f -exec sed '/@GROUP/,/@END_GROUP/!d' {} \; | grep '_START'

Or, let xargs operate on the output of grep -l:

grep -lr @GROUP /path/to/dir | xargs sed '/@GROUP/,/@END_GROUP/!d' | grep '_START'

Note: This breaks if your filenames contain whitespace, because xargs splits its input on whitespace by default.
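
If whitespace in file names is a concern, one possible workaround is to pass the names NUL-terminated. This sketch assumes GNU grep (for -Z), GNU sed (for -s, which keeps a range from running on into the next file) and an xargs that understands -0. The directory and file names are made up for the demo:

```shell
# Assumption: GNU grep, GNU sed and an xargs with -0 are available.
# Throwaway tree with a space in both a directory and a file name.
root=$(mktemp -d)
mkdir -p "$root/sub dir"
printf '%s\n' '// @GROUP A' '#define A_START 1' '// @END_GROUP' > "$root/sub dir/a file.h"
# This file opens a group but never closes it.
printf '%s\n' '// @GROUP B' '#define B_FOO 2' > "$root/b.h"
# Here the _START line comes before the group opens.
printf '%s\n' '#define C_START 1' '// @GROUP C' '// @END_GROUP' > "$root/c.h"

# -Z ends each file name with a NUL byte; xargs -0 splits on NUL only.
# sed -s treats every file separately, so the unterminated range in b.h
# cannot swallow lines from whichever file happens to come next.
found=$(grep -lrZ '@GROUP' "$root" \
    | xargs -0 sed -s '/@GROUP/,/@END_GROUP/!d' \
    | grep '_START')
printf '%s\n' "$found"

rm -rf "$root"
```

With these three files only the A_START line is inside a group, so that is the only line that should survive.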

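With awk you can set RS to the empty string (paragraph mode, where blank lines separate records) and do all that in a single search. This relies on each group being its own blank-line-delimited paragraph, as in your example: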

$ awk -v RS= '/@GROUP.*_START.*@END_GROUP/' file
// @GROUP GroupName OtherStuff
#define GROUPNAME_START 1
#define GROUPNAME_FOO 2
.... (more defines)
#define GROUPNAME_END 10
// @END_GROUP
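
If the groups are not reliably separated by blank lines, a variation on the awk idea is to track state line by line instead of matching whole paragraphs. This is only a sketch: scan_groups, inside and the demo file names are names I made up, and the output format (file:line:text) just imitates grep -n:

```shell
# Hypothetical helper: print _START lines that fall inside a group.
scan_groups() {
    awk '
        FNR == 1           { inside = 0 }        # new file: drop any unterminated group
        /@END_GROUP/       { inside = 0; next }  # group closed
        /@GROUP/           { inside = 1; next }  # group opened
        inside && /_START/ { print FILENAME ":" FNR ":" $0 }
    ' "$@"
}

# Demo files: one group with a blank line inside and no @END_GROUP,
# and one file where _START appears before the group opens.
demo=$(mktemp -d)
printf '%s\n' '// @GROUP One' '#define ONE_START 1' '' '#define ONE_END 9' > "$demo/open.h"
printf '%s\n' '#define TWO_START 1' '// @GROUP Two' '#define TWO_FOO 2' '// @END_GROUP' > "$demo/two.h"

hits=$(cd "$demo" && scan_groups open.h two.h)
printf '%s\n' "$hits"

rm -rf "$demo"
```

Because the flag is reset whenever FNR is 1, the unterminated group in open.h does not leak into two.h. For a recursive search, the same awk program could be handed to find with -exec awk '...' {} + in the same way as the sed commands.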

This is a job for sed, using its address syntax:

#!/bin/sed -f

/@GROUP/h  # store the @GROUP line

/@GROUP/,/@END_GROUP/{
/_START/{
g  # retrieve the @GROUP line
n  # print it and continue
}
}

# otherwise, delete the line and continue
d

It's a little bit complicated by the nested blocks, but what this does is: within @GROUP .. @END_GROUP, for any line matching _START it prints the previously found @GROUP line. Using your example:

$ ./group.sed group.data 
// @GROUP GroupName OtherStuff

Is that what you're trying to achieve?

Edit: It's not what you asked for - you just want the _START line, not the @GROUP line. Well, that's much easier:

#!/bin/sed -nf
/@GROUP/,/@END_GROUP/{
/_START/p
}

Addendum: Since you now ask for recursive directory searching, you can use find as described in other answers:

find . -type f -print0 | xargs -0 ./group.sed --separate

(I've used the GNU sed --separate argument here to protect against any file having the group start line but missing the group end line.)
