Why won't zgrep show the actual matches?

Context

Let's say I have two files, a.txt and b.txt, with some content...

$ tail *.txt
==> a.txt <==
ABC
CDE
123
C

==> b.txt <==
C
321
EDC
CBA

Let's also imagine that the files have now been put in a gzipped tarball...

$ tar -czf tarball.tgz *.txt
$ tar -tf tarball.tgz
a.txt
b.txt

Goal

Now, I want to grep through the files in the tarball. Seeing the original file name and line number before each match would be nice, but most importantly I want to see the matched lines.

What did I try?

First, I expected that zgrep 'pattern' tarball.tgz would simply work. It does tell me whether there is a match, and it can even count the matching lines, but I can't find a way to have the matches printed...

$ zgrep 'AB' tarball.tgz
Binary file (standard input) matches
$ zgrep 'C' tarball.tgz
Binary file (standard input) matches
$ zgrep -c 'AB' tarball.tgz
1
$ zgrep -c 'C' tarball.tgz
6

Second, I thought I would zcat the tarball and run a regular grep on that. But I still get the exact same "Binary file (standard input) matches" message...

$ zcat tarball.tgz | grep 'C'
Binary file (standard input) matches

I guess zcat (and zgrep) do a gunzip but no tar -xf? If I look at the output of zcat, I see the same thing as if I had just run tar -c...

$ zcat tarball.tgz
a.txt0000664�3���3���0000000001613554050266013370 0ustar  useruserABC
CDE
123
C
b.txt0000664�3���3���0000000001613554050301013357 0ustar  useruserC
321
EDC
CBA

$ tar -c *.txt
a.txt0000664�3���3���0000000001613554050266013370 0ustar  useruserABC
CDE
123
C
b.txt0000664�3���3���0000000001613554050301013357 0ustar  useruserC
321
EDC
CBA

So finally, I got to this solution, which works OK:

$ tar -xOzf tarball.tgz | grep 'C'
ABC
CDE
C
C
EDC
CBA

Of course, if I now ask for file names and line numbers, I don't get anything useful...

$ tar -xOzf tarball.tgz | grep -Hn 'C'
(standard input):1:ABC
(standard input):2:CDE
(standard input):4:C
(standard input):5:C
(standard input):7:EDC
(standard input):8:CBA

The only way I can think of to get the results I want would involve a bit more scripting to extract the tarball and run grep in a loop...
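A rough sketch of what such a loop might look like, assuming GNU grep for --label. The function name grep_tar_loop is made up for illustration, and the archive is decompressed once per member, so this is not efficient:

```shell
# Hypothetical helper: grep each regular member of a gzipped tarball separately.
# Usage: grep_tar_loop PATTERN TARBALL
grep_tar_loop() {
    pattern=$1
    tarball=$2
    # List the members, then extract each one to stdout and grep it.
    tar -tzf "$tarball" | while IFS= read -r name; do
        case $name in
            */) continue ;;   # skip directory entries
        esac
        # --label (GNU grep) replaces "(standard input)" with the member name.
        # "|| true": a member without matches is not treated as an error.
        tar -xOzf "$tarball" "$name" | grep -Hn --label="$name" -e "$pattern" || true
    done
}
```

Called as grep_tar_loop 'C' tarball.tgz, this prints lines of the form a.txt:1:ABC, with line numbers counted per file.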


Is there a nice (easy and concise) way to do this?

tar -czf does two things:

  • packages all files (which happen to be text only in my example) into a tar file (which is binary);
  • gzips that tar file into a gzipped tar file.

As I suspected, zgrep and zcat only do a gunzip, and are left with a tar file, which is still binary. That explains all the output I was getting.
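The two layers can be reproduced by hand. The following is only a sketch: it recreates the example files in a scratch directory (the variable name workdir is an arbitrary choice) and performs the two steps separately instead of using tar -czf:

```shell
workdir=$(mktemp -d)   # scratch directory for the demo
printf 'ABC\nCDE\n123\nC\n' > "$workdir/a.txt"
printf 'C\n321\nEDC\nCBA\n' > "$workdir/b.txt"

# Step 1: package the files into a plain, uncompressed tar archive.
tar -cf "$workdir/tarball.tar" -C "$workdir" a.txt b.txt
# Step 2: gzip that archive (-c writes to stdout, so tarball.tar is kept).
gzip -c "$workdir/tarball.tar" > "$workdir/tarball.tgz"

# Undoing only step 2 yields the tar archive again, headers included.
# This is the data that zcat prints and that zgrep searches.
if gunzip -c "$workdir/tarball.tgz" | cmp -s - "$workdir/tarball.tar"; then
    echo "gunzip output is identical to the tar archive"
fi

# Undoing both steps yields only the file contents, which grep treats as text.
tar -xOzf "$workdir/tarball.tgz" | grep -c 'C'
```

The last command counts the matching lines in the extracted contents, without any tar header bytes in the way.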

Easy solution

The easiest way around that is to add an option to zgrep:

   -a, --text
          Process a binary file as if it were text; this is equivalent to the --binary-files=text option.

That works almost as well as tar -xOzf tarball.tgz | grep -Hn 'C', with the same limitations: we don't get the individual file names, and the line numbers run over the whole tar output. We also get some noise, namely the tar header data:

$ zgrep -Hna 'C' tarball.tgz
tarball.tgz:1:a.txt0000664�3���3���0000000001613554050266013370 0ustar  jlehuenjlehuenABC
tarball.tgz:2:CDE
tarball.tgz:4:C
tarball.tgz:5:b.txt0000664�3���3���0000000001613554050301013357 0ustar  jlehuenjlehuenC
tarball.tgz:7:EDC
tarball.tgz:8:CBA

That is easy enough to remember, and it works quite well for, e.g., grepping logs, where the first line of a file is rarely the interesting match.

Best output

Now, @Shawn pointed me to an answer on the Unix StackExchange. From that, I could come up with my favorite option:

$ tar -xf tarball.tgz --to-command='grep -Hn --label="$TAR_ARCHIVE/$TAR_FILENAME" C || true'
tarball.tgz/a.txt:1:ABC
tarball.tgz/a.txt:2:CDE
tarball.tgz/a.txt:4:C
tarball.tgz/b.txt:1:C
tarball.tgz/b.txt:3:EDC
tarball.tgz/b.txt:4:CBA

I'll probably write myself a function for this, because it's not fun to type. The output is exactly what I wanted, though :)
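Such a function could look roughly like the sketch below. The name targrep and the variable TARGREP_PATTERN are invented here, and the sketch assumes GNU tar (for --to-command) and GNU grep (for --label):

```shell
# Hypothetical helper: grep inside a tarball, labelling matches per member.
# Usage: targrep PATTERN TARBALL
targrep() {
    # tar runs the --to-command string through the shell once per member,
    # with TAR_ARCHIVE and TAR_FILENAME set in the environment.
    # The pattern is handed over as an environment variable so that it
    # does not have to be quoted into the command string.
    TARGREP_PATTERN=$1 tar -xf "$2" \
        --to-command='grep -Hn --label="$TAR_ARCHIVE/$TAR_FILENAME" -e "$TARGREP_PATTERN" || true'
}
```

With the example tarball, targrep 'C' tarball.tgz should print the same six lines as the command above. The || true is kept from the original command, so a member without matches does not make tar report a failed child process.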
