Why won't zgrep show the actual matches?
Let's say I have two files, a.txt and b.txt, with some content...
$ tail *.txt
==> a.txt <==
ABC
CDE
123
C
==> b.txt <==
C
321
EDC
CBA
Let's also imagine that the files have now been put in a gzipped tarball...
$ tar -czf tarball.tgz *.txt
$ tar -tf tarball.tgz
a.txt
b.txt
Now, I want to grep through the files in the tarball. Seeing the original file name and line number before each match would be nice, but most importantly I want to see the matched lines.
First, I expected that zgrep 'pattern' tarball.tgz would simply work. It does tell me whether there is a match, and it can even count them, but I can't find a way to have the matches printed...
$ zgrep 'AB' tarball.tgz
Binary file (standard input) matches
$ zgrep 'C' tarball.tgz
Binary file (standard input) matches
$ zgrep -c 'AB' tarball.tgz
1
$ zgrep -c 'C' tarball.tgz
6
Second, I thought I would zcat the tarball and run a regular grep on the result. But I still get the exact same "Binary file (standard input) matches" message...
$ zcat tarball.tgz | grep 'C'
Binary file (standard input) matches
I guess zcat (and zgrep) do a gunzip but no tar -xf? If I look at the output of zcat, I see the same thing as if I had just run tar -c...
$ zcat tarball.tgz
a.txt0000664�3���3���0000000001613554050266013370 0ustar useruserABC
CDE
123
C
b.txt0000664�3���3���0000000001613554050301013357 0ustar useruserC
321
EDC
CBA
$ tar -c *.txt
a.txt0000664�3���3���0000000001613554050266013370 0ustar useruserABC
CDE
123
C
b.txt0000664�3���3���0000000001613554050301013357 0ustar useruserC
321
EDC
CBA
So finally, I got to this solution, which works OK:
$ tar -xOzf tarball.tgz | grep 'C'
ABC
CDE
C
C
EDC
CBA
Of course, if I now ask for file names and line numbers, I don't get anything useful...
$ tar -xOzf tarball.tgz | grep -Hn 'C'
(standard input):1:ABC
(standard input):2:CDE
(standard input):4:C
(standard input):5:C
(standard input):7:EDC
(standard input):8:CBA
The only way I can think of to get the results I want would involve a bit more scripting, to extract the tarball and run grep in a loop...
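For reference, the scripted approach described above could look roughly like the sketch below. The function name grep_tar_members is made up for illustration, and it assumes GNU tar and GNU grep (for --label) and member names without unusual characters. It lists the members, then extracts each one to stdout and greps it with the member name as the label.

```shell
# Hypothetical helper: grep each member of a gzipped tarball separately.
# Usage: grep_tar_members PATTERN TARBALL
grep_tar_members() {
    pattern=$1
    archive=$2
    # List the member names, one per line.
    tar -tzf "$archive" | while IFS= read -r member; do
        # Extract only this member to stdout (-O) and label the matches
        # with its name. "|| true" keeps a member without matches from
        # being reported as a failure.
        tar -xOzf "$archive" "$member" \
            | grep -Hn --label="$member" -e "$pattern" || true
    done
}
```

This re-reads the archive once per member, so it is simple rather than fast.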
Is there a nice (easy and concise) way to do this?
tar -czf does two things: it bundles the files into a tar archive (-c), and it compresses that archive with gzip (-z).
As I suspected, zgrep and zcat only do the gunzip step, and are left with a tar file, which is still binary. That explains all the output I was getting.
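One way to check this is sketched below, assuming GNU tar and gzip are available. It rebuilds the example tarball in a scratch directory, lets tar list the members straight from the zcat output, and counts the NUL bytes in that stream. The workdir variable is just an illustrative name.

```shell
# Recreate the example tarball in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'ABC\nCDE\n123\nC\n' > a.txt
printf 'C\n321\nEDC\nCBA\n' > b.txt
tar -czf tarball.tgz a.txt b.txt

# zcat only removes the gzip layer; what comes out is still a tar
# archive, so tar can list its members directly from the pipe.
zcat tarball.tgz | tar -tf -

# Count the NUL bytes in the decompressed stream. The tar headers and
# padding contain many of them, which is why grep calls it binary.
zcat tarball.tgz | tr -cd '\000' | wc -c
```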
The easiest way around that is to add an option to zgrep:
-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
That works almost as well as tar -xOzf tarball.tgz | grep -Hn 'C': as with that command, we don't get the individual file names, and the line numbers run over the whole tar output. We also get some noise, namely the tar format:
$ zgrep -Hna 'C' tarball.tgz
tarball.tgz:1:a.txt0000664�3���3���0000000001613554050266013370 0ustar jlehuenjlehuenABC
tarball.tgz:2:CDE
tarball.tgz:4:C
tarball.tgz:5:b.txt0000664�3���3���0000000001613554050301013357 0ustar jlehuenjlehuenC
tarball.tgz:7:EDC
tarball.tgz:8:CBA
That is easy enough to remember, and works quite well for, e.g., grepping logs, where the first line of a file is rarely an interesting match.
Now, @Shawn pointed me to an answer on the Unix StackExchange. From that, I came up with my favorite option:
$ tar -xf tarball.tgz --to-command='grep -Hn --label="$TAR_ARCHIVE/$TAR_FILENAME" C || true'
tarball.tgz/a.txt:1:ABC
tarball.tgz/a.txt:2:CDE
tarball.tgz/a.txt:4:C
tarball.tgz/b.txt:1:C
tarball.tgz/b.txt:3:EDC
tarball.tgz/b.txt:4:CBA
I'll probably write myself a function for this, because it's not fun to type. The output is exactly what I wanted, though. :)
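Such a function might look like the sketch below. The name targrep and the TARGREP_PATTERN variable are invented for illustration. It assumes GNU tar (for --to-command and the TAR_ARCHIVE and TAR_FILENAME variables it exports) and GNU grep (for --label). The pattern is passed through the environment so that it does not have to be quoted into the command string.

```shell
# Hypothetical helper: grep inside every member of one or more tarballs,
# printing matches as archive/member:line:text.
# Usage: targrep PATTERN TARBALL...
targrep() {
    if [ "$#" -lt 2 ]; then
        echo "usage: targrep PATTERN TARBALL..." >&2
        return 2
    fi
    pattern=$1
    shift
    for archive in "$@"; do
        # tar runs the command once per extracted file via the shell,
        # with TAR_ARCHIVE and TAR_FILENAME set in its environment.
        # "|| true" stops tar from complaining when a member has no match.
        TARGREP_PATTERN=$pattern tar -xf "$archive" \
            --to-command='grep -Hn --label="$TAR_ARCHIVE/$TAR_FILENAME" -e "$TARGREP_PATTERN" || true'
    done
}
```

With the two example files, targrep C tarball.tgz should print the same six lines as the command above.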