How to use grep to parse out columns in csv

I have a log with millions of lines that look like this:

1482364800 bunch of stuff 172.169.49.138 252377 + many other things
1482364808 bunch of stuff 128.169.49.111 131177 + many other things 
1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things
1482364900 bunch of stuff 128.169.49.112 849231 + many other things
1482364940 bunch of stuff 128.169.49.218 623423 + many other things

It's so big that I can't really read it into memory for Python to parse, so I want to zgrep out only the items I need into another, smaller file, but I'm not very good with grep. In Python I would normally gzip.open('log.gz') and then pull out data[0], data[4], data[5] into a new file, so that the new file has only the epoch, the IP and the date (the IP can be IPv6 or IPv4).

Expected result of the new file:

1482364800 172.169.49.138 252377
1482364808 128.169.49.111 131177  
1482364810 2001:db8:0:0:0:0:2:1 124322 
1482364900 128.169.49.112 849231 
1482364940 128.169.49.218 623423 

How do I do this with zgrep?

Thanks

To select columns you have to use the cut command; zgrep/grep select lines, not columns. You can use cut like this:

cut -d' ' -f1,2,4

In this example I get columns 1, 2 and 4, with a space ' ' as the column delimiter. Note that the -f option specifies which column numbers to keep and -d sets the delimiter.

I hope that I have answered your question.
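Applied to the sample in the question, this might look like the sketch below. It assumes the fields are separated by exactly one space (cut treats every single space as a delimiter, so repeated spaces would shift the numbering) and that "bunch of stuff" really is three words, which puts the IP and the number after it in fields 5 and 6. The file names sample.log.gz and small.log are placeholders, not names from the question.

```shell
# Build a tiny gzip-compressed sample (hypothetical file names) so the sketch is runnable.
workdir=$(mktemp -d)
printf '%s\n' \
  '1482364800 bunch of stuff 172.169.49.138 252377 + many other things' \
  '1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things' \
  | gzip > "$workdir/sample.log.gz"

# Decompress to stdout and keep fields 1, 5 and 6.
# The log is streamed line by line, so it is never held in memory as a whole.
gzip -dc "$workdir/sample.log.gz" | cut -d' ' -f1,5,6 > "$workdir/small.log"

cat "$workdir/small.log"
```

With the real log, only the input path and the field list after -f should need to change.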

I'm on OSX and maybe that is the issue, but I couldn't get zgrep to work for filtering out columns, and zcat kept adding a .Z to the end of the .gz. Here's what I ended up doing:

awk '{print $1,$3,$4}' <(gzip -dc /path/to/source/Largefile.log.gz) | gzip > /path/to/output/Smallfile.log.gz

This let me filter the 3 columns I needed out of the Largefile into a Smallfile while keeping both the source and the destination in compressed format.
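The same idea can also be written with plain pipes, which avoids the bash-specific <( ... ) process substitution. This is a minimal sketch, assuming whitespace-separated fields laid out as in the question's sample, so the wanted fields are 1, 5 and 6 rather than the 1, 3 and 4 used above. The paths and the sample lines are placeholders.

```shell
# Hypothetical paths; replace them with the real source and destination.
workdir=$(mktemp -d)
src="$workdir/Largefile.log.gz"
dst="$workdir/Smallfile.log.gz"

# Sample input with irregular spacing, to show that awk collapses runs of blanks.
printf '%s\n' \
  '1482364808 bunch of stuff   128.169.49.111 131177 + many other things ' \
  '1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things' \
  | gzip > "$src"

# Stream: decompress -> keep fields 1, 5 and 6 -> recompress.
gzip -dc "$src" | awk '{print $1, $5, $6}' | gzip > "$dst"

# Inspect the result without writing an uncompressed copy to disk.
gzip -dc "$dst"
```

Unlike cut -d' ', awk's default field splitting treats any run of spaces or tabs as one separator, so it tolerates uneven spacing in the log.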
