
Matching third field in a CSV with pattern file in GNU Linux (AWK/SED/GREP)

I need to print every line of a CSV file whose third field matches a pattern listed in a pattern file.

I have tried grep with no luck, because it matches the patterns in any field, not only the third.

grep -f FILE2 FILE1 > OUTPUT

FILE1

dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832

FILE2

00567
98745
45486
54543
48349
96349
56485
19615
56496
39493

RIGHT OUTPUT

dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346

WRONG OUTPUT

dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567   <---- I don't want this to appear
sadsad,0,98745,1,gfdkjh,45346

I have already searched everywhere and tried different formulas.

EDIT: thanks to Wintermute, I managed to write something like this:

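# write to a new file: redirecting onto file1.csv itself would truncate it before it is read
csvquote file1.csv > file1.quoted.csv
awk -F '"' 'FNR == NR { patterns[$0] = 1; next } patterns[$6]' file2.csv file1.quoted.csv | csvquote -u > result.csv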

Csvquote helps with parsing CSV files in AWK.

Thank you very much everybody, great community!

With awk:

awk -F, 'FNR == NR { patterns[$0] = 1; next } patterns[$3]' file2 file1

This works as follows:

FNR == NR {           # when processing the first file (the pattern file)
  patterns[$0] = 1    # remember the patterns
  next                # and do nothing else
}
patterns[$3]          # after that, select lines whose third field
                      # has been seen in the patterns.
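To try this end to end, here is a minimal sketch that rebuilds the sample data from the question in a temporary directory (the directory and the `result` variable are only for illustration, and FILE2 is shortened). It uses `$3 in patterns` instead of `patterns[$3]`; both select the same lines here, but the `in` test does not create an empty array entry for every third field it looks up.

```shell
#!/bin/sh
# Recreate the sample data from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/FILE1" <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
EOF
printf '%s\n' 00567 98745 45486 54543 48349 > "$tmpdir/FILE2"

# First file: store each pattern as an array key.
# Second file: print lines whose third field is one of those keys.
result=$(awk -F, 'FNR == NR { patterns[$0]; next } $3 in patterns' \
    "$tmpdir/FILE2" "$tmpdir/FILE1")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

The line ending in `,00567` is not printed, because only the third field is compared.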

Using grep and sed:

grep -f <( sed -e 's/^\|$/,/g' file2) file1
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346

Explanation:

We insert a comma at the beginning and at the end of each line of file2, without changing the file itself, and then grep just as you were already doing. Note that the surrounding commas match the value in any field other than the first or the last, which is enough for the sample data.
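A minimal sketch of the same idea that also runs in a plain POSIX shell: it writes the wrapped patterns to a temporary file instead of using bash process substitution, uses two substitutions rather than the GNU-specific `\|`, and adds `-F` so the patterns are treated as fixed strings. The temporary directory, the `patterns` file name, and the shortened sample data are assumptions made for illustration.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' 00567 98745 > "$tmpdir/FILE2"
cat > "$tmpdir/FILE1" <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
sadsad,0,98745,1,gfdkjh,45346
EOF

# Wrap each pattern in commas: 00567 becomes ,00567,
sed 's/^/,/; s/$/,/' "$tmpdir/FILE2" > "$tmpdir/patterns"
wrapped=$(cat "$tmpdir/patterns")

# -F: fixed-string matching, -f: read the patterns from a file.
result=$(grep -F -f "$tmpdir/patterns" "$tmpdir/FILE1")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

The second line of FILE1 is skipped because its trailing `00567` is not followed by a comma.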

This can be a start:

for i in $(cat FILE2); do cat FILE1 | cut -d',' -f3 | grep "$i"; done

sed 's#.*#/^[^,]*,[^,]*,&,/p#' FILE2 >/tmp/File2.sed && sed -n -f /tmp/File2.sed FILE1;rm /tmp/File2.sed

This is harder to do in plain sed than in awk, but it should work if awk is not available.

The same with egrep (useful on huge files):

sed 's#.*#^[^,]*,[^,]*,&,#' FILE2 >/tmp/File2.egrep && egrep -f /tmp/File2.egrep FILE1;rm /tmp/File2.egrep
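To see why anchoring the generated regex pins the match to the third field, here is a small sketch under a few assumptions: it uses `grep -E` (the equivalent spelling of `egrep`, which recent GNU grep releases flag as obsolescent), a temporary directory, and a made-up second row that carries `00567` in its fourth and sixth fields.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' 00567 98745 > "$tmpdir/FILE2"
# The second row is invented: 00567 appears in fields 4 and 6, not in field 3.
cat > "$tmpdir/FILE1" <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,00567,lkjiou,00567
sadsad,0,98745,1,gfdkjh,45346
EOF

# Each value becomes a regex that skips exactly two fields from the start
# of the line and then requires the value as the whole third field.
sed 's#.*#^[^,]*,[^,]*,&,#' "$tmpdir/FILE2" > "$tmpdir/patterns"
first_pattern=$(head -n 1 "$tmpdir/patterns")

result=$(grep -E -f "$tmpdir/patterns" "$tmpdir/FILE1")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

Because every pattern starts with `^` and consumes exactly two comma-terminated fields, a value sitting in any other column cannot match.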
