Matching third field in a CSV with pattern file in GNU Linux (AWK/SED/GREP)
I need to print all the lines of a CSV file whose third field matches a pattern listed in a pattern file.
I have tried grep with no luck, because it matches the pattern in any field, not only the third.
grep -f FILE2 FILE1 > OUTPUT
FILE1
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
FILE2
00567
98745
45486
54543
48349
96349
56485
19615
56496
39493
RIGHT OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
WRONG OUTPUT
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567 <---- I don't want this to appear
sadsad,0,98745,1,gfdkjh,45346
I have already searched everywhere and tried different approaches.
EDIT: thanks to Wintermute, I managed to write something like this:
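# Write to a new file (the name is arbitrary): redirecting onto file1.csv itself would truncate it before csvquote reads it.
csvquote file1.csv > file1_quoted.csv
awk -F '"' 'FNR == NR { patterns[$0] = 1; next } patterns[$6]' file2.csv file1_quoted.csv | csvquote -u > result.csv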
Csvquote helps with parsing CSV files in AWK.
Thank you very much everybody, great community!
With awk:
awk -F, 'FNR == NR { patterns[$0] = 1; next } patterns[$3]' file2 file1
This works as follows:
FNR == NR { # when processing the first file (the pattern file)
patterns[$0] = 1 # remember the patterns
next # and do nothing else
}
patterns[$3] # after that, select lines whose third field
# has been seen in the patterns.
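To try this on the sample data from the question, here is a minimal self-contained sketch. It assumes a POSIX shell with mktemp and awk available; the temporary directory and the variable name result are only illustrative, and the pattern list is shortened. It uses `$3 in patterns`, a membership test that selects the same lines here without creating empty array entries for fields that do not match.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)

# Sample data from the question (pattern list shortened to three entries).
cat > "$tmp/file1" <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
asdasd,0,85249,1,lkjiou,52874
dasdas,1,48555,0,gfdkjh,06793
sadsad,0,98745,1,gfdkjh,45346
asdasd,1,56321,0,gfdkjh,47832
EOF
printf '%s\n' 00567 98745 45486 > "$tmp/file2"

# First file: store each pattern as an array key.
# Second file: print lines whose third field is one of those keys.
result=$(awk -F, 'FNR == NR { patterns[$0]; next } $3 in patterns' \
    "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the comparison is an exact string lookup on the third field, the line that has 00567 only in its last field is not selected.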
Using grep and sed:
grep -f <( sed -e 's/^\|$/,/g' file2) file1
dasdas,0,00567,1,lkjiou,85249
sadsad,0,98745,1,gfdkjh,45346
Explanation:
We insert a comma at the beginning and at the end of each line of file2 (on the fly, without changing the file itself), then we just grep as you were already doing.
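One caveat is worth checking before relying on this: a pattern such as ,00567, matches that value as any complete field that has a comma on both sides, not only as the third field. The sketch below illustrates the difference under a few assumptions: the third input row is made up for the demonstration (it has 00567 in the fifth field), the patterns are written to temporary files instead of using process substitution so that it runs in plain sh, and the stricter pattern is just one possible way to anchor the match to the third field.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 00567 98745 > "$tmp/file2"

# The third row is a made-up example: 00567 sits in the fifth field.
cat > "$tmp/file1" <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
xxxxxx,0,11111,1,00567,22222
EOF

# Loose form: wrap each pattern in commas (matches any interior field).
sed 's/.*/,&,/' "$tmp/file2" > "$tmp/loose.pat"
loose=$(grep -f "$tmp/loose.pat" "$tmp/file1")

# Stricter form: skip exactly two fields, then require the pattern.
sed 's/.*/^[^,]*,[^,]*,&,/' "$tmp/file2" > "$tmp/strict.pat"
strict=$(grep -f "$tmp/strict.pat" "$tmp/file1")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

With the sample data in the question the loose form happens to give the right output, because the unwanted 00567 is in the last field and has no trailing comma.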
This can be a start (it prints only the matching third fields, not the whole lines):
for i in $(cat FILE2); do cat FILE1 | cut -d',' -f3 | grep "$i"; done
{ sed 's#.*#/^[^,]*,[^,]*,&,/b#' FILE2; echo d; } >/tmp/File2.sed && sed -f /tmp/File2.sed FILE1; rm /tmp/File2.sed
This is harder to do in sed than in awk, but it should work if awk is not available. A line is kept as soon as its third field matches any one pattern; the final d deletes the rest.
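To see what a generated sed script of this kind can look like, here is a small sketch on a shortened copy of the sample data. It assumes a POSIX sh and sed; the file name filter.sed and the variables script and result are illustrative. In this variant each pattern becomes a `/regex/b` rule, which branches to the end of the script so that sed prints the line, and a closing `d` drops lines that matched no rule.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 00567 98745 > "$tmp/file2"
cat > "$tmp/file1" <<'EOF'
dasdas,0,00567,1,lkjiou,85249
sadsad,1,52874,0,lkjiou,00567
sadsad,0,98745,1,gfdkjh,45346
EOF

# One "/regex/b" rule per pattern, anchored to the third field,
# followed by a single "d" for everything that matched no rule.
{
    sed 's#.*#/^[^,]*,[^,]*,&,/b#' "$tmp/file2"
    echo d
} > "$tmp/filter.sed"

script=$(cat "$tmp/filter.sed")
result=$(sed -f "$tmp/filter.sed" "$tmp/file1")

printf 'script:\n%s\nresult:\n%s\n' "$script" "$result"
rm -rf "$tmp"
```

Writing one `!d` rule per pattern instead would keep only lines that match every pattern at once, so a single trailing `d` is the safer shape when there are several patterns.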
The same with egrep (useful on huge files):
sed 's#.*#^[^,]*,[^,]*,&,#' FILE2 >/tmp/File2.egrep && egrep -f /tmp/File2.egrep FILE1; rm /tmp/File2.egrep