
Filter a file with a list of other files containing special characters such as “\”, “:”, “;” in Linux

I'm trying to filter a file. The file I want to filter is mygeneralfile.txt, and the file that holds the filters is myfilterfile.txt. The contents of both files are as follows:

employee@EMPLOYEE-PC:~$ cat myfilterfile.txt
2020-06-24 00:00:04,396
2020-06-24 00:00:04,510
2020-06-24 00:00:04,511
employee@EMPLOYEE-PC:~$
employee@EMPLOYEE-PC:~$ cat mygeneralfile.txt
[2020-06-24 00:00:04,265][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
[2020-06-24 00:00:04,311][] [INFO] [com.mycompany.library] |getanotherTableImportant null
[2020-06-24 00:00:04,314][] [INFO] [com.mycompany.library] |getanotherTableImportant USER_NUMBER_TWO_1234567
[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER
[2020-06-24 00:00:04,510][] [INFO] [com.mycompany.library] |getanotherTableImportant NINET_USER_350
[2020-06-24 00:00:04,511][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
[2020-06-24 00:00:04,527][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG

The result I want to obtain is the following:

[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER
[2020-06-24 00:00:04,510][] [INFO] [com.mycompany.library] |getanotherTableImportant NINET_USER_350
[2020-06-24 00:00:04,511][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG

It would be even better if it showed me only these values:

BILLABONG_MASTER_USER
NINET_USER_350
USERABCDEFG

I've seen that the awk command is useful for these cases, and I tried to replicate it using the following command: awk 'FNR==NR { a [$NF]; next } ($NF in a)' myfilterfile.txt mygeneralfile.txt. However, it does not generate any output for me, and I think it's because my myfilterfile.txt file contains special characters such as - : ,

PS: The mygeneralfile.txt file is about 2GB, which is why I'm trying the awk command, since I believe it is faster than grep. I have little knowledge of awk, so I would like you to explain every function used to solve this problem.

Thank you very much, Community!

With GNU grep, you can try the following command:

# grep -F -f myfilterfile.txt mygeneralfile.txt

The patterns are searched for anywhere in the line. If you are paranoid and want to limit the search to the date-time field, try this slower version, in which sed prefixes each pattern with ^. so that it can only match right after the first character of the line (the opening bracket):

# sed 's/^/^./' myfilterfile.txt | grep -f - mygeneralfile.txt

I doubt you will get a faster awk script.
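To get only the user names with the grep approach, the matched lines can be piped through awk to print the last whitespace-separated field. The following is a minimal sketch on a trimmed copy of the sample data from the question; the scratch directory and the `users` variable are assumptions added only to keep the example self-contained.

```shell
# Work in a scratch directory so the real files are not touched (illustrative setup).
cd "$(mktemp -d)"

cat > myfilterfile.txt <<'EOF'
2020-06-24 00:00:04,396
2020-06-24 00:00:04,510
2020-06-24 00:00:04,511
EOF

cat > mygeneralfile.txt <<'EOF'
[2020-06-24 00:00:04,314][] [INFO] [com.mycompany.library] |getanotherTableImportant USER_NUMBER_TWO_1234567
[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER
[2020-06-24 00:00:04,510][] [INFO] [com.mycompany.library] |getanotherTableImportant NINET_USER_350
[2020-06-24 00:00:04,511][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
[2020-06-24 00:00:04,527][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
EOF

# -F treats every pattern as a fixed string; -f reads the patterns from a file.
# awk then prints only the last whitespace-separated field of each matching line.
users=$(grep -F -f myfilterfile.txt mygeneralfile.txt | awk '{print $NF}')
printf '%s\n' "$users"
```

On the real 2GB file the same pipeline applies unchanged; only the setup lines that create the sample files would be dropped.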

$ awk -F'[][ ]' 'NR==FNR{a[$0]; next} ($2" "$3) in a{print $NF}' myfilterfile.txt mygeneralfile.txt
BILLABONG_MASTER_USER
NINET_USER_350
USERABCDEFG
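Since the question asks for each part of the awk command to be explained, here is the same one-liner spread over several lines with comments. It is only a sketch run against a trimmed copy of the question's sample data; the scratch directory and the `result` variable are assumptions made for illustration.

```shell
# Work in a scratch directory with a small copy of the sample data (illustrative setup).
cd "$(mktemp -d)"

cat > myfilterfile.txt <<'EOF'
2020-06-24 00:00:04,396
2020-06-24 00:00:04,510
2020-06-24 00:00:04,511
EOF

cat > mygeneralfile.txt <<'EOF'
[2020-06-24 00:00:04,311][] [INFO] [com.mycompany.library] |getanotherTableImportant null
[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER
[2020-06-24 00:00:04,510][] [INFO] [com.mycompany.library] |getanotherTableImportant NINET_USER_350
[2020-06-24 00:00:04,511][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
[2020-06-24 00:00:04,527][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
EOF

# -F sets the field separator to a bracket expression: "]", "[" or a space.
result=$(awk -F'[][ ]' '
    # NR counts lines across all files, FNR restarts at 1 for each file,
    # so NR == FNR is true only while reading the first file (the filter list).
    NR == FNR {
        a[$0]    # mentioning a[$0] creates an array key holding the whole line
        next     # go to the next input line and skip the rule below
    }
    # Second file: $1 is the empty text before "[", $2 is the date, $3 is the time.
    # "in" tests whether the rebuilt timestamp exists as a key of the array.
    ($2 " " $3) in a {
        print $NF    # NF is the number of fields, so $NF is the last one (the user name)
    }
' myfilterfile.txt mygeneralfile.txt)
printf '%s\n' "$result"
```

The lookup `in a` is a plain string comparison against the array keys, so no character in the timestamps needs escaping.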

Regarding "I think it's because inside my myfilterfile.txt file it contains special characters such as - : ,": no, it's not. None of those characters are special, even in a regexp (except -, if it were in the middle of a bracket expression within a regexp), and your awk command is doing string comparison anyway, so even if they were regexp metacharacters, that wouldn't apply in this case. Your command prints nothing because $NF in myfilterfile.txt is the time (for example 00:00:04,396), while $NF in mygeneralfile.txt is the user name, so the two never match.
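One way to see why the command from the question prints nothing is to look at what $NF holds in each file under the default whitespace field separator. This is a small sketch on one line of each sample file; the variable names `filter_key`, `general_key` and `original` are hypothetical and exist only for this illustration.

```shell
# One line from each sample file is enough to show the mismatch (illustrative setup).
cd "$(mktemp -d)"
printf '%s\n' '2020-06-24 00:00:04,396' > myfilterfile.txt
printf '%s\n' '[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER' > mygeneralfile.txt

# With the default separator, $NF is the last whitespace-separated field.
filter_key=$(awk '{print $NF}' myfilterfile.txt)      # the time part of the timestamp
general_key=$(awk '{print $NF}' mygeneralfile.txt)    # the user name
printf 'filter:  %s\ngeneral: %s\n' "$filter_key" "$general_key"

# The command from the question stores times as keys and looks up user names,
# so the lookup never succeeds and nothing is printed.
original=$(awk 'FNR==NR { a[$NF]; next } ($NF in a)' myfilterfile.txt mygeneralfile.txt)
printf 'original command output: [%s]\n' "$original"
```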
