Filter a file with a list of other files containing special characters such as “\”, “:”, “;” in Linux
I'm trying to filter a file. The file I want to filter is mygeneralfile.txt, and the file that contains the filters is named myfilterfile.txt.
The contents of both files are as follows:
employee@EMPLOYEE-PC:~$ cat myfilterfile.txt
2020-06-24 00:00:04,396
2020-06-24 00:00:04,510
2020-06-24 00:00:04,511
employee@EMPLOYEE-PC:~$
employee@EMPLOYEE-PC:~$ cat mygeneralfile.txt
[2020-06-24 00:00:04,265][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
[2020-06-24 00:00:04,311][] [INFO] [com.mycompany.library] |getanotherTableImportant null
[2020-06-24 00:00:04,314][] [INFO] [com.mycompany.library] |getanotherTableImportant USER_NUMBER_TWO_1234567
[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER
[2020-06-24 00:00:04,510][] [INFO] [com.mycompany.library] |getanotherTableImportant NINET_USER_350
[2020-06-24 00:00:04,511][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
[2020-06-24 00:00:04,527][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
The result I want to obtain is the following:
[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER
[2020-06-24 00:00:04,510][] [INFO] [com.mycompany.library] |getanotherTableImportant NINET_USER_350
[2020-06-24 00:00:04,511][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
It would be even better if it showed me only these values:
BILLABONG_MASTER_USER
NINET_USER_350
USERABCDEFG
I've seen that the awk command is useful for these cases, and I tried to replicate it using the following command:
awk 'FNR==NR { a [$NF]; next } ($NF in a)' myfilterfile.txt mygeneralfile.txt
However, it does not generate any output for me, and I think that's because my myfilterfile.txt file contains special characters such as - : ,
PS: The mygeneralfile.txt file is about 2GB in size, and that's why I'm trying the awk command, as it's faster than the grep command. I have little knowledge of the awk command, so I would like you to explain in detail every function that is used to solve this problem.
Thank you very much, Community!!!
With GNU grep, you can try the following command:
# grep -F -f myfilterfile.txt mygeneralfile.txt
The patterns are searched for anywhere in the line. If you are paranoid and want to limit the search to the date-time field, try this slower version:
# sed 's/^/^./' myfilterfile.txt | grep -f - mygeneralfile.txt
I doubt you will get a faster awk script.
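To see what the two grep commands above actually do, here is a small sketch that runs them on a few lines copied from the question. The scratch directory and the variable names (work, patterns, anywhere, anchored) are only for illustration, and reading patterns from standard input with "-f -" is assumed to be available, as it is in GNU grep.

```shell
# Sample data copied from the question, written to a scratch directory.
work=$(mktemp -d)
cat > "$work/myfilterfile.txt" <<'EOF'
2020-06-24 00:00:04,396
2020-06-24 00:00:04,510
EOF
cat > "$work/mygeneralfile.txt" <<'EOF'
[2020-06-24 00:00:04,314][] [INFO] [com.mycompany.library] |getanotherTableImportant USER_NUMBER_TWO_1234567
[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER
[2020-06-24 00:00:04,510][] [INFO] [com.mycompany.library] |getanotherTableImportant NINET_USER_350
EOF

# sed puts "^." in front of every timestamp: "^" anchors the match at the
# start of the line and "." matches the single leading "[" of a log line.
patterns=$(sed 's/^/^./' "$work/myfilterfile.txt")
printf '%s\n' "$patterns"

# -F treats each pattern as a fixed string, -f reads the patterns from a file.
# The timestamp may then match anywhere in the line.
anywhere=$(grep -F -f "$work/myfilterfile.txt" "$work/mygeneralfile.txt")

# Anchored version: the patterns are regular expressions read from stdin.
anchored=$(printf '%s\n' "$patterns" | grep -f - "$work/mygeneralfile.txt")
printf '%s\n' "$anchored"

rm -rf "$work"
```

On this sample both variants select the same two lines. They would differ only if a timestamp from the filter file also appeared later in some log line, for example inside the message text.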
$ awk -F'[][ ]' 'NR==FNR{a[$0]; next} ($2" "$3) in a{print $NF}' myfilterfile.txt mygeneralfile.txt
BILLABONG_MASTER_USER
NINET_USER_350
USERABCDEFG
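Since the question asks for every part of the command to be explained, here is the same awk command spread over several lines with comments. The logic is unchanged; only the array name (seen instead of a), the scratch directory and the shell variable users are illustrative choices, and the sample files are shortened copies of the ones in the question.

```shell
# Sample data copied from the question, written to a scratch directory.
work=$(mktemp -d)
cat > "$work/myfilterfile.txt" <<'EOF'
2020-06-24 00:00:04,396
2020-06-24 00:00:04,510
2020-06-24 00:00:04,511
EOF
cat > "$work/mygeneralfile.txt" <<'EOF'
[2020-06-24 00:00:04,311][] [INFO] [com.mycompany.library] |getanotherTableImportant null
[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER
[2020-06-24 00:00:04,510][] [INFO] [com.mycompany.library] |getanotherTableImportant NINET_USER_350
[2020-06-24 00:00:04,511][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
[2020-06-24 00:00:04,527][] [INFO] [com.mycompany.library] |getanotherTableImportant USERABCDEFG
EOF

users=$(awk -F'[][ ]' '
    # -F sets the field separator to a bracket expression that matches
    # "]", "[" or a space. A log line then splits into $1="" (the text
    # before the first "["), $2=date, $3=time, and so on.

    # NR counts lines over all files, FNR restarts at 1 for each file,
    # so NR == FNR is true only while the first file is being read.
    NR == FNR {
        seen[$0]    # use the whole line "date time" as an array key
        next        # skip the remaining rules for this line
    }

    # Second file: rebuild "date time" from fields 2 and 3 and check
    # whether that exact string was stored as a key.
    (($2 " " $3) in seen) {
        print $NF   # NF is the number of fields, so $NF is the last one
    }
' "$work/myfilterfile.txt" "$work/mygeneralfile.txt")

printf '%s\n' "$users"
rm -rf "$work"
```

To print the complete matching lines rather than only the user names, replace print $NF with print $0 (or drop the action block, since printing the line is the default action).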
wrt "I think it's because inside my myfilterfile.txt file it contains special characters such as - : ," - no, it's not. None of those characters are special even in a regexp (except - if it was in the middle of a bracket expression within a regexp), and your awk command is doing string comparison anyway, so even if they were regexp metachars that wouldn't apply in this case.
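What does go wrong in the original command is the choice of key: with the default field separator, $NF is the last blank-separated word of the line, which is the time in myfilterfile.txt but the user name in mygeneralfile.txt, so the lookup can never succeed. A minimal sketch using one line from each file (the variable names are illustrative):

```shell
# One sample line from each file in the question.
filter_line='2020-06-24 00:00:04,396'
log_line='[2020-06-24 00:00:04,396][] [INFO] [com.mycompany.library] |getanotherTableImportant BILLABONG_MASTER_USER'

# With the default field separator (runs of blanks), $NF is the last word.
filter_key=$(printf '%s\n' "$filter_line" | awk '{ print $NF }')
log_key=$(printf '%s\n' "$log_line" | awk '{ print $NF }')

printf 'key stored from myfilterfile.txt:    %s\n' "$filter_key"
printf 'key looked up for mygeneralfile.txt: %s\n' "$log_key"
```

The first key is 00:00:04,396 and the second is BILLABONG_MASTER_USER. The two strings are never equal, which is why the original command printed nothing.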