bash /bin/grep: Argument list too long (using --file option)
I have a text file containing 33,869 rows, from which I have to extract 30,067.
An example:
File: input.txt (CSV-like, 33,869 rows)
#00001:A123456.10.101.102,first,row,value2,1
#00002:A123456.10.101.103,second,row,value7,85
(omissis)
#33869:A123456.25.170.180,last,test,value9,0
File: filter.txt (newline-separated list of values, 30,067 rows)
A123456.10.101.102
A123456.10.101.103
(omissis)
A123456.24.150.115
Expected output file: output.txt (CSV-like, the 30,067 matching rows taken from input.txt):
#00001:A123456.10.101.102,first,row,value2,1
#00002:A123456.10.101.103,second,row,value7,85
(omissis)
#30067:A123456.24.150.115,whatever,x,y,99
The command I'm using is:
#!/bin/bash
/bin/grep --file="filter.txt" input.txt > output.txt
but the error returned is:
/bin/grep: Argument list too long
Am I forced to split "filter.txt" into smaller chunks?
What is the allowed limit? I did not find it in the grep man page.
If there are no regular expressions in the pattern file, you should switch to grep -F, which can read a significantly larger number of input records.
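A minimal sketch of that switch, using tiny invented stand-ins for the question's files (only the grep line is the actual suggestion; the sample rows are made up for illustration):

```shell
# Invented stand-ins for the question's input.txt and filter.txt.
printf '%s\n' '#00001:A123456.10.101.102,first,row,value2,1' \
              '#00002:A123456.10.101.103,second,row,value7,85' \
              '#00003:A123456.99.999.999,no,match,here,0' > input.txt
printf '%s\n' 'A123456.10.101.102' \
              'A123456.10.101.103' > filter.txt

# -F treats every pattern as a fixed string: no regex compilation, and
# the dots in the keys no longer match arbitrary characters.
grep -F -f filter.txt input.txt > output.txt
```

The output keeps input.txt order, which matches the expected output in the question.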
Failing that, splitting the pattern file would be hugely more efficient than running 30,000+ iterations of grep over the same data file.
Here's a version that splits the pattern file into chunks of 10,000 lines; adapting to a different chunk size should be trivial.
#!/bin/sh
t=$(mktemp -d -t fgrepsplit.XXXXXXXXXXXX) || exit
trap 'rm -rf "$t"' EXIT # Remove temp dir when done
trap 'exit 127' HUP INT TERM # Remove temp dir if interrupted, too
split -l 10000 "$1" "$t"/pat # Chunk the pattern file ($1) into 10,000-line pieces
for p in "$t"/pat*; do
    grep -F -f "$p" "$2" # Match each chunk against the data file ($2)
done
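A hedged end-to-end demo of the script above, saved under the hypothetical name bigfilter.sh, with the chunk size shrunk to 2 so the splitting actually kicks in on toy data:

```shell
# Write the script above to a file (chunk size reduced to 2 for the demo).
cat > bigfilter.sh <<'EOF'
#!/bin/sh
t=$(mktemp -d -t fgrepsplit.XXXXXXXXXXXX) || exit
trap 'rm -rf "$t"' EXIT
trap 'exit 127' HUP INT TERM
split -l 2 "$1" "$t"/pat
for p in "$t"/pat*; do
    grep -F -f "$p" "$2"
done
EOF
chmod +x bigfilter.sh

# Invented toy pattern and data files in the question's format.
printf '%s\n' 'A123456.10.101.102' 'A123456.10.101.103' 'A123456.24.150.115' > filter.txt
printf '%s\n' '#00001:A123456.10.101.102,first,row,value2,1' \
              '#00002:A123456.10.101.103,second,row,value7,85' \
              '#30067:A123456.24.150.115,whatever,x,y,99' > input.txt

./bigfilter.sh filter.txt input.txt > output.txt
```

Note the argument order: pattern file first, data file second. One caveat: a line that matches patterns in two different chunks would be printed once per chunk; piping the result through awk '!seen[$0]++' would deduplicate while preserving order.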
From what you write, I wonder whether grep is the right tool for the job. With grep you would usually apply a small set of matching rules, expressed as regular expressions. In your case, you are matching against a long list of literals.
This seems to be a case of finding the lines that full_file.txt and filtered.txt have in common. You might want to look at the following tools to achieve this:
join ( http://linux.die.net/man/1/join ) gives you the lines that two files have in common. Note that both files have to be sorted on the join field; you can use process substitution to achieve this.
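A hedged sketch of the join route, assuming bash (for process substitution) and invented sample rows; the first comma of each data line is rewritten to a colon so the key becomes a clean field. Note that join reorders the columns (key first), so the output format differs from input.txt:

```shell
#!/bin/bash
# Invented stand-ins for the question's files.
printf '%s\n' '#00002:A123456.10.101.103,second,row,value7,85' \
              '#00001:A123456.10.101.102,first,row,value2,1' \
              '#00003:A123456.99.999.999,no,match,here,0' > input.txt
printf '%s\n' 'A123456.10.101.103' 'A123456.10.101.102' > filter.txt

# join needs both inputs sorted on the join field. sed turns the first
# comma into a colon, so the key is field 2 of a purely colon-split line.
join -t: -1 1 -2 2 \
    <(sort filter.txt) \
    <(sed 's/,/:/' input.txt | sort -t: -k2,2) > output.txt
```

Each output line comes out as key:rownumber:rest, so a final awk or sed pass would be needed to restore the original column order.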
combine ( http://linux.die.net/man/1/combine ) is a more general utility that does not require the input to be sorted, but it may not be available everywhere.

What about iterating over each line of your file? Something like:
while IFS= read -r i; do
    grep -F "$i" full_file.txt # -F: treat the key as a literal string, not a regex
done < grep_filter.txt > filtered.txt
With awk:
awk -F"[:,]" 'FNR==NR{a[$2]=$0;next} ($0 in a) {print a[$0]}' input.txt filter.txt
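The same one-liner, expanded with comments and run against invented sample files. Two caveats worth knowing: the output follows filter.txt order rather than input.txt order, and if several input rows share a key, only the last one is kept:

```shell
# Invented stand-ins for the question's files.
printf '%s\n' '#00001:A123456.10.101.102,first,row,value2,1' \
              '#00002:A123456.10.101.103,second,row,value7,85' \
              '#00003:A123456.99.999.999,no,match,here,0' > input.txt
printf '%s\n' 'A123456.10.101.102' 'A123456.10.101.103' > filter.txt

# -F'[:,]' splits on either ':' or ',', so the A... key is field 2.
awk -F'[:,]' '
    FNR == NR { a[$2] = $0; next }  # 1st file: index each row by its key
    $0 in a   { print a[$0] }       # 2nd file: each line is a key; print its row
' input.txt filter.txt > output.txt
```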