
bash /bin/grep: Argument list too long (using --file option)

I have a text file containing 33,869 rows and I have to filter 30,067 of them.

For example:

File: input.txt (CSV-like, 33,869 rows)

#00001:A123456.10.101.102,first,row,value2,1
#00002:A123456.10.101.103,second,row,value7,85
(omissis)
#33869:A123456.25.170.180,last,test,value9,0

File: filter.txt (list of values, one per line, 30,067 rows)

A123456.10.101.102
A123456.10.101.103
(omissis)
A123456.24.150.115

(expected) Output file: output.txt (CSV-like, the 30,067 matching rows from input.txt):

#00001:A123456.10.101.102,first,row,value2,1
#00002:A123456.10.101.103,second,row,value7,85
(omissis)
#30067:A123456.24.150.115,whatever,x,y,99

The command I'm using is:

#!/bin/bash
/bin/grep --file="filter.txt" input.txt > output.txt

but the error returned is

/bin/grep: Argument list too long

Am I forced to split "filter.txt" into smaller chunks?

What is the allowed limit?

I did not find the limit documented in the man page.

If there are no regular expressions in the filter file, you should switch to grep -F, which can handle a significantly larger number of patterns.
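As a minimal sketch of that suggestion applied to the files above, -F makes grep treat every line of filter.txt as a fixed string rather than a regular expression, so the dots in the keys match literally and no regexes need to be compiled:

```shell
# Fixed-string matching: each line of filter.txt is a literal pattern.
grep -F --file=filter.txt input.txt > output.txt
```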

Failing that, splitting the pattern file would be hugely more efficient than running 30,000+ iterations of grep over the same input file.

Here's a split into chunks of 10,000 lines; adapting it to a different chunk size should be trivial.

#!/bin/sh

t=$(mktemp -d -t fgrepsplit.XXXXXXXXXXXX) || exit
trap 'rm -rf "$t"' EXIT       # Remove temp dir when done
trap 'exit 127' HUP INT TERM  # Remove temp dir if interrupted, too

split -l 10000 "$1" "$t"/pat

for p in "$t"/pat*; do
    grep -F -f "$p" "$2"
done

From what you write, I wonder whether grep is the right tool for the job. With grep you would usually apply a small set of matching rules, expressed as regular expressions. In your case, you are matching against a long list of literal strings.

This seems to be a case of finding the lines that full_file.txt and filtered.txt have in common. You might want to look at the following tools to achieve this:

  • join ( http://linux.die.net/man/1/join ) gives you the lines that two files have in common. Note that both files have to be sorted; you can use process substitution to achieve this.
  • combine ( http://linux.die.net/man/1/combine ) is a more general utility that does not require the input to be sorted, but it may not be available everywhere.
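For illustration, here is a hedged sketch of the join route, assuming bash (for process substitution) and assuming the key sits between the ':' and the first ',' of each input.txt line:

```shell
#!/bin/bash
# Sketch only: prefix each input.txt line with its key (the field between
# ':' and the first ','), sort both sides, join on the key, then strip the
# key again. Assumes the lines contain no whitespace.
join <(sort filter.txt) \
     <(awk -F'[:,]' '{print $2, $0}' input.txt | sort -k1,1) \
  | cut -d' ' -f2- > output.txt
```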

What about iterating over each line of your filter file? Something like:

while IFS= read -r i; do
   grep -F -- "$i" full_file.txt   # -F: the keys contain dots, so match them literally
done < grep_filter.txt > filtered.txt

An awk alternative:

awk -F"[:,]" 'FNR==NR{a[$2]=$0;next} ($0 in a) {print a[$0]}'  input.txt filter.txt
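To make the mechanics concrete, here is the same one-liner run on a tiny made-up sample (the file contents and keys are illustrative): during the first pass (FNR==NR, while awk is still reading input.txt) each line is stored in array a keyed by its second ':'/','-separated field; during the second pass, each key from filter.txt that exists in a prints the stored line, so the output follows filter.txt's order.

```shell
#!/bin/sh
# Tiny demo with made-up data; writes sample files to the current directory.
printf '#00001:A.1,first,1\n#00002:B.2,second,2\n' > input.txt
printf 'A.1\n' > filter.txt
awk -F'[:,]' 'FNR==NR{a[$2]=$0;next} ($0 in a) {print a[$0]}' input.txt filter.txt
# prints: #00001:A.1,first,1
```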
