Extract large list of lines from large text file
I need to extract ~5000 lines from a file with ~300,000 lines in bash (OS X). Running
sed '128082p;128083p;...(4996 numbers)....;159845q;d' file > output
gives the error
sed: 1: "128082p;128083p;128084p ...": command expected
This same command works if I try to extract only 10 lines. Whereas running
for i in `cat line_file`; do sed -n "$ip" file; done >> output
creates a file that's more than ~5000 lines long. What's the right command in either case?
Edit: this is not a range of numbers.

Tip of the hat to Jonathan Leffler for his help.
It looks like BSD sed as used on macOS (as of macOS 10.12.1) has a hard limit on the size of each line of a script that can be passed to it: 2048 bytes.
When passed as a command-line argument (implicitly as the first operand, or explicitly via -e options), scripts are typically passed as a single line, as you did.
If that single line gets too long, it is, regrettably, blindly cut off, which typically results in a seemingly random syntax error like the one you saw.
There are two workarounds:
Make sure that your script contains only short-enough lines by separating commands with \n (newlines) instead of ; and/or split your script across multiple -e options (which is cumbersome).
Provide the entire script via a file, using the -f option, in which case all commands must be separated with \n rather than ; anyway.
In the unlikely event that your script is too long to fit on a single command line (a limit imposed by the system; see the end of this answer), using -f is your only option.
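The multiple -e variant mentioned above could look roughly like the following. This is only a sketch under assumptions: the temporary directory, the 20-line sample file, and the three line numbers are made up for illustration, and line_file is assumed to hold one number per line with no blank lines.

```shell
#!/usr/bin/env bash
# Sketch: pass each print command as its own -e option, so that no single
# script line comes anywhere near the per-line limit described above.
# The sample data below is hypothetical.
tmp=$(mktemp -d)
seq 1 20 | sed 's/^/line /' > "$tmp/file"     # 20-line sample input
printf '%s\n' 3 7 15 > "$tmp/line_file"       # line numbers to extract

args=()
while IFS= read -r n; do
  args+=(-e "${n}p")                          # one short command per -e option
done < "$tmp/line_file"

result=$(sed -n "${args[@]}" "$tmp/file")
printf '%s\n' "$result"
```

With thousands of line numbers this adds thousands of arguments to the command line, which is one reason the newline-separated forms are usually more convenient.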
Here's an example of a command-line script that is too long:
$ sed -n "$(printf '%sp;' {1..432})" <<<'line 1'
sed: 1: "1p;2p;3p;4p;5p;6p;7p;8p ...": command expected # !! ERROR
Even though the script is syntactically correct, cutting its one and only line off at 2048 bytes leaves it malformed, resulting in the seemingly random command expected error.
In this case, working around the limitation is simple: by replacing ; with \n, the individual lines become short enough:
$ sed -n "$(printf '%sp\n' {1..432})" <<<'line 1'
line 1 # OK
Since you already have a file of line numbers, line_file, you can use an auxiliary sed command to create your \n-separated script from it:
$ sed -n "$(sed 's/$/p/' line_file)" file > output
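The command in the question ended with a q command so that sed stops reading once the last wanted line has been printed. A minimal sketch of how that could be combined with the generated script follows; the sample data is hypothetical, and line_file is assumed to contain one number per line with no blank lines (the numbers need not be sorted).

```shell
#!/usr/bin/env bash
# Sketch: build the newline-separated script from line_file, then append a
# final "<highest line number>q" so sed quits instead of reading to the end.
# The sample data below is hypothetical.
tmp=$(mktemp -d)
seq 1 1000 | sed 's/^/line /' > "$tmp/file"   # 1000-line sample input
printf '%s\n' 5 2 9 > "$tmp/line_file"        # unsorted on purpose

last=$(sort -n "$tmp/line_file" | tail -n 1)  # highest requested line number
# The q command goes last, so the p command for that line runs before sed quits.
script=$(sed 's/$/p/' "$tmp/line_file"; printf '%sq\n' "$last")

result=$(sed -n "$script" "$tmp/file")
printf '%s\n' "$result"
```

The lines come out in the order they appear in the input file, not in the order listed in line_file.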
Here's how to solve the problem with a script file passed via -f, in which the commands are \n-separated:
$ printf '%sp\n' {1..432} > script.sed # Create script file with \n-separated commands.
$ sed -n -f "script.sed" <<<'line 1' # Pass script file via -f
line 1 # OK
Note: Using a process substitution (sed -n -f <(printf ...) ...) as an ad-hoc script file inexplicably does not work.
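Applied to the line_file from the question, the -f approach might be sketched as follows, with a regular temporary file standing in for the process substitution. The temporary directory, sample input, and line numbers are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: write the generated commands to a regular temporary script file
# and pass that file to sed via -f. The sample data below is hypothetical.
tmp=$(mktemp -d)
seq 1 50 | sed 's/^/line /' > "$tmp/file"     # 50-line sample input
printf '%s\n' 4 8 42 > "$tmp/line_file"       # line numbers to extract

# Turn each line number N into the command "Np", one command per line.
sed 's/$/p/' "$tmp/line_file" > "$tmp/script.sed"

sed -n -f "$tmp/script.sed" "$tmp/file" > "$tmp/output"
result=$(cat "$tmp/output")
printf '%s\n' "$result"
# Remove "$tmp" once the output has been copied somewhere permanent.
```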
Also note that the overall maximum length of a command line for invoking an external utility such as sed on macOS (as of 10.12) is 262144 bytes (256 KB; determined with getconf ARG_MAX), and in practice the limit is lower, because the size of the environment-variable block counts against it as well.
If you were to hit that limit, however, you'd get a more helpful error message: Argument list too long.
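One way to choose between the two forms automatically is to compare the size of the generated script with getconf ARG_MAX before invoking sed. The following is a rough sketch: the sample data is hypothetical, and the "half of ARG_MAX" headroom is an arbitrary margin picked for illustration, not a documented threshold.

```shell
#!/usr/bin/env bash
# Sketch: pass the script as a command-line argument when it is comfortably
# small, otherwise fall back to a script file passed via -f.
# The sample data and the safety margin below are assumptions.
tmp=$(mktemp -d)
seq 1 100 | sed 's/^/line /' > "$tmp/file"    # 100-line sample input
printf '%s\n' 10 20 30 > "$tmp/line_file"     # line numbers to extract

script=$(sed 's/$/p/' "$tmp/line_file")       # newline-separated commands
arg_max=$(getconf ARG_MAX)
script_bytes=$(printf '%s' "$script" | wc -c | tr -d ' ')

# Environment variables share the same budget, hence the generous headroom.
if [ "$script_bytes" -lt $((arg_max / 2)) ]; then
  result=$(sed -n "$script" "$tmp/file")
else
  printf '%s\n' "$script" > "$tmp/script.sed"
  result=$(sed -n -f "$tmp/script.sed" "$tmp/file")
fi
printf '%s\n' "$result"
```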