sed is adding unwanted whitespace to end of file, making it invalid
I'm trying to replace file contents using sed. The replacement works, but for some reason I get extra whitespace at the end of the resulting output file, which makes the file unreadable in the application that opens it.
My command is as follows:
for file in *.example ; do LANG=C sed -i "" "s|https://foo.bar|http://foo.bar|g" "$file" ; done
I suggest replacing
\foo.bar
with
foo.bar
With the benefit of hindsight:
BSD/macOS sed is fundamentally unsuitable for making substitutions in binary files, because it invariably outputs a trailing \n (newline) with every output command.
By contrast, GNU sed doesn't have this problem, because it (commendably) only appends a \n if the input "line" had one too.
Note that the concept of newline-separated lines doesn't really apply to binary input: newlines may or may not be present, and there may be large chunks of data between them. In the worst case, the entire input is read at once. [1]
You can test this behavior with the following command:
sed -n 'p' <(printf 'x') | cat -et # input printf 'x' has no trailing \n
Output x$ indicates that a newline (symbolized as $ by cat -et) was appended (BSD Sed), whereas just x indicates that it was not (GNU Sed).
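The same check can be written without process substitution, which may help in shells that lack the <(...) syntax. This is only a sketch: it counts the bytes sed emits for a one-byte input, and the variable name out_bytes is made up for illustration.

```shell
# Feed sed a single byte with no trailing newline and count what comes out.
# 'tr' strips the padding that some 'wc' implementations add around the number.
out_bytes=$(printf 'x' | sed -n 'p' | wc -c | tr -d '[:space:]')

case "$out_bytes" in
  1) echo "no newline appended (GNU Sed behavior)" ;;
  2) echo "newline appended (BSD Sed behavior)" ;;
  *) echo "unexpected byte count: $out_bytes" ;;
esac
```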
Thus, given that you're on macOS, you could use Homebrew to install GNU Sed with brew install gnu-sed and then use the following command:
LANG=C gsed -i 's|https://foo.bar|http://foo.bar|g' *.example
Homebrew installs GNU Sed as gsed, so that it can exist alongside macOS's stock (BSD) sed.
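If you want to confirm that a substitution left the end of a file alone, one option is to compare the last byte before and after. The sketch below makes several assumptions: the sample data is made up, last_byte is a hypothetical helper, and it writes to a second file rather than using -i so that it runs with either sed.

```shell
# Use Homebrew's gsed when present; otherwise fall back to plain sed.
if command -v gsed >/dev/null 2>&1; then SED=gsed; else SED=sed; fi

# Hypothetical helper: print the last byte of a file as two hex digits.
last_byte() { tail -c 1 "$1" | od -An -tx1 | tr -d '[:space:]'; }

src=$(mktemp)
dst=$(mktemp)
printf 'https://foo.bar\0tail' > "$src"   # made-up sample data, no trailing newline

# Write to a second file instead of using -i, so the check works with either sed.
LC_ALL=C "$SED" 's|https://foo.bar|http://foo.bar|g' "$src" > "$dst"

before=$(last_byte "$src")
after=$(last_byte "$dst")
if [ "$before" = "$after" ]; then
  echo "last byte unchanged ($after)"
else
  echo "last byte changed from $before to $after"
fi
rm -f "$src" "$dst"
```

A changed last byte of 0a would point to an appended newline.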
LANG=C (slightly more robustly: LC_ALL=C) is needed to pass all bytes of the binary input through as-is, without causing problems stemming from binary bytes not being recognized as valid characters.
Note that this approach limits you to ASCII-only characters in the substitution (unless you explicitly add byte values as escape sequences).
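As an illustration of the escape-sequence route, GNU Sed accepts \xHH for a byte given in hexadecimal. The sketch below assumes GNU Sed is available as gsed or sed; the input and the byte value 0xE9 are made up for the example.

```shell
# Use Homebrew's gsed when present; otherwise assume plain sed is GNU Sed.
if command -v gsed >/dev/null 2>&1; then SED=gsed; else SED=sed; fi

# Made-up input: "caf" followed by the single byte 0xE9 (octal 351).
# \xe9 in the script matches that byte; it is replaced with a plain ASCII "e".
result=$(printf 'caf\351' | LC_ALL=C "$SED" 's/\xe9/e/')
echo "$result"
```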
Note the different, incompatible -i syntax for in-place updating without backup: there is no (separate) option-argument here; see this answer of mine for background.
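If a script has to run with either implementation, one common workaround is to branch on whether sed accepts --version, which GNU Sed supports and BSD Sed rejects. The following is a minimal sketch on a made-up text file that ends in a newline, so both implementations produce the same result; it is not a fix for the binary-file problem described above.

```shell
tmp=$(mktemp)
printf 'https://foo.bar\n' > "$tmp"   # made-up sample text, ends with a newline

if sed --version >/dev/null 2>&1; then
  # GNU Sed: a backup suffix, if any, is attached directly to -i.
  sed -i 's|https://|http://|' "$tmp"
else
  # BSD/macOS Sed: -i takes a separate argument; '' means "no backup".
  sed -i '' 's|https://|http://|' "$tmp"
fi

result=$(cat "$tmp")
rm -f "$tmp"
echo "$result"
```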
Note how '...' (single-quoting) is used around the Sed script, which is generally preferable, as it avoids confusion between shell expansions that happen up front and what Sed ends up seeing.
[1] Aside from memory use, it is fine to use Sed's default line-parsing behavior here, given that your substitution command doesn't match newlines. If you want to break the input into "lines" by NULs (and also use NULs on output), however, you can use GNU Sed's -z option.
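A small sketch of what -z does, assuming GNU Sed is available as gsed or sed: the made-up input has two NUL-terminated records, and tr converts the NULs to newlines only so the result is easy to read.

```shell
# Use Homebrew's gsed when present; otherwise assume plain sed is GNU Sed.
if command -v gsed >/dev/null 2>&1; then SED=gsed; else SED=sed; fi

# With -z, each NUL-terminated record is treated as one "line",
# so ^ anchors at the start of every record.
result=$(printf 'one\0two\0' | "$SED" -z 's/^/> /' | tr '\0' '\n')
echo "$result"
```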