
sed is adding unwanted whitespace to end of file, making it invalid

I'm trying to replace file contents using sed. The replacement works, but for some reason I get extra whitespace at the end of the resulting output file, which makes the file unreadable/unviewable in the application that opens it.

My command is as follows:

for file in *.example ; do LANG=C sed -i "" "s|https://foo.bar|http://foo.bar|g" "$file" ; done

Things I have tried without success:

  • Not wrapping the s/[...]/g argument in quotes (causes command to fail)
  • Using delimiters other than |, such as /, _, or % (makes no difference)
  • Using single quotes instead of double (makes no difference)
  • Escaping the periods and colons as well (makes no difference)

EDIT: This issue appears to be file-type related, and therefore I am no longer interested in a solution. Thank you to those who've replied.

I suggest replacing

\foo.bar

with

foo.bar

With the benefit of hindsight:

BSD/macOS sed is fundamentally unsuitable for making substitutions in binary files, because it invariably outputs a trailing \n (newline) with every output command.

By contrast, GNU sed doesn't have this problem, because it (commendably) only appends a \n if the input "line" had one too.

Note that the concept of newline-separated lines doesn't really apply to binary input: newlines may or may not be present, potentially with large chunks of data in between. In the worst-case scenario, the entire input is read into memory at once.[1]

You can test this behavior with the following command:

sed -n 'p' <(printf 'x') | cat -et  # input printf 'x' has no trailing \n

Output x$ indicates that a newline (symbolized as $ by cat -et) was appended (BSD Sed), whereas just x indicates that it was not (GNU Sed).
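The same check can be run against an actual file, which is closer to what happens to the files in the question. This is a minimal sketch, assuming a POSIX shell with mktemp, wc, tail, and od available; demo.bin is a hypothetical file name used only for the check. It runs a no-op substitution into a new file (so no -i is involved and the command is identical for BSD and GNU sed) and compares the byte counts:

```sh
# Scratch directory and a file whose content does NOT end in a newline
# (demo.bin is a hypothetical name used only for this check).
dir=$(mktemp -d)
printf 'https://foo.bar' > "$dir/demo.bin"

# Byte count of the input; tr strips the padding that BSD wc adds.
before=$(wc -c < "$dir/demo.bin" | tr -d ' ')

# No-op substitution written to a new file, so the same command works
# with both BSD and GNU sed.
LC_ALL=C sed 's|x|x|' "$dir/demo.bin" > "$dir/demo.out"
after=$(wc -c < "$dir/demo.out" | tr -d ' ')

# Last byte of the output in printable form (\n for a newline).
last=$(tail -c 1 "$dir/demo.out" | od -An -c | tr -d ' ')

echo "bytes before: $before, after: $after, last byte: $last"
if [ "$after" -gt "$before" ]; then
  echo "sed appended a trailing newline (BSD behavior)"
else
  echo "sed kept the missing trailing newline missing (GNU behavior)"
fi
rm -rf "$dir"
```

With BSD sed the output is one byte larger and ends in \n; with GNU sed both counts are equal.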

Thus, given that you're on macOS, you could use Homebrew to install GNU Sed with brew install gnu-sed and then use the following command:

LANG=C gsed -i 's|https://foo.bar|http://foo.bar|g' *.example

  • Homebrew installs GNU Sed as gsed, so that it can exist alongside macOS's stock (BSD) sed.

  • LANG=C (slightly more robustly: LC_ALL=C) is needed to pass all bytes of the binary input through as-is, without causing problems stemming from binary bytes not being recognized as valid characters.
    Note that this approach limits you to ASCII-only characters in the substitution (unless you explicitly add byte values as escape sequences).

  • Note the different, incompatible -i syntax for in-place updating without backup: unlike BSD sed's -i "", GNU sed takes no (separate) option-argument here; see this answer of mine for background.

  • Note how '...' (single-quoting) is used around the Sed script, which is generally preferable, as it avoids confusion between shell expansions that happen up front and what Sed ends up seeing.
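The loop from the question can be adapted so that it refuses to run with a non-GNU sed. This is a minimal sketch, assuming gsed is on the PATH after brew install gnu-sed (on most Linux systems the stock sed is already GNU sed); the function name replace_in_examples is hypothetical. It relies on GNU sed accepting --version while BSD/macOS sed rejects it as an unknown option:

```sh
# Hypothetical helper: pick a GNU sed, then apply the substitution in place
# to every *.example file in the current directory.
replace_in_examples() {
  # Prefer Homebrew's gsed; fall back to plain sed (GNU sed on most Linux systems).
  if command -v gsed >/dev/null 2>&1; then
    sed_cmd=gsed
  else
    sed_cmd=sed
  fi

  # GNU sed understands --version; BSD/macOS sed exits with an error.
  if ! "$sed_cmd" --version >/dev/null 2>&1; then
    echo "GNU sed is required (try: brew install gnu-sed)" >&2
    return 1
  fi

  for file in *.example; do
    # Skip the literal pattern if the glob matched no files.
    [ -e "$file" ] || continue
    # LC_ALL=C treats the input as raw bytes; GNU -i takes no separate argument.
    LC_ALL=C "$sed_cmd" -i 's|https://foo.bar|http://foo.bar|g' "$file" || return 1
  done
}
```

Run replace_in_examples from the directory containing the .example files; it returns a non-zero status if no GNU sed is found or a substitution fails.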


[1] Aside from memory use, it is fine to use Sed's default line-parsing behavior here, given that your substitution command doesn't match newlines. If you want to break the input into "lines" by NULs (and also use NULs on output), however, you can use GNU Sed's -z option.
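For illustration, here is a minimal sketch of the -z variant, assuming GNU sed (invoked as sed here; use gsed on macOS with Homebrew). The input is made-up sample data with NUL-separated records, the last one unterminated; cmp confirms that the only change is the substitution and that no trailing NUL or newline is added:

```sh
dir=$(mktemp -d)
# Sample "binary" input: NUL-separated records, the last one unterminated.
printf 'https://foo.bar\000data\000tail' > "$dir/in.bin"
# Expected result: same bytes, with the scheme changed in the first record.
printf 'http://foo.bar\000data\000tail' > "$dir/expected.bin"

# -z: split the input on NUL instead of newline; GNU sed writes a NUL
# after a record only if the input record had one.
LC_ALL=C sed -z 's|https://foo.bar|http://foo.bar|g' "$dir/in.bin" > "$dir/out.bin"

if cmp -s "$dir/out.bin" "$dir/expected.bin"; then
  result=identical
else
  result=different
fi
echo "output vs expected: $result"
rm -rf "$dir"
```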
