
Remove trailing blank lines in file in Bash by truncating

Is there a way to remove any trailing blank lines (lines that contain only whitespace) at the end of a file using Bash?

For example, this:

123\n\n\n12\n  \n \t \n

Should become:

123\n\n\n12\n

I know how to do that in C, using fseek() and ftruncate(), but I'm not sure whether it's possible using bash and off-the-shelf command-line utilities, without writing a specialized C program for it.

I have seen some questions asking about removing trailing whitespace in general, such as "How to remove trailing whitespace of all files recursively?", but I'm asking about doing it by truncating the file instead of overwriting it (for performance reasons).

You can find trailing blank lines with tac and then truncate with dd:

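#!/bin/bash
file=$1
# Bytes in the run of whitespace-only lines at the end of the file
trailing=$(tac "$file" | sed -n '/^[ \t]*$/!q; p' | wc -c)
# Number of leading bytes to keep
end=$(( $(wc -c < "$file") - trailing ))
# count=0 copies nothing; dd only truncates the file at the seek offset
dd bs=1 seek="$end" count=0 of="$file"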
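
On systems with GNU coreutils, truncate -s can stand in for the dd call. The following is only a sketch under that assumption (GNU tac, sed and truncate available); the function name trim_trailing_blank_lines is made up for illustration and is not part of the answer above.

```shell
#!/bin/bash
# Hypothetical helper (name chosen for illustration): shrink "$1" in place so
# that trailing whitespace-only lines are cut off. Assumes GNU tac, sed, truncate.
trim_trailing_blank_lines() {
    local file=$1 trailing size
    # Read the file backwards and count the bytes of the leading run of blank
    # lines, which is the trailing run of the original file.
    trailing=$(tac "$file" | sed -n '/^[ \t]*$/!q; p' | wc -c)
    size=$(wc -c < "$file")
    # Shrink the file to the bytes we want to keep; nothing is rewritten.
    truncate -s "$(( size - trailing ))" "$file"
}
```

Calling trim_trailing_blank_lines somefile on the example from the question should leave only the first four lines in place.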

I like @that other guy's answer very much.

But here's one other possibility that uses the fact that command substitutions remove trailing newlines, and doesn't read the file twice to compute the position where it should be trimmed.

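#!/bin/bash

file=$1
# $(< "$file") drops all trailing newlines; <<< adds exactly one back
tokeep=$(wc -c <<< "$(< "$file")") || exit $?
# Nothing is copied from /dev/null; dd only truncates the file at the seek offset
dd if=/dev/null of="$file" bs=1 seek="$tokeep"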
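
The byte arithmetic behind this can be checked on a throwaway file. A small sketch; the temporary file and the variable names content and stripped are only there for illustration:

```shell
#!/bin/bash
# Illustration only: show where the offset passed to dd comes from.
tmp=$(mktemp)
printf '123\n\n\n12\n\n\n' > "$tmp"   # 11 bytes, the last two are extra newlines

content=$(< "$tmp")                   # command substitution drops ALL trailing newlines
stripped=${#content}                  # length of what is left (ASCII, so bytes)
tokeep=$(wc -c <<< "$content")        # the here-string appends exactly one newline
tokeep=$(( tokeep ))                  # normalise in case wc pads its output

echo "stripped=$stripped tokeep=$tokeep"
rm -f "$tmp"
```

Note that bash command substitution also discards NUL bytes, so this counting is only reliable for text files.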

If you want to remove trailing whitespace (i.e., newlines, spaces, tabs, etc.), use tr to replace every whitespace character with a newline, so that the trailing ones are discarded by the command substitution:

#!/bin/bash

file=$1
tokeep=$(wc -c <<< "$(tr '[:space:]' '\n' < "$file")") || exit $?
dd if=/dev/null of="$file" bs=1 seek=$tokeep

This preserves a single trailing newline (because the here-string <<< adds a newline). If you want to trim this trailing newline too (but really, you shouldn't!), replace seek=$tokeep with seek=$((tokeep-1)) in the dd statement.
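
Because the here-string always adds one newline, the computed offset can end up one byte past the real end of the file when the file has no trailing newline, and dd may then grow the file instead of shrinking it. The sketch below guards against that. The function name trim_trailing_whitespace, the clamp, and the special case for whitespace-only files are additions for illustration, not part of the answer above; LC_ALL=C is set so that only single-byte whitespace is considered.

```shell
#!/bin/bash
# Hypothetical wrapper (name chosen for illustration) around the tr + dd idea,
# with a guard so the truncation never makes the file longer.
trim_trailing_whitespace() {
    local file=$1 size content tokeep
    size=$(wc -c < "$file") || return
    # Turn every single-byte whitespace character into a newline; the command
    # substitution then strips the whole trailing run.
    content=$(LC_ALL=C tr '[:space:]' '\n' < "$file") || return
    if [[ -z $content ]]; then
        tokeep=0                          # file is empty or whitespace only
    else
        tokeep=$(wc -c <<< "$content")    # kept bytes plus one newline from <<<
    fi
    # No trailing newline in the original: do not extend past the real size.
    (( tokeep > size )) && tokeep=$size
    dd if=/dev/null of="$file" bs=1 seek="$tokeep" 2>/dev/null
}
```

As with the scripts above, files containing NUL bytes are out of scope, since bash drops them in command substitutions.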

Note. The [:space:] character class is locale-dependent. In the C and POSIX locales it corresponds to space, form feed \f, newline \n, carriage return \r, horizontal tab \t and vertical tab \v (see man 3 isspace; footnote 1). You can craft your own set of characters too: if you only want to trim trailing newlines and tabs but preserve all the other spaces, use

tr '\t' '\n'

1. This is good since they are all one byte long, but don't use it if your locale has whitespace characters that are longer than one byte (e.g., the non-breaking space U+00A0 is encoded in UTF-8 as the two bytes C2 A0). If you're unsure which locale is in use, spell out your own characters in tr, e.g., '\t ', just to be sure they are all one byte long. If you also want to deal with two-byte characters, you should replace each of them with two newlines, using, e.g., sed. Example with the non-breaking space:

sed 's/'$'\ua0''/\n\n/g'

assuming you have a UTF-8 locale. This is a bit clunky and maybe beyond the scope of your original question.
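
The reason for using two newlines is that the byte count must stay the same for the offset to be valid. A quick check, written with explicit bytes so it does not depend on how the shell expands \u escapes (the variable names are illustrative, and \n in the sed replacement assumes GNU sed):

```shell
#!/bin/bash
# Illustration only: U+00A0 is the two bytes C2 A0 in UTF-8.
nbsp=$'\xc2\xa0'
nbsp_bytes=$(( $(printf '%s' "$nbsp" | wc -c) ))

# Replacing each occurrence with two newlines keeps the total byte count unchanged.
before=$(( $(printf 'x%s%s\n' "$nbsp" "$nbsp" | wc -c) ))
after=$(( $(printf 'x%s%s\n' "$nbsp" "$nbsp" | sed "s/$nbsp/\n\n/g" | wc -c) ))

echo "nbsp_bytes=$nbsp_bytes before=$before after=$after"
```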
