
sort -o appends newline to end of file - why?

I'm working on a small text file with a list of words in it that I want to add a new word to, and then sort. The file doesn't have a newline at the end when I start, but does after the sort. Why? Can I avoid this behavior or is there a way to strip the newline back out?

Example:

words.txt looks like

apple
cookie
salmon

I then run printf "\norange" >> words.txt; sort words.txt -o words.txt

I use printf rather than echo figuring that'll avoid the newline, but the file then reads

apple
cookie
orange
salmon
#newline here

If I just run printf "\norange" >> words.txt, orange appears at the bottom of the file, with no newline, i.e.:

apple
cookie
salmon
orange

This behavior is explicitly defined in the POSIX specification for sort:

The input files shall be text files, except that the sort utility shall add a newline to the end of a file ending with an incomplete last line.

This is because a UNIX "text file" is only valid if every line ends in a newline, as also defined in the POSIX standard:

Text file - A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the newline character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
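The rule quoted above is easy to observe directly. Here is a minimal sketch, assuming a POSIX shell with mktemp available; the scratch file and the variable names f, before and after are only illustrative. It counts newline characters before and after an in-place sort of a file whose last line is incomplete.

```shell
# Scratch file so that no real data is touched (path template is illustrative).
f=$(mktemp "${TMPDIR:-/tmp}/words.XXXXXX")

# Three complete lines, then an incomplete last line (no trailing newline).
printf 'apple\ncookie\nsalmon\norange' > "$f"

# wc -l counts newline characters; tr strips the padding some wc versions add.
before=$(wc -l < "$f" | tr -d ' ')
sort -o "$f" "$f"
after=$(wc -l < "$f" | tr -d ' ')

# Three newlines go in, four come out: sort completed the last line.
echo "newlines before: $before, after: $after"

# Show the final bytes of the sorted file, including the trailing \n.
tail -c 7 "$f" | od -c

rm -f "$f"
```

The od output makes the added byte visible, which a plain cat would not.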

Think about what you are asking sort to do.

You are asking it "take all the lines, and sort them in order."

You've given it a file whose lines it splits into strings like the following (note that the last one has no terminating newline):

"salmon\n"
"cookie\n"
"orange"

It sorts these for you dutifully:

"cookie\n"
"orange"
"salmon\n"

And it then outputs them as a single string:

"cookie
orangesalmon
"

That is almost certainly exactly what you do not want.
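The run-together output can be reproduced by hand. Below is a rough sketch in which the three strings are emitted in sorted order exactly as they stand; the variable names naive and actual are only illustrative. For comparison, the second command pipes the same unsorted bytes through a real sort.

```shell
# Naive concatenation: "orange" carries no newline, so it fuses with "salmon".
naive=$(printf 'cookie\n'; printf 'orange'; printf 'salmon\n')
printf '%s\n' "$naive"

# A real sort given the same input, incomplete last line included.
actual=$(printf 'salmon\ncookie\norange' | sort)
printf '%s\n' "$actual"
```

The first printf shows two lines (cookie, orangesalmon); the second shows three (cookie, orange, salmon).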

So instead, if your file is missing the terminating newline that it should have had, the sort program understands that, most likely, you still intended that last line to be a line, rather than just a fragment of a line. It appends a \n to the string "orange", making it "orange\n". Then it can be sorted properly, without "orange" getting concatenated with whatever line happens to come immediately after it:

"cookie\n"
"orange\n"
"salmon\n"

So when it then outputs them as a single string, it looks a lot better:

"cookie
orange
salmon
"

You could strip the last character off the file, the one from the end of "salmon\n", using a range of handy tools such as awk, sed, perl, php, or even raw bash. This is covered elsewhere, in places like:

How can I remove the last character of a file in unix?

But please don't do that. You'll just cause problems for all other utilities that have to handle your files, like sort. And if you assume that there is no terminating newline in your files, then you will make your code brittle: any part of the toolchain which "fixes" your error (as sort kinda does here) will "break" your code.

Instead, treat text files the way they are meant to be treated in unix: a sequence of "lines" (strings of zero or more non-newline bytes), each followed by a newline.

So newlines are line-terminators, not line-separators.
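If you inherit a file that may be missing its final terminator, one option is to normalise it once before appending anything. The following is a sketch, not a definitive recipe: the function name ensure_final_newline is hypothetical, and it relies on command substitution stripping a trailing newline, so the test is non-empty only when the last byte is something else.

```shell
# Scratch file with an incomplete last line (path template is illustrative).
f=$(mktemp "${TMPDIR:-/tmp}/words.XXXXXX")
printf 'apple\ncookie\nsalmon' > "$f"

# Hypothetical helper: append \n only if the file is non-empty
# and its last byte is not already a newline.
ensure_final_newline() {
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

ensure_final_newline "$f"
ensure_final_newline "$f"   # second call is a no-op: the file is already well formed

# Count newline characters; every one of the three lines is now terminated.
lines=$(wc -l < "$f" | tr -d ' ')
echo "$lines"

rm -f "$f"
```

Because the helper is idempotent, it is safe to run before every append.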

There is a coding style where prints and echos are done with the newline leading. This is wrong for many reasons, including creating malformed text files, and causing the output of the program to be concatenated with the command prompt. printf "orange\n" is correct style, and also more readable: at a glance someone maintaining your code can tell you're printing the word "orange" and a newline, whereas printf "\norange" looks at first glance like it's printing a backslash and the phrase "no range" with a missing space.
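Applied to the word list from the question, the trailing-newline style might look like the sketch below. It assumes the file starts out well formed (every line terminated); the scratch path and the variable names f and result are only illustrative.

```shell
# Scratch file standing in for words.txt (path template is illustrative).
f=$(mktemp "${TMPDIR:-/tmp}/words.XXXXXX")

# A well-formed starting file: every line, including the last, ends in \n.
printf 'apple\ncookie\nsalmon\n' > "$f"

# Append the new word with its newline trailing, then sort in place.
printf 'orange\n' >> "$f"
sort -o "$f" "$f"

# The sorted list: apple, cookie, orange, salmon, one per line.
result=$(cat "$f")
printf '%s\n' "$result"

rm -f "$f"
```

Since the input is already valid, sort has nothing to repair and the file keeps exactly one newline per word.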


 