
How does a Linux system handle CRLF in a file?

I know CR LF (\r\n) would be interpreted as two characters, "carriage return" + "new line", but how would that affect different programs when the file is, for example, source code:

  1. As a bash script to be executed?
  2. As source code to be compiled, for example a .c file?

As it is a sequence of whitespace characters, CRLF is ignored in C, but not in Bash:

If the first line of a bash script ( #!/bin/bash ) has a CRLF line terminator, the script won't run. The system will look for the file /bin/bash\r , which doesn't exist.

If any of the other lines of a script have a CRLF line terminator, the command on that line will either be not found (as bash is looking for a command named some_command\r ), or will be passed a \r at the end of its last parameter.
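A minimal sketch of how one might detect and remove the stray CRs before running a script, assuming a POSIX shell with grep and tr available. The function names has_cr and strip_cr are illustrative only, not part of any standard tool.

```shell
#!/bin/sh
# has_cr FILE: succeed if FILE contains at least one carriage return.
has_cr() {
    # Command substitution strips trailing newlines only, so the CR survives.
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_cr FILE: write FILE to standard output with every CR removed.
# Note: this removes all CRs, not only those directly before a newline.
strip_cr() {
    tr -d '\r' < "$1"
}
```

Typical use would be something like `has_cr script.sh && strip_cr script.sh > script.fixed`, writing to a new file rather than redirecting onto the input.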

Shell script

The shell does not treat CR as white space by default.

Source code ( crlf67.sh ) with CR marked by ^M :

#!/bin/sh^M
^M
echo "Hello^M
World!"^M

Running the command explicitly:

$ sh crlf67.sh
: command not found
Hello
World!
$ sh crlf67.sh 2>&1 | vis -r
crlf67.sh: line 2: ^M: command not found
Hello^M
World!^M
$

(The vis command is an extended version of the vis program from Brian W. Kernighan and Rob Pike, The Unix Programming Environment (Nov 1983). It makes non-printing characters visible.)

If you make the script executable:

$  make crlf67
cat crlf67.sh >crlf67 
chmod a+x crlf67
$ crlf67
-bash: ./crlf67: /bin/sh^M: bad interpreter: No such file or directory
$

The kernel doesn't treat the CR as white space either and fails to find the command.
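Since the interpreter path on the #! line is taken literally, CR included, it can be useful to check that line on its own. A minimal sketch, assuming a POSIX shell; the function name shebang_has_cr is hypothetical.

```shell
#!/bin/sh
# shebang_has_cr FILE: succeed if the first line of FILE starts with "#!"
# and ends with a carriage return.
shebang_has_cr() {
    cr=$(printf '\r')
    first_line=
    # read strips the trailing newline but keeps a trailing CR.
    IFS= read -r first_line < "$1" || :
    case $first_line in
        '#!'*"$cr") return 0 ;;
        *)          return 1 ;;
    esac
}
```

A script that passes this check would be expected to fail with a "bad interpreter" error like the one above when run directly.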

C source code

Officially, you can't use a backslash to continue a line in C source code if the line ending is CRLF, because the character after the backslash isn't a newline (NL or LF); it's a CR. Some compilers ignore white space (at least the CR) after the last backslash on a line: GCC 9.1.0 for one, and earlier versions too. GCC warns about spaces after a trailing backslash (with -Werror, as I use, the warning becomes an error). That isn't what the standard stipulates; however, even -pedantic doesn't stop GCC from ignoring the erroneous notation.

Source code ( crlf19.c ) with CR marked by ^M and newline marked by ^J :

#include <stdio.h>^M^J
^M^J
int main(void)^M^J
{^M^J
    printf("Hello\   ^M^J
 world!\   ^M^J
\n");^M^J
    return 0;^M^J
}^M^J

Compilation by GCC 9.1.0 on macOS 10.14.5 Mojave:

$ gcc -O3 -g -std=c11 -Wall -Wextra -pedantic crlf19.c -o crlf19 
crlf19.c: In function ‘main’:
crlf19.c:5:18: warning: backslash and newline separated by space
    5 |     printf("Hello\
      |                   
crlf19.c:6:8: warning: backslash and newline separated by space
    6 |  world!\
      |         
$ gcc -O3 -g -std=c11 -Wall -Wextra -Werror crlf19.c -o crlf19 
crlf19.c: In function ‘main’:
crlf19.c:5:18: error: backslash and newline separated by space [-Werror]
    5 |     printf("Hello\
      |                   
crlf19.c:6:8: error: backslash and newline separated by space [-Werror]
    6 |  world!\
      |         
cc1: all warnings being treated as errors
$

This behaviour goes back at least as far as GCC 4.1.2; that version was tested on an antediluvian RHEL 5 box.

If you remove the spaces after the backslash leaving just the CRLF line endings, GCC doesn't complain at all.
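To find the lines that would draw this warning without running the compiler, one could search for a backslash followed by blanks at the end of a line, with or without a CR before the newline. This is only a sketch, assuming a POSIX grep with -E support; the function name find_spaced_continuations is made up for illustration.

```shell
#!/bin/sh
# find_spaced_continuations FILE: print the line numbers of lines that end
# in a backslash followed by one or more blanks, optionally followed by CR.
# A backslash followed directly by CR LF is deliberately not reported.
find_spaced_continuations() {
    cr=$(printf '\r')
    grep -nE '\\[[:blank:]]+'"$cr"'?$' "$1" | cut -d: -f1
}
```

This is a text search, not a C parser, so it makes no attempt to tell a real continuation from a backslash that happens to sit at the end of a comment.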

It depends on the program that's processing the file. I don't believe there's any general rule.

For example, I just created several shell scripts in an otherwise empty directory. One of them is named some_command with an ASCII CR as the last character of the file name.

I can invoke that command from a shell script by including that CR as part of the command name. The shell (sh, bash, or ksh) doesn't treat the CR character as white space.

$ ls -l
total 16
-rwxr-xr-x 1 kst kst 26 Jul  1 16:46  crlf.bash
-rwxr-xr-x 1 kst kst 25 Jul  1 16:46  crlf.ksh
-rwxr-xr-x 1 kst kst 24 Jul  1 16:46  crlf.sh
-rwxr-xr-x 1 kst kst 21 Jul  1 16:49 'some_command'$'\r'
$ cat -v crlf.bash
#!/bin/bash
some_command^M
$ cat -v crlf.ksh
#!/bin/ksh
some_command^M
$ cat -v crlf.sh
#!/bin/sh
some_command^M
$ cat -v $'some_command\r'
#!/bin/sh
echo hello
$ ./crlf.bash
hello
$ ./crlf.ksh
hello
$ ./crlf.sh
hello
$

The version of ls I'm using (GNU coreutils 8.28) has a special syntax for showing file names that contain special characters. cat -v shows CR characters as ^M .
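The experiment can be recreated in a scratch directory along the following lines. This is a sketch rather than the exact commands used above: it assumes mktemp -d is available, and it puts the scratch directory on PATH so that the command lookup succeeds. The function name demo_cr_command is hypothetical.

```shell
#!/bin/sh
# demo_cr_command: build the experiment in a scratch directory and run it.
demo_cr_command() {
    dir=$(mktemp -d) || return 1
    cr=$(printf '\r')

    # A script whose file name ends in a carriage return.
    printf '#!/bin/sh\necho hello\n' > "$dir/some_command$cr"

    # A caller whose second line ends in CR LF, so the command word the
    # shell parses is "some_command" followed by CR.
    printf '#!/bin/sh\nsome_command\r\n' > "$dir/crlf.sh"

    chmod +x "$dir/some_command$cr" "$dir/crlf.sh"

    # Assumption: the scratch directory is added to PATH for this one call.
    PATH="$dir:$PATH" "$dir/crlf.sh"
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

Calling demo_cr_command should print hello, because the shell looks up a command whose name includes the CR and finds the file that was created with that name.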
