How does Linux system handle CRLF in a file?

I know CR LF (\r\n) is interpreted as two characters, "carriage return" + "new line", but how does that affect different programs when the file is, for example, source code:

  1. As a bash script to be executed?
  2. As source code to be compiled, for example a .c file?

Because CRLF is a sequence of whitespace characters, it is ignored in C, but not in Bash:

If the first line of a bash script ( #!/bin/bash ) has a CRLF line terminator, the script won't run. The system will look for the file /bin/bash\r , which doesn't exist.

If any of the other lines of a script have a CRLF line terminator, the command on that line will either not be found (as bash is looking for a command named some_command\r ), or will be passed a \r at the end of its last parameter.
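As a minimal sketch of the usual workaround, the CR bytes can be deleted before the shell ever sees the script. The helper name strip_cr is made up for this illustration; tr -d is the only real tool involved, and note that it removes every CR in the input, not just those at line ends.

```shell
#!/bin/sh
# strip_cr is a hypothetical helper name, not a standard utility.
# It copies stdin to stdout and deletes every CR byte on the way.
strip_cr() {
    tr -d '\r'
}

# A one-line script with a CRLF terminator, cleaned and then fed to sh.
# Without strip_cr, echo would receive "hello" followed by a CR.
result=$(printf 'echo hello\r\n' | strip_cr | sh)
echo "$result"
```

For a file on disk, the same filter can be applied as strip_cr < script.sh > script.fixed.sh (both file names are placeholders).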

Shell script

The shell does not treat CR as white space by default.

Source code ( crlf67.sh ) with CR marked by ^M :

#!/bin/sh^M
^M
echo "Hello^M
World!"^M

Running the script explicitly with sh:

$ sh crlf67.sh
: command not found
Hello
World!
$ sh crlf67.sh 2>&1 | vis -r
crlf67.sh: line 2: ^M: command not found
Hello^M
World!^M
$

(The vis command is an extended version of the vis program from The Unix Programming Environment by Brian W. Kernighan and Rob Pike (Nov 1983). It makes non-printing characters visible.)

If you make the script executable:

$  make crlf67
cat crlf67.sh >crlf67 
chmod a+x crlf67
$ crlf67
-bash: ./crlf67: /bin/sh^M: bad interpreter: No such file or directory
$

The kernel doesn't treat the CR as white space either, so it fails to find the interpreter named on the #! line.
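A quick way to spot this case before running a script is to look at the first line only. The following is a sketch under the assumption of a POSIX sh; the function name shebang_has_cr is hypothetical.

```shell
#!/bin/sh
# shebang_has_cr FILE
# Hypothetical helper: succeeds (status 0) when the first line of FILE
# starts with "#!" and ends with a CR, which is the situation that leads
# to the "bad interpreter" error.
shebang_has_cr() {
    first_line=
    # read drops the trailing LF but keeps a CR, since CR is not in IFS.
    IFS= read -r first_line < "$1" || :
    # Command substitution strips trailing LFs only, so cr holds one CR.
    cr=$(printf '\r')
    case $first_line in
        '#!'*"$cr") return 0 ;;
        *)          return 1 ;;
    esac
}
```

It could be used as: if shebang_has_cr crlf67; then echo "CR on the #! line"; fi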

C source code

Officially, you can't use a backslash to continue a line in C source code if the line ending is CRLF, because the character after the backslash isn't a newline (NL or LF); it's a CR. Some compilers ignore white space (at least the CR) after the last backslash on a line; GCC 9.1.0 is one, and earlier versions behave the same way. GCC warns about spaces after a trailing backslash (and if you use -Werror as I do, the warning becomes an error). Ignoring the white space isn't what the standard stipulates; however, even -pedantic doesn't stop GCC from accepting the erroneous notation.

Source code ( crlf19.c ) with CR marked by ^M and newline marked by ^J :

#include <stdio.h>^M^J
^M^J
int main(void)^M^J
{^M^J
    printf("Hello\   ^M^J
 world!\   ^M^J
\n");^M^J
    return 0;^M^J
}^M^J

Compilation by GCC 9.1.0 on macOS 10.14.5 Mojave:

$ gcc -O3 -g -std=c11 -Wall -Wextra -pedantic crlf19.c -o crlf19 
crlf19.c: In function ‘main’:
crlf19.c:5:18: warning: backslash and newline separated by space
    5 |     printf("Hello\
      |                   
crlf19.c:6:8: warning: backslash and newline separated by space
    6 |  world!\
      |         
$ gcc -O3 -g -std=c11 -Wall -Wextra -Werror crlf19.c -o crlf19 
crlf19.c: In function ‘main’:
crlf19.c:5:18: error: backslash and newline separated by space [-Werror]
    5 |     printf("Hello\
      |                   
crlf19.c:6:8: error: backslash and newline separated by space [-Werror]
    6 |  world!\
      |         
cc1: all warnings being treated as errors
$

This behaviour goes back at least as far as GCC 4.1.2; that version was tested on an antediluvian RHEL 5 box.

If you remove the spaces after the backslash, leaving just the CRLF line endings, GCC doesn't complain at all.

It depends on the program that's processing the file. I don't believe there's any general rule.

For example, I just created several shell scripts in an otherwise empty directory. One of them is named some_command with an ASCII CR as the last character of the file name.

I can invoke that command from a shell script by including that CR as part of the command name. The shell (sh, bash, or ksh) doesn't treat the CR character as white space.

$ ls -l
total 16
-rwxr-xr-x 1 kst kst 26 Jul  1 16:46  crlf.bash
-rwxr-xr-x 1 kst kst 25 Jul  1 16:46  crlf.ksh
-rwxr-xr-x 1 kst kst 24 Jul  1 16:46  crlf.sh
-rwxr-xr-x 1 kst kst 21 Jul  1 16:49 'some_command'$'\r'
$ cat -v crlf.bash
#!/bin/bash
some_command^M
$ cat -v crlf.ksh
#!/bin/ksh
some_command^M
$ cat -v crlf.sh
#!/bin/sh
some_command^M
$ cat -v some_command
#!/bin/sh
echo hello
$ ./crlf.bash
hello
$ ./crlf.ksh
hello
$ ./crlf.sh
hello
$

The version of ls I'm using (GNU coreutils 8.28) has a special syntax for showing file names that contain special characters. cat -v shows CR characters as ^M .
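Besides inspecting a file by eye with cat -v, the lines that end in CR can be counted. This is only a sketch: count_cr_lines is a hypothetical name, and it relies on nothing beyond grep -c and printf.

```shell
#!/bin/sh
# count_cr_lines FILE
# Hypothetical helper: prints how many lines of FILE end with a CR
# immediately before the newline.
count_cr_lines() {
    # Command substitution strips trailing LFs only, so cr holds one CR.
    cr=$(printf '\r')
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| :" keeps the function's status at success.
    grep -c "${cr}\$" "$1" || :
}
```

For example, count_cr_lines crlf.sh prints a number greater than zero for a file with CRLF line endings and 0 for one that uses LF only.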
