How to delete end-of-line sign (using sed/awk) of a single-line txt file?

I've been able to feed a PHP function with a list of URLs (on a Raspberry Pi 3) only if the "list" is a txt file containing a single line (URL) without the trailing end-of-line sign ("$"). I've tried

sed -e 's/\r$//g'

and

sed -e 's/^M//g'

but I was only able to delete the ending "$" manually in a text editor, by going to the last (i.e. second) line of the file and pressing backspace.

There's no problem splitting the master file containing hundreds of URLs into single-line files and calling the PHP function one file at a time, but there must be another easy way (sed, awk?) to delete the ending "$" at the end of the (only) line in the file.

There is no $ in your file. $ is a symbol used to indicate end-of-string in a regular expression (just as ^ means start-of-string). In a tool that operates on one line at a time, the end of the string it is working on is also the end of the line, so people using line-oriented tools often mis-state $ as meaning end-of-line, since in the context of that tool it's the same thing. $ is also used by other tools (e.g. cat -E) as an end-of-line indicator.

Some terminology/definitions:

  • \r is an escape sequence used in scripts to generate or match the CR (carriage-return) character ^M (control-M), ASCII 13
  • \n is an escape sequence used in scripts to generate or match the LF (line-feed) character ^J (control-J), ASCII 10
  • $ is a regexp metacharacter used in scripts to indicate end-of-string (which is often also the end-of-line), and is also used by tools to indicate end-of-line when displaying text
  • \n (i.e. LF alone) is considered a newline in UNIX
  • \r\n (i.e. CRLF) is considered a newline in DOS (see "Why does my tool output overwrite itself and how do I fix it?")
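To check which of these characters a stream really contains, a byte dump is less ambiguous than any display convention. A minimal sketch, assuming POSIX od and xargs are available; show_bytes is a helper name made up for this example:

```shell
# show_bytes (hypothetical helper): print stdin as space-separated hex bytes.
# od -An -tx1 dumps one byte per column; xargs echo collapses the padding.
show_bytes() {
    od -An -tx1 | xargs echo
}

printf 'foo\n'   | show_bytes    # 66 6f 6f 0a     (LF is 0a)
printf 'foo\r\n' | show_bytes    # 66 6f 6f 0d 0a  (CR is 0d, LF is 0a)
```

Neither dump contains 24, the byte value of a literal $.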

So when you do:

$ printf 'foo\n' | cat -vE
foo$

that does not mean there's a $ at the end of foo; it's just cat displaying a $ to show you where the end of the line is. When you do:

$ printf 'foo\r\n' | cat -vE
foo^M$

the ^M (control-M) is explicitly showing you the CR (carriage-return) character generated by \r, but the $ is not explicitly showing you the ^J (control-J) character, i.e. the LF (line-feed) generated by the \n; instead cat displays a different character, $, to show you the end of the line. If it DID show you ^Js then everything would be concatenated on one line, which would be tough to read. Consider the ease of reading this:

$ printf 'the\nquick\nbrown\nfox\n' | cat -vE
the$
quick$
brown$
fox$

vs. if the output was this:

$ printf 'the\nquick\nbrown\nfox\n' | some_other_tool
the^Jquick^Jbrown^Jfox^J
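If a display like the foo^M$ one earlier does reveal a CR, it can be deleted before the data reaches anything sensitive to it. A minimal sketch using tr, with the POSIX sed l command as an alternative to cat -vE for inspecting the result (strip_cr is a name chosen for this example):

```shell
# strip_cr (hypothetical helper): delete every CR byte from stdin.
# Note this removes all CRs, not only those directly before a LF.
strip_cr() {
    tr -d '\r'
}

printf 'foo\r\n' | sed -n l                # foo\r$
printf 'foo\r\n' | strip_cr | sed -n l     # foo$
```

As with cat -E, the $ printed by sed l is only an end-of-line marker, not a character in the data.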

You can never do either of these:

$ printf 'foo\nbar\n' | sed 's/$//' | cat -vE
foo$
bar$

$ printf 'foo\nbar\n' | sed 's/\n//' | cat -vE
foo$
bar$

to remove a LF, since sed already consumed the LF when reading the input, and the $ isn't itself the newline character; it's a metacharacter that lets you say in your regexp "match the end of the line" (in this case, since the end of the input string IS the end of the line for sed by default).
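One way to see this for yourself is to count how many lines contain a LF from sed's point of view, then compare with what happens once a LF is deliberately pulled into the pattern space with the N command. This is a sketch rather than a recommendation; N is a standard sed command but is not mentioned in the explanation above, and both function names are invented for the example:

```shell
# Count input lines whose pattern space contains a LF. The result is 0,
# because sed strips the terminating LF before the script sees the line.
count_lines_with_lf() {
    sed -n '/\n/p' | wc -l | tr -d ' '
}

# N appends the next input line to the pattern space, separated by an
# embedded LF, and only then can s/\n// match something.
# Inputs with an odd number of lines behave differently across sed versions.
join_pairs() {
    sed 'N; s/\n//'
}

printf 'foo\nbar\n' | count_lines_with_lf    # prints 0
printf 'foo\nbar\n' | join_pairs             # prints foobar
```

The two commands shown earlier still cannot work: without N (or a similar command) there is never a LF in the pattern space to match.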

You might ask: if sed consumed the LF when reading the input, then why are there LFs at the end of each line of output? The answer is that sed adds a LF to every output line so that what it outputs is a valid POSIX text file (without terminating LFs you do not have a POSIX text file, and so what any subsequent tool does with it is undefined behavior).

You can remove LFs, though, if you use a tool that does not read one line at a time. GNU sed has a -z option to read NUL-separated text instead of LF-separated text, and in that mode you can remove LF characters:

$ printf 'foo\nbar\n' | sed -z 's/\n//' | cat -vE
foobar$

and now you can see how $ (the end-of-string metacharacter) is different from \n (the escape sequence to match the LF character):

$ printf 'foo\nbar\n' | sed -z 's/$//' | cat -vE
foo$
bar$

$ printf 'foo\nbar\n' | sed -z 's/\n/<LF>/' | cat -vE
foo<LF>bar$

$ printf 'foo\nbar\n' | sed -z 's/$/<EOS>/' | cat -vE
foo$
bar$
<EOS>$

So the quick answer to "how do you remove LFs with sed?" is this, with GNU sed:

$ printf 'foo\nbar\n' | sed -z 's/\n//g'
foobar$

(The $ after foobar there is the next shell prompt, not part of the output: with every LF removed, the output no longer ends in a newline.) If you don't have GNU sed (or actually even if you do, since the above reads the whole input into memory at once, assuming a POSIX text file without NULs as input) then you should just use awk:

$ printf 'foo\nbar\n' | awk -v ORS= '1'
foobar$
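Applied to the files from the question, this could look like the sketch below. Everything named here is an assumption for illustration: process_url stands in for the real PHP call, the example.com URLs are placeholders, and the read loop is an alternative that avoids splitting the master file at all.

```shell
# Hypothetical stand-in for whatever consumes a single URL.
process_url() {
    printf 'processing <%s>\n' "$1"
}

# Option 1: print a single-line file without its terminating LF.
strip_final_lf() {
    awk -v ORS= '1' "$1"
}

# Option 2: walk the master list directly. read drops the LF from each
# line; the parameter expansion also drops a trailing CR from DOS files.
process_list() {
    while IFS= read -r url || [ -n "$url" ]; do
        url=${url%"$(printf '\r')"}
        if [ -n "$url" ]; then
            process_url "$url"
        fi
    done < "$1"
}

list=$(mktemp)
printf 'http://example.com/a\r\nhttp://example.com/b\n' > "$list"
process_list "$list"
rm -f "$list"
```

Option 1 removes LFs only; a file with DOS line endings would still carry its CR.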
