How to delete end-of-line sign (using sed/awk) of a single-line txt file?

I've been able to feed a php function with a list of URLs (on a Raspberry Pi 3) only if the "list" is a txt file containing a single line (URL) without the ending end-of-line sign ("$"). I've tried

sed -e 's/\r$//g'

and

sed -e 's/^M//g'

but I was only able to delete the ending "$" manually within a text editor going to the last (ie second) line of the file and pressing backspace on the keyboard.

There's no problem splitting the master file containing hundreds of URLs into single-line files and calling the PHP function one file at a time, but there must be another easy way (sed, awk?) to delete the ending "$" at the end of the (only) line in the file.

There is no $ in your file. $ is a metacharacter that marks end-of-string in a regular expression (just as ^ marks start-of-string). A tool that operates one line at a time treats each line as its string, so end-of-string and end-of-line coincide, and people using line-oriented tools often mis-state $ as meaning end-of-line. Some tools (e.g. cat -E) also print $ as an end-of-line indicator.

Some terminology/definitions:

  • \r is an escape sequence used in scripts to generate or match the CR (carriage-return) character ^M (control-M), ASCII 13
  • \n is an escape sequence used in scripts to generate or match the LF (line-feed) character ^J (control-J), ASCII 10
  • $ is a regexp metacharacter used in scripts to indicate end-of-string (which often is also the end-of-line) and is also used by tools to indicate end-of-line when displaying text.
  • \n (i.e. LF alone) is considered a newline in UNIX
  • \r\n (i.e. CRLF) is considered a newline in DOS (see Why does my tool output overwrite itself and how do I fix it?)
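
To check which of these characters a file really contains, it can help to dump the raw bytes rather than rely on a viewer's markers. A minimal sketch, assuming a POSIX od and tr are available; the helper name hex_bytes is made up for illustration:

```shell
# hex_bytes: hypothetical helper that prints stdin as one run of hex byte values.
hex_bytes() {
    # od -A n drops the offset column, -t x1 prints one hex value per byte;
    # tr then removes the spaces and line breaks od uses for layout.
    od -A n -t x1 | tr -d ' \n'
}

printf 'foo\n'   | hex_bytes; echo    # UNIX line ending: ends in 0a (LF)
printf 'foo\r\n' | hex_bytes; echo    # DOS line ending: ends in 0d0a (CR LF)
```

Here 0d is CR (ASCII 13) and 0a is LF (ASCII 10); neither dump contains a byte for a literal $ (hex 24).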

So when you do:

$ printf 'foo\n' | cat -vE
foo$

that does not mean there's a $ at the end of foo; cat is just displaying a $ to show you where the line ends. When you do:

$ printf 'foo\r\n' | cat -vE
foo^M$

the ^M (control-M) explicitly shows you the CR (carriage-return) character generated by \r, but the $ is not a rendering of the LF (line-feed) character ^J (control-J) generated by \n. Instead, cat prints a different character, $, to mark the end of the line. If it DID show you ^Js then everything would be concatenated on one line, which would be tough to read. Consider the ease of reading this:

$ printf 'the\nquick\nbrown\nfox\n' | cat -vE
the$
quick$
brown$
fox$

vs if the output was this:

$ printf 'the\nquick\nbrown\nfox\n' | some_other_tool
the^Jquick^Jbrown^Jfox^J

You can never do either of these:

$ printf 'foo\nbar\n' | sed 's/$//' | cat -vE
foo$
bar$

$ printf 'foo\nbar\n' | sed 's/\n//' | cat -vE
foo$
bar$

to remove an LF, since sed already consumed the LF when reading the input, so there is no \n left in the line for the regexp to match. The $ isn't itself the newline character either: it's a metacharacter that lets you say in your regexp "match the end of the line" (which works here because, by default, the end of sed's input string IS the end of the line).
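
One way to convince yourself that $ is an anchor rather than a character is to substitute something at it: the text gets inserted at the end of the line and nothing is removed. A small sketch (the function name mark_eol and the <EOL> marker are purely illustrative):

```shell
# mark_eol: hypothetical helper that appends a visible marker at each end-of-line.
mark_eol() {
    # $ matches the empty position at the end of the pattern space,
    # so this inserts text there instead of replacing any character.
    sed 's/$/<EOL>/'
}

printf 'foo\nbar\n' | mark_eol
```

Each line comes out with <EOL> appended and is still followed by its LF, so the output is again two lines.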

You might ask - if sed consumed the LF when reading the input then why are there LFs at the end of each line of output? The answer is that sed adds a LF to every output line so that what it outputs is a valid POSIX text file (without terminating LFs you do not have a POSIX text file and so what any subsequent tool does with it is undefined behavior).

You can remove LFs, though, if you use a tool that does not read one line at a time. GNU sed has a -z option to read NUL-separated text instead of LF-separated text and in that mode you can remove LF characters:

$ printf 'foo\nbar\n' | sed -z 's/\n//' | cat -vE
foobar$

and now you can see how $ (the end-of-string metacharacter) is different from \n (the escape sequence to match the LF character):

$ printf 'foo\nbar\n' | sed -z 's/$//' | cat -vE
foo$
bar$

$ printf 'foo\nbar\n' | sed -z 's/\n/<LF>/' | cat -vE
foo<LF>bar$

$ printf 'foo\nbar\n' | sed -z 's/$/<EOS>/' | cat -vE
foo$
bar$
<EOS>$

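So the quick answer for "how do you remove LFs with sed?" is this with GNU sed (the output no longer ends in an LF, so the $ after foobar is the next shell prompt, not a marker printed by cat):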

$ printf 'foo\nbar\n' | sed -z 's/\n//g'
foobar$

and if you don't have GNU sed (or actually even if you do, since the above reads the whole input into memory at once, assuming a POSIX text file without NULs as input) then you should just use awk, which with an empty ORS prints each line without a trailing LF:

$ printf 'foo\nbar\n' | awk -v ORS= '1'
foobar$
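
Applied to the original question, the same awk idea might look like the sketch below. It assumes the input holds a single line, possibly with a DOS CRLF ending; strip_eol and the file names in the usage comment are hypothetical.

```shell
# strip_eol: hypothetical helper that emits stdin without its line terminator.
strip_eol() {
    # sub() drops a trailing CR if the line had a DOS ending;
    # ORS= makes awk print each record with no LF after it.
    awk -v ORS= '{ sub(/\r$/, "") } 1'
}

# Usage (file names are placeholders):
#   strip_eol < url.txt > url_noeol.txt
printf 'http://example.com/\r\n' | strip_eol | od -c
```

The od -c at the end is only there to let you confirm that neither \r nor \n is left. With more than one input line this would join all the lines together, so for a master file of many URLs you would still process one line at a time.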
