
bash sed (or others) to substitute front anchored double space AND trailing non-printable character

I have lines of the following form:

line="  this is a line with 2 leading spaces and a trailing control char^M"

I want to remove both the two leading spaces and the trailing control character (shown here as ^M).

echo "${line}" | sed 's/^[[:space:]]*//' | tr -dc '[:print:]'
echo "${line}" | sed 's/^[[:space:]]*//' | sed 's/[^[:print:]]//'

Both work. I also tried:

echo "${line}" | sed 's/^[[:space:]]*|[^[:print:]]//'

but this doesn't work.

Why doesn't this last expression work?
How can I accomplish this with a single call to sed and a single regex?
What is the preferred solution, for example in terms of efficiency? Is it better to avoid many subshells?
Are there better solutions?

sed 's/^[[:space:]]*|[^[:print]]//'

doesn't work because | matches itself literally in sed's default (basic) regex syntax. "Or" is spelled \| there (a GNU extension), or a bare | if you run sed -E. (And [:print] should be [:print:].)

But that's still not enough because by default sed only replaces the first occurrence; you need the /g flag to replace all occurrences:

sed 's/^[[:space:]]*\|[^[:print:]]//g'
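To make the two points above concrete, here is a small sketch. It assumes a sed that accepts -E for extended regexes (GNU and BSD sed both do), where a bare | is alternation; the sample text and the variable names are made up for illustration.

```sh
# Sample input: two leading spaces and a trailing carriage return (the ^M).
cr=$(printf '\r')
line="  some text${cr}"

# Basic regex: | is an ordinary character, so this looks for a literal "|"
# and leaves the line untouched.
unchanged=$(printf '%s\n' "$line" | sed 's/^[[:space:]]*|[^[:print:]]//')

# Extended regex without g: only the first match (the leading spaces) goes.
first_only=$(printf '%s\n' "$line" | sed -E 's/^[[:space:]]*|[^[:print:]]//')

# Extended regex with g: both the leading spaces and the ^M are removed.
cleaned=$(printf '%s\n' "$line" | sed -E 's/^[[:space:]]*|[^[:print:]]//g')

printf '%s\n' "$cleaned"
```

The \| spelling in the command above should behave the same way as -E with | on GNU sed.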

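But your original regex is broader than it needs to be: [[:space:]] matches not just spaces and tabs but also newline, carriage return, form feed, and vertical tab. sed strips the newline from each line before matching, so blank lines themselves are not deleted, but the leading match can still consume characters you did not mean to treat as indentation. To limit it to spaces and tabs, use [[:blank:]] instead: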

sed 's/^[[:blank:]]*\|[^[:print:]]//g'
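One side effect of the g flag is worth checking against your data: [^[:print:]] then matches anywhere in the line, not only at the end. The sketch below uses an invented line with an embedded tab to compare that behaviour with a second substitution anchored to the end of the line (s/[[:cntrl:]]*$//); -E is used so the | form also runs on non-GNU sed.

```sh
# Invented sample: leading spaces, a tab between two fields, trailing ^M.
tab=$(printf '\t')
cr=$(printf '\r')
line="  col1${tab}col2${cr}"

# Alternation with g strips every non-printable character, so the tab
# between the fields disappears too.
everywhere=$(printf '%s\n' "$line" | sed -E 's/^[[:blank:]]*|[^[:print:]]//g')

# Two anchored substitutions only touch the start and the end of the line.
ends_only=$(printf '%s\n' "$line" | sed 's/^[[:blank:]]*//; s/[[:cntrl:]]*$//')

printf '%s\n' "$everywhere"
printf '%s\n' "$ends_only"
```

Which of the two is right depends on whether interior control characters should survive.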

This single sed should work:

sed 's/^[[:blank:]]*//; s/[[:cntrl:]]*$//' <<< "$line"
this is a line with 2 leading spaces and a trailing control char
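On the efficiency question: if the line is already in a shell variable, parameter expansion can do the trimming with no external process at all. This is only a sketch of that alternative, not something taken from the answers above; the variable name trimmed is made up, and the command substitution on the first line is just there to build the sample data.

```sh
# Build the sample line: two leading spaces and a trailing carriage return.
cr=$(printf '\r')
line="  this is a line with 2 leading spaces and a trailing control char${cr}"

# ${line%%[![:blank:]]*} is the run of leading blanks; strip it as a prefix.
trimmed="${line#"${line%%[![:blank:]]*}"}"

# ${trimmed##*[![:cntrl:]]} is the run of trailing control characters;
# strip it as a suffix.
trimmed="${trimmed%"${trimmed##*[![:cntrl:]]}"}"

printf '%s\n' "$trimmed"
```

For a handful of lines the difference is unlikely to matter; when processing a whole file, a single sed over the file is usually the simpler choice than a shell loop.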

