
Perl & Sed string substitute in several expressions

I would like to do string substitution with non-greedy matching:

  • Remove all leading and trailing dashes and apostrophes from each word (when these symbols appear in the middle of a word, they must be preserved)

  • Collapse multiple spaces into a single space

Example:

--ONE   Tw'o--   -333-   -'FO-UR'

must become

ONE Tw'o 333 FO-UR

I cannot get exactly this result. Can you please help me correct my Perl and sed syntax below?

$ echo "--ONE   Tw'o--   -333-   -'FO-UR'" \
  | perl -pe "s/[-']+(.+?)/\1/g"           \
  | perl -pe "s/(.+?)[-']+/\1/g"           \
  | perl -pe "s/\s+/ /g"

Result (perl): "ONE Two 333 FOUR"

$ echo "--ONE   Tw'o--   -333-   -'FO-UR'" \
  | sed -r -e "s/[-']+(.+?)/\1/g"          \
    -e "s/(.+)[-']+/\1/g"                  \
    -e "s/\s+/ /g"

Result (sed): "ONE Tw'o-- -333- -'FO-UR"

Here's the perl version:

echo "--ONE   Tw'o--   -333-   -'FO-UR'" | perl -ne "s|-'||g; s|'-||g; s|^'||; s|'$||; s|^-+||; s|-+$||; s|-+\s+| |g; s|\s+-+| |g; s|\s+| |g; s|\s+$||; print;"

ONE Tw'o 333 FO-UR

The sed version is basically identical:

echo "--ONE   Tw'o--   -333-   -'FO-UR'" | sed -r -e "s|-'||g; s|'-||g; s|^'||; s|'$||; s|^-+||; s|-+$||; s|-+\s+| |g; s|\s+-+| |g; s|\s+| |g; s|\s+$||;"

ONE Tw'o 333 FO-UR

Annotations for the regular expressions used:

s|-'||g;     # Remove dash followed by quote everywhere
s|'-||g;     # Remove quote followed by dash everywhere
s|^'||;      # Remove leading quote
s|'$||;      # Remove trailing quote
s|^-+||;     # Remove leading dash characters
s|-+$||;     # Remove trailing dash characters
s|-+\s+| |g; # Replace dash characters followed by whitespace with 1 space everywhere
s|\s+-+| |g; # Replace whitespace followed by dash characters with 1 space everywhere
s|\s+| |g;   # Replace multiple spaces with 1 space
s|\s+$||;    # Remove trailing spaces
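
As a possible generalization of the rules annotated above, the sketch below trims any run of dashes and apostrophes at either edge of a whitespace-delimited word, rather than listing specific combinations. It assumes a sed that accepts `-E` (GNU and BSD sed both do); the function name `trim_words` is hypothetical and not part of the original answer.

```shell
# trim_words: hypothetical helper name, used only for illustration.
trim_words() {
  # 1) strip dashes/apostrophes right after line start or whitespace
  # 2) strip dashes/apostrophes right before whitespace or line end
  # 3) collapse each run of whitespace into one space
  sed -E \
    -e "s/(^|[[:space:]])[-']+/\1/g" \
    -e "s/[-']+([[:space:]]|$)/\1/g" \
    -e "s/[[:space:]]+/ /g"
}

printf '%s\n' "--ONE   Tw'o--   -333-   -'FO-UR'" | trim_words
# prints: ONE Tw'o 333 FO-UR
```

The captured group `\1` puts back the whitespace (or nothing, at a line edge) that anchored the match, so symbols in the middle of a word are never touched.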

Lookarounds in Perl make the dash part easy (the apostrophes and the repeated spaces still need their own substitutions):

s="--ONE   Tw'o--   -333-   -'FO-UR'"
perl -pe 's/(?<!\w)-|-(?!\w)//g' <<< "$s"
ONE   Tw'o   333   'FO-UR'

(?<!\w)- # Negative lookbehind: match - if it is not preceded by a word character
|        # Regex alternation
-(?!\w)  # Negative lookahead: match - if it is not followed by a word character
