Replace a specific character at the beginning and end of any word in bash

Question

I need to remove the hyphen '-' character only when it matches the pattern 'space-[A-Z]' or '[A-Z]-space'. (Assume all letters are uppercase, and "space" can be a space or a newline.)

sample.txt

I AM EMPTY-HANDED AND I- WA-
-ANT SOME COO- COOKIES

I want the output to be

I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES

I've looked around for answers using sed, awk and perl, but I could only find answers about removing all characters between two patterns or specific strings, not about removing a specific character between [A-Z] and a space.

Thanks heaps!!

Answer 1

If perl is an option, you could try the following:

perl -pe 's/(^|(?<=\s))-(?=[A-Z])//g; s/(?<=[A-Z])-((?=\s)|$)//g' sample.txt

(?<=\s) is a zero-width lookbehind assertion which matches a preceding whitespace character without including it in the matched substring.
(?=[A-Z]) is a zero-width lookahead assertion which matches a following character between A and Z without including it in the matched substring.
As a result, only the hyphens which match the pattern above are removed from the original text.
The second statement s/..//g is the mirrored version of the first one: it removes a hyphen that follows a letter and precedes whitespace or the end of the line.

Answer 2

Could you please try the following.

awk '{for(i=1;i<=NF;i++){if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){sub(/-/,"",$i)}}} 1' Input_file

Adding a non-one liner form of solution:

awk '
{
  for(i=1;i<=NF;i++){
    if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){
      sub(/-/,"",$i)
    }
  }
}
1
'  Input_file

Output will be as follows.

I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
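
One caveat worth illustrating with a small sketch: assigning to a field (here through sub on $i) makes awk rebuild the line with the output field separator, so runs of spaces in a modified line collapse to single spaces. The input string and the variable name collapsed are made up for the demonstration, and the pattern is narrowed to [A-Z] since the question assumes uppercase letters.

```shell
# Input has three spaces between the words; both fields end in '-'.
collapsed=$(printf 'I-   WA-\n' |
  awk '{for(i=1;i<=NF;i++){if($i ~ /^-[A-Z]+$|^[A-Z]+-$/){sub(/-/,"",$i)}}} 1')

# The hyphens are gone, and the three spaces became one.
printf '%s\n' "$collapsed"
```

If the original spacing matters, one of the sed or perl answers may be a better fit, since they edit the line in place without splitting it into fields.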

Answer 3

If you can provide Extended Regular Expressions to sed (generally with the -E or -r option), then you can shorten your sed expression to:

sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g' file

Where the basic form is sed -E 's/find1/replace1/g;s/find2/replace2/g' file which can also be written as separate expressions sed -E -e 's/find1/replace1/g' -e 's/find2/replace2/g' (your choice).

The details of s/find1/replace1/g are:

find1 is
- (^|\s) locate and capture the beginning of the line or a whitespace character,
- followed by the '-' hyphen,
- then capture the next \w (word-character); and
replace1 is simply \1\2, which reinserts both captures using the first two backreferences.

The next substitution expression is similar, except now you are looking for the hyphen followed by a whitespace or at the end. So you have:

find2 being
- a capture of \w (word-character),
- followed by the hyphen,
- followed by a capture of either a following whitespace character or the end of the line (\s|$), then
replace2 is the same as before, just reinsert the captured characters using backreferences.

In each case the g indicates a global replace of all occurrences.

(note: the \w word-character class also includes digits and the '_' (underscore), so while it is unlikely you would have a hyphen and an underscore together, if you do, you need to use the [A-Za-z] list instead of \w)

Example Use/Output

In your case, the output is:

$ sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g' file
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
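
Following the note about \w, a variant restricted to uppercase letters might look like the sketch below. It also swaps \s for the POSIX class [[:space:]], on the assumption that this is more portable across sed implementations than the \s and \w shorthands. The variable name sed_out is only for the demonstration.

```shell
# Same two substitutions, but with [A-Z] instead of \w
# and [[:space:]] instead of \s.
sed_out=$(printf '%s\n' 'I AM EMPTY-HANDED AND I- WA-' '-ANT SOME COO- COOKIES' |
  sed -E 's/(^|[[:space:]])-([A-Z])/\1\2/g; s/([A-Z])-([[:space:]]|$)/\1\2/g')

printf '%s\n' "$sed_out"
```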

Answer 4

remove the hyphen '-' character only when it matches the pattern 'space-[A-Z]' or '[A-Z]-space'. Assuming all letters are uppercase, and space could be a space, or newline

It's:

sed 's/\( \|^\)-\([A-Z]\)/\1\2/g; s/\([A-Z]\)-\( \|$\)/\1\2/g'

s - substitute
- /
- \( \|^\) - a space or the beginning of the line
- - - the hyphen
- \([A-Z]\) - a single uppercase character
- /
- \1\2 - \1 is replaced by whatever the first \(...\) group matched, so a space or nothing. \2 is replaced by the single uppercase character that was found. Effectively, the - is removed.
- /
- g - apply the regex globally
; - separates the two s commands
s
- Same as above. The $ means the end of the line.
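
The \| alternation inside a basic regular expression is a GNU sed extension. Where it is not available, the same idea can be sketched without alternation by splitting each case into its own s command. This is an illustrative variant, not part of the original answer, and bre_out is just a demonstration variable.

```shell
# Four plain BRE substitutions, one per case:
#   hyphen at line start, hyphen after a space,
#   hyphen at line end,   hyphen before a space.
bre_out=$(printf '%s\n' 'I AM EMPTY-HANDED AND I- WA-' '-ANT SOME COO- COOKIES' |
  sed -e 's/^-\([A-Z]\)/\1/' \
      -e 's/ -\([A-Z]\)/ \1/g' \
      -e 's/\([A-Z]\)-$/\1/' \
      -e 's/\([A-Z]\)- /\1 /g')

printf '%s\n' "$bre_out"
```

The anchored cases (^ and $) can match at most once per line, so they do not need the g flag.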

Answer 5

$ awk '{gsub(/ -/," ");gsub(/^-|-$/,"");gsub(/- /," ")}1' file
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES


solution1
3 2020-01-24 04:19:56

solution2
2 2020-01-24 01:30:53

solution3
2 2020-01-24 04:57:06

solution4
1 ACCEPTED 2020-01-24 01:36:05

solution5
0 2020-01-25 00:19:22
