How to remove alternating whitespaces from file content

Question

I am grepping a file that occasionally contains words with alternating whitespace in them, i.e. a space between every letter.

For instance: h e l l o this is an e x a m p le

I would like this to become: hello this is an example

I am open to any command-line tool that solves this problem. I would accept the risk of single-character words getting squashed (since they occur very seldom in my files).

E.g.: h e l l o this is a r i s k I would take. becoming hello this is ariskI would take.

Answer 1

Something like this would work:

(?:(?<=^)|(?<= ))([^ ]) (?=[^ ] )

https://regex101.com/r/yLccGg/1
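The pattern on its own only matches; for a replacement, the captured character has to be written back without the space that follows it. A minimal sketch in Node.js, assuming a global replace with $1 (the helper name joinSpaced is hypothetical and not part of the answer):

```javascript
// Hypothetical helper: applies the pattern from this answer globally.
// $1 is the captured single character; the space after it is dropped.
function joinSpaced(line) {
  return line.replace(/(?:(?<=^)|(?<= ))([^ ]) (?=[^ ] )/g, '$1');
}

console.log(joinSpaced('h e l l o this is an e x a m p l e that I like'));
// prints "hello this is an example that I like"

// The lookahead requires a space after the next character, so a
// spaced-out word at the very end of the line keeps its last gap.
console.log(joinSpaced('h e l l o'));
// prints "hell o"
```

Single-letter words next to a spaced-out word are absorbed, which is the risk the question already accepts.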

Answer 2

An example using node.js:

$ node -e "fs=require('fs'),fn='input.txt';fs.writeFileSync(fn,fs.readFileSync(fn,{encoding:'utf8'}).replace(/(?<=\b[a-z]) (?![A-Z]|\w\w\w)/g, ''));"

A space is removed if it follows a single lower-case letter (one that starts a word) and is not followed by a capital letter or by three consecutive word characters.
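To try the expression without overwriting input.txt, the same replace can be run on a string first. A small sketch, assuming the regular expression is copied unchanged from the one-liner (squash is a hypothetical helper name):

```javascript
// Same regular expression as in the one-liner, applied to a string
// instead of a file, so nothing on disk is modified.
function squash(text) {
  return text.replace(/(?<=\b[a-z]) (?![A-Z]|\w\w\w)/g, '');
}

console.log(squash('h e l l o this is an e x a m p l e that I l i k e'));
// prints "hello this is an example that I like"

// The lookahead only protects following words of three or more
// characters, so a short word after a spaced-out word is merged.
console.log(squash('h e l l o is'));
// prints "hellois"
```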

Answer 3

Using GNU awk for patsplit() and gensub():

$ awk '{
    numFlds = patsplit($0,flds,/\<([^ ] )+[^ ]\>/,seps)
    out = seps[0]
    for ( i=1; i<=numFlds; i++ ) {
        out = out gensub(/([^ ]) /,"\\1","g",flds[i]) seps[i]
    }
    print out
}' file
hello this is an examp le

Alternatively, still using GNU awk, but now relying on the 3rd arg to match() together with gensub():

$ awk '{
    while ( match($0,/\<(([^ ] )+[^ ]\>)(.*)/,a) ) {
        $0 = substr($0,1,RSTART-1) gensub(/([^ ]) /,"\\1","g",a[1]) a[3]
    }
    print
}' file
hello this is an examp le

You'd have to provide an algorithm explaining why le should be joined to the end of examp for that to happen.
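The idea behind both awk scripts (find each run of single characters separated by single spaces, then delete the spaces inside that run) can be sketched in Node.js as well. This is only a rough analogue: it assumes lookarounds on the space character in place of GNU awk's \< and \> word anchors, which is not identical behaviour, and joinRuns is a hypothetical name.

```javascript
// Hypothetical helper: a "run" is two or more single non-space characters
// separated by single spaces and delimited by a space or the line edge.
function joinRuns(line) {
  return line.replace(
    /(?<![^ ])(?:[^ ] )+[^ ](?![^ ])/g,
    (run) => run.replace(/ /g, '')   // drop the spaces inside the run only
  );
}

console.log(joinRuns('h e l l o this is an e x a m p le'));
// prints "hello this is an examp le"

// A lone one-letter word between ordinary words is not a run and is left alone.
console.log(joinRuns('this is a test'));
// prints "this is a test"
```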

Answer 4

I inserted a space before the last e of your examp le.
First, you want to identify the complete words.

echo "h e l l o this is an e x a m p l e"| sed -r 's/\w\w+/=&=/g'

result

h e l l o =this= =is= =an= e x a m p l e

Now all the isolated characters can be joined in a loop, by repeatedly removing the space in front of each one.

echo "h e l l o this is an e x a m p l e"| 
  sed -r 's/\w\w+/=&=/g;:a;s/( )([^ ])( |$)/\2\3/;ta'

result

hello =this= =is= =an=example

Next, replace the equals signs with spaces and collapse repeated spaces:

echo "h e l l o this is an e x a m p l e"| 
  sed -r 's/\w\w+/=&=/g;:a;s/( )([^ ])( |$)/\2\3/;ta;s/=/ /g;s/[ ][ ]+/ /g'

The equals sign may itself occur in your text. Using \r as the marker instead makes the intermediate results harder to read, but it is safer for text that contains no \r. And if you think an isolated I should be treated as a word, the solution is:

echo "h e l l o this is an e x a m p l e that I l i k e"|
  sed -r 's/( I |\w\w+)/\r&\r/g;:a;s/( )([^ ])( |$)/\2\3/;ta; s/\r/ /g;s/[ ][ ]+/ /g'

Result:

hello this is an example that I like

Using awk can be easier:

echo "h e l l o this is an e x a m p l e that I l i k e"|
  awk '
    BEGIN { RS="[ \n]"; FS="" }
    NF==1 { printf("%s",$0); sep=" " }
    $0=="I" { printf(" ")}
    NF>1 { printf("%s%s ", sep, $0); sep =""}
    END {print ""}
  '

Every "word" becomes its own record, and with FS set to the empty string each character is a field, so NF is the record length. A record of length 1 is printed without a space. There is a special rule for an isolated I. The sep variable avoids printing two spaces between two multi-letter words such as this is.

Result:

hello this is an example that I like
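For comparison, the same token-by-token logic can be written in Node.js. This is a sketch under assumptions: joinTokens and its keep parameter are hypothetical names, and unlike the awk script it trims a trailing space from the result.

```javascript
// Hypothetical helper mirroring the three awk rules above.
// `keep` lists one-letter words that should stay separate (assumed: only "I").
function joinTokens(line, keep = new Set(['I'])) {
  let out = '';
  let sep = '';
  for (const tok of line.split(/[ \n]/)) {
    if (tok.length === 1) {        // single character: append without a space
      out += tok;
      sep = ' ';
    }
    if (keep.has(tok)) {           // isolated "I": follow it with a space
      out += ' ';
    }
    if (tok.length > 1) {          // ordinary word: separate it from its neighbours
      out += sep + tok + ' ';
      sep = '';
    }
  }
  return out.trimEnd();
}

console.log(joinTokens('h e l l o this is an e x a m p l e that I l i k e'));
// prints "hello this is an example that I like"
```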

4 answers

solution1
0 2021-12-27 19:01:00

solution2
0 2021-12-27 19:10:18

solution3
0 2021-12-27 20:49:22

solution4
0 2021-12-27 20:58:58
