Python regex to match “spaced out” words

Question

When dealing with text files that have been produced using optical character recognition (OCR), I often come across lines or parts of lines

t h a t  a r e  s p a c e d  o u t  l i k e  t h i s.

I would like to be able to use a regular expression to match these words and smash the letters back together. But I have no idea how to do this using capture groups or my usual toolbox of regular expression knowledge.

Answer 1

This may be what you are looking for:

re.sub(r' (.)', r'\1', txt)
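
A minimal sketch of how this substitution behaves on the sample line from the question, assuming the text is held in a variable named txt as in the answer. Each match is a space plus the character after it, and only that character is kept, so a double space collapses to a single one. The second call is an added illustration of a caveat: on normally spaced text the same pattern also joins separate words, so it is probably best applied only to lines known to be spaced out.

```python
import re

# Sample line from the question: letters separated by one space,
# words separated by two spaces.
txt = "t h a t  a r e  s p a c e d  o u t  l i k e  t h i s."

# Drop each space and keep the character that follows it.
joined = re.sub(r' (.)', r'\1', txt)
print(joined)

# Caveat: ordinary text loses its word breaks too.
normal = re.sub(r' (.)', r'\1', "plain text")
print(normal)
```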

Answer 2

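(?:(?<=\s\s)|^)((?:\w\s|\w\.)+)

This matches each spaced-out run: a sequence of single characters, each followed by whitespace or a period, that begins at the start of the string or after two whitespace characters. The ^ alternative sits outside the lookbehind because Python's re module only accepts fixed-width lookbehinds.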

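
As a sketch of how a pattern like this could be used from Python to rejoin the letters: the version below keeps the ^ alternative outside the lookbehind, since Python's re module requires lookbehinds to have a fixed width, and passes a callback to re.sub that strips the whitespace inside each matched run. The names SPACED_RUN and join_spaced are hypothetical and not part of the answer.

```python
import re

# A run of single characters, each followed by whitespace or a period,
# starting at the beginning of the string or after two whitespace characters.
SPACED_RUN = re.compile(r'(?:(?<=\s\s)|^)((?:\w\s|\w\.)+)')


def join_spaced(txt):
    """Remove the whitespace inside each spaced-out run (hypothetical helper)."""
    return SPACED_RUN.sub(lambda m: re.sub(r'\s', '', m.group(1)), txt)


txt = "t h a t  a r e  s p a c e d  o u t  l i k e  t h i s."
result = join_spaced(txt)
print(result)
```

Unlike a blanket substitution, this leaves ordinary text such as "plain text" unchanged, because a normal word does not start with a single character followed by whitespace.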

Answer 3

Try this:

re.sub(r' \b', r'', txt)
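
A short sketch of what this pattern does, again assuming the text is in a variable named txt. The \b after the space only matches when the space is directly followed by a word character, so of the two spaces between words only the second is removed. The punctuation example is an added illustration: a space before a non-word character such as "!" is left in place.

```python
import re

txt = "t h a t  a r e  s p a c e d  o u t  l i k e  t h i s."

# Remove every space that is immediately followed by a word character.
joined = re.sub(r' \b', r'', txt)
print(joined)

# A space before punctuation is not at a word boundary, so it stays.
with_punct = re.sub(r' \b', r'', "o k !")
print(with_punct)
```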

3 answers

solution1
2 ACCEPTED 2014-09-02 04:23:44

solution2
1 2014-09-02 04:43:13

solution3
1 2014-09-02 05:27:42
