Split group of special characters from string

In test.txt:

quiet confidence^_^
want:P
(:let's start

Code:

import re
file = open('test.txt').read()
for line in file.split('\n'):
    line = re.findall(r"[^\w\s$]+|[a-zA-z]+|[^\w\s$]+", line)
    print(" ".join(line))

Actual output:

quiet confidence^_^
want : P
(: let ' s start

I tried to separate groups of special characters from the rest of the string, but the output is still incorrect. Any suggestions?

Expected results:

quiet confidence ^_^
want :P
(: let's start

As @interjay said, you must first define what you consider a word and what you consider "special characters". That said, I would use two separate regexes: one that matches words and one that matches everything else.

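import re

# Letters and apostrophes count as part of a word (e.g. let's).
word = re.compile(r"[a-zA-Z']+")
# Everything else, whitespace included, counts as a non-word.
not_word = re.compile(r"[^a-zA-Z']+")

# `file` is the text read from test.txt in the question.
for line in file.split('\n'):
    matched_words = word.findall(line)
    non_matching_words = not_word.findall(line)
    print(" ".join(matched_words))
    print(" ".join(non_matching_words))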

Keep in mind that runs of whitespace (\s+) will be matched as non-words.
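
If the tokens need to stay in their original order on one line, a single pattern with alternation may be easier to work with. The following is only a rough sketch: the name tokenize and the short list of emoticon shapes are my own assumptions, not something from the question, and you would need to extend them to match your own definition of "special characters". Note also that [a-zA-z] in the question covers the ASCII characters between Z and a, which include ^ and _, so confidence^_^ was matched as one word.

```python
import re

# Hypothetical token pattern; the emoticon alternatives are illustrative only.
# Alternatives are tried left to right, so emoticons win over plain words.
TOKEN = re.compile(
    r"""
    \^_\^                        # the ^_^ emoticon
  | [:;=][-']?[()DPp]            # emoticons such as :P  :-)  ;D
  | [()][-']?[:;=]               # mirrored forms such as (:
  | [A-Za-z]+(?:'[A-Za-z]+)*     # words, keeping internal apostrophes
  | [^\w\s]+                     # any other run of punctuation
  | \w+                          # leftovers such as digits or underscores
    """,
    re.VERBOSE,
)


def tokenize(line):
    """Return the tokens of `line` in their original order."""
    return TOKEN.findall(line)


for line in ["quiet confidence^_^", "want:P", "(:let's start"]:
    print(" ".join(tokenize(line)))
# quiet confidence ^_^
# want :P
# (: let's start
```

Because the emoticon alternatives come first, text such as stop:Please would be split as stop :P lease, so this approach only works well when the emoticon list is tuned to the data.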
