
Split string based on special characters in Python

For example, the string is "hello %$ world %^& let me ^@ love && you". The expected result would be "hello" in one variable and the rest in other variables, e.g. a="hello", b="world", etc.

Use a regular expression.

Like this:

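import re
a = "hello %$ world %^& let me ^@ love && you"
# \w+ matches each run of letters, digits and underscores
print(re.findall(r'\w+',a))
# ['hello', 'world', 'let', 'me', 'love', 'you']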
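The question asks for the pieces in separate variables. A minimal sketch of that, assuming the string contains at least two words: unpack the re.findall result with extended iterable unpacking (PEP 3132). The names text, first, second and rest are only illustrative.

```python
import re

text = "hello %$ world %^& let me ^@ love && you"

# re.findall returns every run of word characters as a list of strings
words = re.findall(r'\w+', text)

# Extended unpacking: the first two words get their own variables,
# everything after them is collected into the list "rest"
first, second, *rest = words

print(first)   # hello
print(second)  # world
print(rest)    # ['let', 'me', 'love', 'you']
```

If the string might contain fewer than two words, the unpacking raises ValueError, so check len(words) first in that case.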

You can use regular expressions to retrieve the words from the string:

import re
my_string = "hello %$ world %^& let me ^@ love && you"
re.findall(r'\w+\b', my_string)
# ['hello', 'world', 'let', 'me', 'love', 'you']

See the Regular Expression HOWTO for more about regular expressions.

Update

As asked in the comments, here is a regexp that retrieves groups of words separated by special characters:

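import re
my_string = "hello world #$$ i love you #$@^ welcome to world"
# Each match starts with a word and extends over following words and spaces,
# ending on a word boundary so trailing spaces are not included
re.findall(r'(\w+[\s\w]*)\b', my_string)
# ['hello world', 'i love you', 'welcome to world']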
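Since the title asks to split on special characters, the same groups can also be obtained by splitting rather than matching. This is a sketch of an alternative approach with re.split, assuming "special character" means anything that is neither a word character nor whitespace.

```python
import re

my_string = "hello world #$$ i love you #$@^ welcome to world"

# Split on every run of characters that are neither word characters
# nor whitespace, i.e. on runs of "special" characters
parts = re.split(r'[^\w\s]+', my_string)

# The pieces keep the spaces that surrounded the separators, so strip them
# and drop empty strings (e.g. when the text starts or ends with a separator)
groups = [p.strip() for p in parts if p.strip()]

print(groups)  # ['hello world', 'i love you', 'welcome to world']
```

Splitting states what separates the groups instead of what a group looks like, which can be easier to adjust if your definition of "special character" changes.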

The basic answer would be a regexp. I would recommend looking into the tokenizers from NLTK: they build on research on the topic and give you the flexibility to switch to something more sophisticated later on. Guess what? NLTK offers a regexp-based tokenizer too!

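from nltk.tokenize import RegexpTokenizer

# Each token is a run of ASCII letters, digits and spaces, so tokens
# may keep leading/trailing spaces
tokenizer = RegexpTokenizer(r'([A-Za-z0-9 ]+)')
corpus = tokenizer.tokenize("hello %$ world %^& let me ^@ love && you")
print(corpus)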
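If you would rather not add NLTK as a dependency, a rough standard-library equivalent is sketched below. It assumes RegexpTokenizer in its default mode (gaps=False) behaves like re.findall with the same pattern, which is my understanding of how it works. Because the pattern's character class includes a space, the raw tokens carry surrounding spaces, so this sketch strips them.

```python
import re

text = "hello %$ world %^& let me ^@ love && you"

# Same character class as the RegexpTokenizer pattern above: each run of
# ASCII letters, digits and spaces becomes one token
raw_tokens = re.findall(r'[A-Za-z0-9 ]+', text)

# Remove the leading/trailing spaces and skip whitespace-only tokens
corpus = [tok.strip() for tok in raw_tokens if tok.strip()]

print(corpus)  # ['hello', 'world', 'let me', 'love', 'you']
```

Note that "let me" stays together as one token here, because the space between the two words is part of the character class.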


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM