Split more than one word in Python
How can I write a program in Python that splits text on more than one word or character? For example, I have these sentences:
Hi, This is a test. Are you surprised?
In this example I need my program to split these sentences on ',', '!', '?' and '.'. I know about split in the str library and about NLTK, but is there a built-in, Pythonic way to do this, like split?
Use re.split:
import re

string = 'Hi, This is a test. Are you surprised?'
# A character class matches any one of the listed delimiters.
words = re.split(r'[,!?.]', string)
print(words)
# Output: ['Hi', ' This is a test', ' Are you surprised', '']
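Note that the result above keeps the leading spaces and a trailing empty string. A minimal sketch of one way to tidy that up, assuming you want trimmed, non-empty pieces; the helper name split_sentences and its delimiters parameter are made up for illustration:

```python
import re

def split_sentences(text, delimiters=",!?."):
    """Split text on any delimiter character, dropping empty pieces."""
    # re.escape keeps characters such as '.' and '?' literal inside the class.
    pattern = "[" + re.escape(delimiters) + "]"
    # Strip surrounding whitespace and skip pieces that end up empty.
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

print(split_sentences("Hi, This is a test. Are you surprised?"))
# ['Hi', 'This is a test', 'Are you surprised']
```

Building the character class from a string makes it easy to change the set of delimiters without editing the regular expression by hand.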
You are looking for the tokenize module of the NLTK package (for example word_tokenize or sent_tokenize). NLTK stands for Natural Language Toolkit.
Or try re.split from the re module:
>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', maxsplit=1)
['Words', 'words, words.']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']
I think I found a trick that answers my question without using any modules. I can use the replace method of str to replace characters like ! or ? with . and then use the split method to split the text on . alone.
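The replace-then-split idea described here could look roughly like the sketch below. The function name split_on_punctuation and its delimiters parameter are hypothetical, and trimming the pieces is an extra assumption on top of what the answer describes:

```python
def split_on_punctuation(text, delimiters=",!?"):
    """Split text on '.' after turning every other delimiter into '.'."""
    # Normalize each delimiter to '.', so a single split call is enough.
    for ch in delimiters:
        text = text.replace(ch, ".")
    # Trim whitespace and drop the empty piece left after the final '.'.
    return [piece.strip() for piece in text.split(".") if piece.strip()]

print(split_on_punctuation("Hi, This is a test. Are you surprised?"))
# ['Hi', 'This is a test', 'Are you surprised']
```

This uses only str methods, at the cost of one pass over the text per delimiter.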
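def get_words(s):
    l = []   # collected words
    w = ''   # word currently being built
    for c in s:
        if c in '-!?,. ':
            # A delimiter ends the current word, if there is one.
            if w != '':
                l.append(w)
                w = ''
        else:
            w = w + c
    # Keep the last word when the text does not end with a delimiter.
    if w != '':
        l.append(w)
    return l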
>>> s = "Hi, This is a test. Are you surprised?"
>>> print(get_words(s))
['Hi', 'This', 'is', 'a', 'test', 'Are', 'you', 'surprised']
If you change '-!?,. ' to '-!?,.' (dropping the space), the output will be:
['Hi', ' This is a test', ' Are you surprised']
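For the case where the space is one of the delimiters, a similar result might be obtained with str.translate. This is a swapped-in technique, not the loop above, and the name get_words_translate is made up for illustration. It only matches the loop when whitespace is meant to separate words, because split() with no arguments always splits on whitespace:

```python
def get_words_translate(s, delimiters="-!?,. "):
    """Return the words of s, treating every delimiter as a separator."""
    # Map each delimiter character to a space.
    table = str.maketrans({ch: " " for ch in delimiters})
    # split() with no arguments collapses runs of whitespace and drops empties.
    return s.translate(table).split()

print(get_words_translate("Hi, This is a test. Are you surprised?"))
# ['Hi', 'This', 'is', 'a', 'test', 'Are', 'you', 'surprised']
```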