How do I split a string on spaces and treat special characters as separate words in Python?

Question

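Suppose I have a string,

"I want that one, it is great."

and I want to split it into

["I", "want", "that", "one", ",", "it", "is", "great", "."]

keeping special characters such as ",.:;" (and possibly others) as separate words.

Is there a simple way to do this in Python 2.7?

Update

For example, "I don't." should become ["I", "don", "'", "t", "."]. Ideally it would also work with non-English punctuation such as ؛ and other punctuation marks.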

Answer 1

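See the similar question here. Its answer applies to your case as well:

import re
# The capturing group makes re.split keep each separator in the result.
print(re.split(r'(\W)', "I want that one, it is great."))
print(re.split(r'(\W)', "I don't."))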

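You can use filter to remove the spaces and empty strings that re.split returns:

s = "I want that one, it is great."
# list(...) keeps the output a list on Python 3, where filter is lazy.
print(list(filter(lambda tok: tok not in (' ', ''), re.split(r'(\W)', s))))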
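
The same idea can be written as a list comprehension, which returns a list on both Python 2 and Python 3. This is a minimal sketch, not part of the original answer; the helper name split_keep_punct is hypothetical, and unlike the filter above it drops every whitespace token (tabs and newlines too), not only single spaces.

```python
import re

def split_keep_punct(text):
    """Split on single non-word characters and keep them as tokens."""
    # The capturing group makes re.split return the separators as well.
    parts = re.split(r'(\W)', text)
    # Drop whitespace separators and the empty strings between adjacent separators.
    return [p for p in parts if p.strip()]

print(split_keep_punct("I want that one, it is great."))
print(split_keep_punct("I don't."))
```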

Answer 2

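You can do this with a regex and a simple list comprehension. The regex pulls out the words and separates the punctuation, and the list comprehension removes the whitespace.

import re
s = "I want that one, it is great. Don't do it."
# Split on runs of non-word characters, keep them, then drop blank tokens.
new_s = [c.strip() for c in re.split(r'(\W+)', s) if c.strip() != '']
print(new_s)

The output of new_s will be: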

['I', 'want', 'that', 'one', ',', 'it', 'is', 'great', '.', 'Don', "'", 't', 'do', 'it', '.']
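
One caveat worth checking: because the pattern is \W+, a run of adjacent punctuation is captured as a single token. The sketch below compares that behaviour with an alternative pattern, \w+|[^\w\s], used with re.findall. The alternative is a swapped-in technique, not part of this answer, and the variable names are illustrative.

```python
import re

s = 'He said, "I don\'t!"'

# The approach above: \W+ captures a whole run of non-word characters at once,
# so the comma, space and quote between "said" and "I" stay in one token.
grouped = [c.strip() for c in re.split(r'(\W+)', s) if c.strip() != '']

# Alternative: match either a run of word characters or exactly one
# character that is neither a word character nor whitespace.
separate = re.findall(r'\w+|[^\w\s]', s)

print(grouped)
print(separate)
```

With this input, the first list keeps `, "` and `!"` together, while the second yields each punctuation mark as its own word.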

Answer 3

In [70]: re.findall(r"[^,.:;' ]+|[,.:;']", "I want that one, it is great.")
Out[70]: ['I', 'want', 'that', 'one', ',', 'it', 'is', 'great', '.']

In [76]: re.findall(r"[^,.:;' ]+|[,.:;']", "I don't.")
Out[76]: ['I', 'don', "'", 't', '.']

The regex [^,.:;' ]+|[,.:;'] matches either one or more characters other than , . : ; ' or a literal space, or a single literal , . : ; or ' character.
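
If the set of special characters needs to change, the two character classes can be built from one string. The following is a rough sketch under that assumption; make_tokenizer and its punct parameter are hypothetical names, and an empty punct string is not handled.

```python
import re

def make_tokenizer(punct=",.:;'"):
    """Return a function that splits text into words and single punct characters."""
    # re.escape protects characters that are special inside [...], such as ] or ^.
    cls = re.escape(punct)
    # Same shape as the pattern above: a run of "other" characters, or one punct character.
    pattern = re.compile(r"[^%s ]+|[%s]" % (cls, cls))
    return pattern.findall

tokenize = make_tokenizer(",.:;'!?\"")
print(tokenize('He said, "I don\'t!"'))
```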

Alternatively, using the regex module, you can easily extend this to cover all punctuation and symbols with the [:punct:] character class:

In [77]: import regex

In Python 2:

In [4]: regex.findall(ur"[^[:punct:] ]+|[[:punct:]]", u"""A \N{ARABIC SEMICOLON} B""")
Out[4]: [u'A', u'\u061b', u'B']

In [6]: regex.findall(ur"[^[:punct:] ]+|[[:punct:]]", u"""He said, "I don't!" """)
Out[6]: [u'He', u'said', u',', u'"', u'I', u'don', u"'", u't', u'!', u'"']

In Python 3:

In [105]: regex.findall(r"[^[:punct:] ]+|[[:punct:]]", """A \N{ARABIC SEMICOLON} B""")
Out[105]: ['A', '؛', 'B']

In [83]: regex.findall(r"[^[:punct:] ]+|[[:punct:]]", """He said, "I don't!" """)
Out[83]: ['He', 'said', ',', '"', 'I', 'don', "'", 't', '!', '"']

Note that if you want [:punct:] to match Unicode punctuation or symbols, you must pass a unicode string as the second argument to regex.findall.

In Python 2:

import regex
print(regex.findall(r"[^[:punct:] ]+|[[:punct:]]", 'help؛'))
print(regex.findall(ur"[^[:punct:] ]+|[[:punct:]]", u'help؛'))

prints

['help\xd8\x9b']
[u'help', u'\u061b']
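
If installing the third-party regex module is not an option, a rough approximation is possible in Python 3 with the standard library's unicodedata module. The sketch below treats any character whose Unicode general category starts with P (punctuation) or S (symbol) as its own word. That is an assumption about what "punctuation and symbols" should cover and may not match regex's [:punct:] exactly. The function names are hypothetical.

```python
import unicodedata

def is_punct_or_symbol(ch):
    # General categories beginning with "P" are punctuation, "S" are symbols.
    return unicodedata.category(ch)[0] in "PS"

def tokenize(text):
    tokens = []
    word = ""
    for ch in text:
        if ch == " " or is_punct_or_symbol(ch):
            # A space or punctuation character ends the current word.
            if word:
                tokens.append(word)
                word = ""
            # Punctuation and symbols are kept as separate tokens; spaces are dropped.
            if ch != " ":
                tokens.append(ch)
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens

print(tokenize("help\N{ARABIC SEMICOLON}"))
print(tokenize('He said, "I don\'t!" '))
```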

Answer 4

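I don't know of any function that does this, but you could use a for loop.

Something like this:

string_name = "I want that one, it is great."
words = []
word = ""
for ch in string_name:
    if ch.isalnum() or ch == "_":
        # Letters, digits and underscores extend the current word.
        word += ch
    else:
        # Any other character ends the current word...
        if word:
            words.append(word)
            word = ""
        # ...and is kept as its own word unless it is whitespace.
        if not ch.isspace():
            words.append(ch)
if word:
    words.append(word)

Hope this works... sorry if it isn't the best way.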

4 answers

Answer 1
1 2016-05-25 18:52:50

Answer 2
1 2016-05-25 18:53:24

Answer 3
1 accepted 2016-05-25 18:58:49

Answer 4
0 2016-05-25 18:54:52
