How to split a sentence into words with a regular expression?
"She's so nice!" -> ["she", "'", "s", "so", "nice", "!"]
I want to split a sentence like this. I wrote the code below, but the result includes whitespace. How can I do this using only a regular expression?
words = re.findall(r'\W+|\w+', sentence)
-> ["she", "'", "s", " ", "so", " ", "nice", "!"]
words = [word for word in words if not word.isspace()]
Regex: [A-Za-z]+|[^A-Za-z ]
In [^A-Za-z ], add the characters you don't want to match.
Details:
[] Match a single character present in the list
[^] Match a single character NOT present in the list
+ Match between one and unlimited times
| Or
Python code:
import re
text = "She's so nice!"
matches = re.findall(r'[A-Za-z]+|[^A-Za-z ]', text)
Output:
['She', "'", 's', 'so', 'nice', '!']
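As an illustration of "adding the characters you don't want to match", here is a minimal sketch that excludes all whitespace (tabs and newlines as well as spaces) by putting \s in the negated class. The names TOKEN_RE and tokenize are hypothetical, and treating every whitespace character as a separator is an assumption, not something the question states.

```python
import re

# Assumption: every whitespace character, not only the space, should be dropped.
# [A-Za-z]+    -> a run of ASCII letters (one word)
# [^A-Za-z\s]  -> any single character that is neither a letter nor whitespace
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z\s]")

def tokenize(text):
    """Return words and single punctuation characters, skipping whitespace."""
    return TOKEN_RE.findall(text)

print(tokenize("She's\tso nice!\n"))
```

Note that digits and other non-letter characters come back as single-character tokens with this pattern, just as punctuation does.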
Python's re module doesn't allow you to split on zero-width assertions (this applies to versions before Python 3.7, where re.split ignored empty matches). You can use the regex package from PyPI instead, making sure you specify version 1, which properly handles zero-width matches.
import regex
s = "She's so nice!"
x = regex.split(r"\s+|\b(?!^|$)", s, flags=regex.VERSION1)
print(x)
Output: ['She', "'", 's', 'so', 'nice', '!']
\s+|\b(?!^|$) Match either of the following options
\s+ Match one or more whitespace characters
\b(?!^|$) Assert position as a word boundary, but not at the beginning or end of the line
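To see where the zero-width part of the pattern can match, the sketch below lists the positions that satisfy \b(?!^|$) using the standard re module. It uses re.finditer rather than a split, so it only inspects positions and is not a replacement for the regex.split call above. The function name boundary_positions is hypothetical.

```python
import re

def boundary_positions(text):
    """Return the indices of word boundaries that are not at the start or end."""
    # \b       -> zero-width word boundary
    # (?!^|$)  -> reject the position if it is the start or the end of the string
    return [m.start() for m in re.finditer(r"\b(?!^|$)", text)]

s = "She's so nice!"
print(boundary_positions(s))  # [3, 4, 5, 6, 8, 9, 13]
```

Positions 5, 6, 8 and 9 sit next to the spaces, which is why the pattern lists \s+ as its first alternative.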