I want to split a sentence like this: "She's so nice!" -> ["she","'","s","so","nice","!"]. I wrote the code below, but the result includes whitespace. How can I do this using only a regular expression?
import re
words = re.findall(r'\W+|\w+', "She's so nice!")
-> ['She', "'", 's', ' ', 'so', ' ', 'nice', '!']
words = [word for word in words if not word.isspace()]
Regex: [A-Za-z]+|[^A-Za-z ]
Inside the negated class [^A-Za-z ], add any other characters you don't want to match.
Details:
[] Match a single character present in the list
[^] Match a single character NOT present in the list
+ Matches between one and unlimited times
| Or
Python code:
import re
text = "She's so nice!"
matches = re.findall(r'[A-Za-z]+|[^A-Za-z ]', text)
print(matches)
Output:
['She', "'", 's', 'so', 'nice', '!']
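As a small illustration of adding characters to the negated class, the sketch below also excludes tabs and newlines so they are skipped the same way spaces are. The sample text and the variable names `basic` and `extended` are made up for this example.

```python
import re

# Hypothetical input containing a tab and a newline in addition to a space
text = "She's so\tnice!\n"

# Original pattern: only the space is excluded, so \t and \n come back as tokens
basic = re.findall(r'[A-Za-z]+|[^A-Za-z ]', text)

# Extended pattern: tab and newline added to the negated class
extended = re.findall(r'[A-Za-z]+|[^A-Za-z \t\n]', text)

print(basic)     # ['She', "'", 's', 'so', '\t', 'nice', '!', '\n']
print(extended)  # ['She', "'", 's', 'so', 'nice', '!']
```

If the input can contain arbitrary whitespace, writing `\s` inside the negated class instead of listing each character should behave the same way.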
Before Python 3.7, the re module's split() did not split on zero-width matches. You can use the regex package from PyPI instead (making sure to specify version 1, which properly handles zero-width matches).
import regex
s = "She's so nice!"
x = regex.split(r"\s+|\b(?!^|$)", s, flags=regex.VERSION1)
print(x)
Output: ['She', "'", 's', 'so', 'nice', '!']
\s+|\b(?!^|$) Match either of the following options
\s+ Match one or more whitespace characters
\b(?!^|$) Assert position as a word boundary, but not at the beginning or end of the line
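To see where the zero-width alternative can match, the sketch below lists the positions matched by \b(?!^|$) using the standard re module. It only inspects match positions and does not reproduce the split itself. The variable name `positions` is just for illustration.

```python
import re

s = "She's so nice!"

# Positions of word boundaries, excluding the start and end of the string
positions = [m.start() for m in re.finditer(r"\b(?!^|$)", s)]
print(positions)  # [3, 4, 5, 6, 8, 9, 13]

# Show the characters on either side of each boundary
for p in positions:
    print(p, repr(s[p - 1]), repr(s[p]))
```

Positions 5, 6, 8 and 9 are adjacent to the spaces that the \s+ alternative consumes. The others fall between a word character and a punctuation character.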