Python: Split a string into a list, taking out all special characters except '
I need to split a string into a list of words, separating on whitespace and deleting all special characters except for the apostrophe (').
For example:
page = "They're going up to the Stark's castle [More:...]"
needs to be turned into the list
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
Right now I can only remove all special characters, using
re.sub("[^\w]", " ", page).split()
or just split, keeping all special characters, using
page.split()
Is there a way to specify which characters to remove and which to keep?
Use str.split as normal, then filter the unwanted characters out of each word:
>>> page = "They're going up to the Stark's castle [More:...]"
>>> result = [''.join(c for c in word if c.isalpha() or c=="'") for word in page.split()]
>>> result
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
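The same filter can be wrapped in a small helper so that the set of characters to keep is explicit. This is only a minimal sketch, not part of the answer above: the name split_keep and its keep parameter are hypothetical, and unlike the one-liner it also drops tokens that end up empty (for example a standalone "--").

```python
def split_keep(text, keep="'"):
    """Split on whitespace, keeping only letters and the characters in `keep`."""
    words = []
    for token in text.split():
        # Keep alphabetic characters plus anything explicitly allowed.
        cleaned = ''.join(c for c in token if c.isalpha() or c in keep)
        # A token made only of special characters becomes '', so skip it.
        if cleaned:
            words.append(cleaned)
    return words


page = "They're going up to the Stark's castle [More:...]"
print(split_keep(page))
print(split_keep("wait -- what?"))
```

Passing a longer keep string, such as "'-", would let hyphenated words survive as well.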
import re
page = "They're going up to the Stark's castle [More:...]"
# Raw string so that \w reaches the regex engine unchanged.
s = re.sub(r"[^\w' ]", "", page).split()
Output:
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
First use [\w' ] to match the characters you want to keep, then put ^ at the start of the class to match everything else, and replace those matches with '' (nothing).
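To illustrate what the negated class does, here is a minimal sketch that applies the same pattern to a second string. The function name strip_specials is hypothetical. Two things to be aware of: \w also matches digits and the underscore, so those survive, and because the class lists a literal space rather than \s, tabs and newlines are deleted instead of acting as separators.

```python
import re

# Negated character class: anything that is NOT a word character (\w),
# an apostrophe, or a space is matched and removed.
PATTERN = re.compile(r"[^\w' ]")


def strip_specials(text):
    """Remove every character matched by PATTERN, then split on whitespace."""
    return PATTERN.sub("", text).split()


print(strip_specials("They're going up to the Stark's castle [More:...]"))
print(strip_specials("snake_case, 42 & don't!"))
```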
Here is a solution.
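import re
page = "They're going up to the Stark's castle [More:...]"
# Replace each run of characters other than letters, digits, and ' with one space,
# then trim both ends so that split(' ') yields no empty strings.
page = re.sub("[^0-9a-zA-Z']+", ' ', page).strip()
print(page)
p = page.split(' ')
print(p)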
Here is the output of the final print call.
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
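As an alternative sketch, and not the approach this answer uses: instead of replacing unwanted characters and then splitting, re.findall can collect the wanted runs directly, which sidesteps any trimming of leading or trailing separators. The pattern is simply the positive form of the character class above, and the sample string with a leading "[Note]" is made up for illustration.

```python
import re

# Hypothetical input that starts and ends with special characters.
page = "[Note] They're going up to the Stark's castle [More:...]"

# Collect each run of ASCII letters, digits, and apostrophes.
words = re.findall(r"[0-9a-zA-Z']+", page)
print(words)
```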
Using ''.join() and a nested list comprehension would be a simpler option in my opinion:
>>> page = "They're going up to the Stark's castle [More:...]"
>>> [''.join([c for c in w if c.isalpha() or c == "'"]) for w in page.split()]
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']