
Python: Split a string into a list, taking out all special characters except '

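I need to split a string into a list of words, separating on whitespace and removing all special characters except the apostrophe (').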

For example:

page = "They're going up to the Stark's castle [More:...]"

needs to become the list

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']

Right now I can only remove all special characters, using

re.sub(r"[^\w]", " ", page).split()

or just split, using

page.split() 

Is there a way to specify which characters to remove and which to keep?

Use str.split as normal, then filter the unwanted characters out of each word:

>>> page = "They're going up to the Stark's castle [More:...]"
>>> result = [''.join(c for c in word if c.isalpha() or c=="'") for word in page.split()]
>>> result
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']

import re

page = "They're going up to the Stark's castle [More:...]"
# Delete everything that is not a word character, an apostrophe, or a space.
s = re.sub(r"[^\w' ]", "", page).split()

Output:

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']

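The character class [\w' ] matches the characters to keep (word characters, the apostrophe, and the space). The leading ^ negates it, so every other character is matched and replaced with '' (nothing).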
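A small sketch of how the negated class behaves, and of one alternative. Deleting characters only keeps words apart where a space already separates them; `re.findall`, which is swapped in here and is not part of the answer above, collects runs of wanted characters instead. The sample string "up:to the castle" is made up for illustration.

```python
import re

page = "They're going up to the Stark's castle [More:...]"

# Deleting unwanted characters glues together words that were separated
# only by punctuation (illustrative input, not from the question).
deleted = re.sub(r"[^\w' ]", "", "up:to the castle").split()
print(deleted)

# Matching runs of wanted characters does not rely on spaces being present.
# Note that \w also matches digits and the underscore.
words = re.findall(r"[\w']+", page)
print(words)
```

If digits or underscores should be dropped as well, a narrower class such as `[A-Za-z']` could be used in place of `[\w']`.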

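Here is a solution.

  1. Replace every run of characters other than alphanumerics and the single quote with a space, then strip any trailing whitespace.
  2. Split the string using a space as the delimiter.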


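import re

page = "They're going up to the Stark's castle   [More:...]"
# Step 1: collapse each run of unwanted characters into one space,
# then drop the trailing space left by "[More:...]".
page = re.sub("[^0-9a-zA-Z']+", ' ', page).rstrip()
print(page)
# Step 2: split on single spaces.
p = page.split(' ')
print(p)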


The final print outputs:

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
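One edge case worth noting: `rstrip()` followed by `split(' ')` leaves an empty string at the front of the list when the input starts with punctuation or spaces. A minimal sketch of a variant that avoids this by calling `split()` with no argument is shown below; `split_words` is a hypothetical helper name, not something from the answer.

```python
import re

def split_words(text):
    """Hypothetical helper: keep ASCII letters, digits and apostrophes."""
    # Collapse every run of unwanted characters into a single space.
    cleaned = re.sub(r"[^0-9a-zA-Z']+", " ", text)
    # split() with no argument ignores leading, trailing and repeated whitespace.
    return cleaned.split()

sample = "  [More:...] They're   here "
print(split_words(sample))

# For comparison, the rstrip() + split(' ') combination on the same input:
naive = re.sub(r"[^0-9a-zA-Z']+", " ", sample).rstrip().split(" ")
print(naive)
```

The first call prints three words, while the comparison list starts with an empty string.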

In my opinion, using ''.join() with a nested list comprehension would be a simpler option:

>>> page = "They're going up to the Stark's castle [More:...]"
>>> [''.join([c for c in w if c.isalpha() or c == "'"]) for w in page.split()]
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
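The comprehension can be wrapped in a function so that the set of characters to keep becomes a parameter. This is only a sketch: `clean_words` and its `keep` argument are hypothetical names, and the extra filter that drops empty strings (produced by tokens made entirely of punctuation, such as a lone "-") is an addition not present in the answer.

```python
def clean_words(text, keep="'"):
    """Hypothetical helper: split on whitespace, keep letters plus `keep`."""
    cleaned = (
        "".join(c for c in word if c.isalpha() or c in keep)
        for word in text.split()
    )
    # Tokens that contained no letters at all become '' and are dropped.
    return [word for word in cleaned if word]

print(clean_words("the Stark's castle - [More:...]"))
```

Keep in mind that `str.isalpha()` is true for non-ASCII letters and false for digits, so this behaves differently from the `\w`-based regular expressions on inputs containing numbers.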

