Python: Split a string into a list, removing all special characters except '

Question

I need to split a string into a list of words, separating on whitespace and deleting all special characters except for the apostrophe (').

For example:

page = "They're going up to the Stark's castle [More:...]"

needs to be turned into the list

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']

Right now I can only remove all special characters, using

re.sub("[^\w]", " ", page).split()

or just split, keeping all special characters, using

page.split()

Is there a way to specify which characters to remove and which to keep?

Answer 1

Use str.split as normal, then filter the unwanted characters out of each word:

>>> page = "They're going up to the Stark's castle [More:...]"
>>> result = [''.join(c for c in word if c.isalpha() or c=="'") for word in page.split()]
>>> result
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
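The same idea can be wrapped in a small helper so that the set of characters to keep becomes a parameter. This is only a sketch: the name split_keep and its keep argument are hypothetical, not part of any library. It also drops tokens that end up empty after filtering, which the one-liner above would leave in the list as '' (for example a standalone "&").

```python
def split_keep(text, keep="'"):
    """Split on whitespace, keeping only letters and the characters in `keep`."""
    words = []
    for word in text.split():
        cleaned = ''.join(c for c in word if c.isalpha() or c in keep)
        if cleaned:  # skip tokens made up only of removed characters
            words.append(cleaned)
    return words


page = "They're going up to the Stark's castle [More:...]"
result = split_keep(page)
print(result)

# A standalone "&" is filtered down to nothing and therefore dropped.
no_symbols = split_keep("rock & roll isn't dead")
print(no_symbols)

# Passing a different `keep` string preserves hyphens as well.
with_hyphen = split_keep("a well-known fact!", keep="'-")
print(with_hyphen)
```

Changing the keep string is how you would choose which special characters survive; everything else that is not a letter is removed.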

Answer 2

import re

page = "They're going up to the Stark's castle [More:...]"
s = re.sub(r"[^\w' ]", "", page).split()  # raw string avoids an invalid-escape warning for \w
print(s)

Output:

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']

First, [\w' ] matches the characters you want to keep (word characters, the apostrophe and the space). The leading ^ then negates the class, so [^\w' ] matches everything else, and each match is replaced with '' (nothing).
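Another way to read this pattern is to collect the runs of wanted characters directly instead of deleting the unwanted ones. The sketch below uses re.findall with a raw-string pattern; it is an alternative to the re.sub call above, not the answer's own code. Note that \w also matches digits and the underscore, so those are kept too.

```python
import re

page = "They're going up to the Stark's castle [More:...]"

# Each match is a maximal run of word characters or apostrophes.
words = re.findall(r"[\w']+", page)
print(words)

# \w covers letters, digits and "_", so these survive as well.
mixed = re.findall(r"[\w']+", "room_42 isn't free!")
print(mixed)

# The two approaches differ when a removed character sits inside a word:
joined = re.sub(r"[^\w' ]", "", "a-b").split()   # deletion glues the halves together
separated = re.findall(r"[\w']+", "a-b")         # findall treats "-" as a separator
print(joined, separated)
```

Which behavior is preferable depends on whether a removed character should act as a word boundary.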

Answer 3

Here is a solution.

Replace every run of characters other than alphanumerics and the single quote with a space, and remove any trailing spaces.
Then split the string using a space as the delimiter.

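import re

page = "They're going up to the Stark's castle   [More:...]"
# Replace each run of characters other than letters, digits and ' with one space,
# then drop the trailing space left behind by the final ":...]".
page = re.sub(r"[^0-9a-zA-Z']+", ' ', page).rstrip()
print(page)  # the cleaned string
p = page.split(' ')
print(p)     # the list of words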

Here is the output of the final print:

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
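One detail worth checking is the combination of rstrip() and split(' '). If the text begins with a special character, the substitution leaves a leading space and split(' ') yields an empty first item. A possible variation, shown here as a sketch rather than as the answer's method, is to call split() with no argument, which ignores leading and trailing whitespace. The sample text is made up for illustration.

```python
import re

text = "[Note] They're here..."
spaced = re.sub(r"[^0-9a-zA-Z']+", ' ', text)

with_delim = spaced.rstrip().split(' ')  # keeps an empty string for the leading space
no_delim = spaced.split()                # whitespace split drops empty items
print(with_delim)
print(no_delim)
```

With the original example string the two calls give the same list, because that string starts with a letter.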

Answer 4

Using ''.join() and a list comprehension would be a simpler option, in my opinion:

>>> page = "They're going up to the Stark's castle [More:...]"
>>> [''.join([c for c in w if c.isalpha() or c == "'"]) for w in page.split()]
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']

4 answers

Answer 1
1 2016-12-12 00:19:59

Answer 2
0 2016-12-12 00:32:00

Answer 3
0 2016-12-12 02:57:46

Answer 4
-1 2016-12-12 00:22:28
