How do I remove punctuation from an item in a list and save it as a separate item in the list?
I am trying to compress items from one list into another list, and I need to save punctuation as separate items in the list. If I don't, "you" and "you;" are saved as separate items in the list.
For example, the original list is:
['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President.']
and the compressed list is currently:
['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'you', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'former', 'American', 'President.']
but I want it to have punctuation as separate items in the list.
My intended output is:
['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you', ';', 'ask', '!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'former', 'American', 'President', '.']
You can implement this with regex.
import re
a = ['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President.']
result = re.findall(r"[\w']+|[.,!?;]",' '.join(a))
Output
['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you', ';', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country', '!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President', '.']
Here is a demo to understand more about regex.
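The regex output above still contains repeated tokens, while the question asks for a compressed list. Here is a minimal sketch, assuming "compressed" means keeping only the first occurrence of each token in its original order. The names `words`, `tokens`, and `compressed` are illustrative and not from the original answer; `dict.fromkeys` is used because dicts preserve insertion order in Python 3.7+.

```python
import re

words = ['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;',
         'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!', 'This',
         'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former',
         'American', 'President.']

# Split into word tokens and single punctuation marks (same pattern as above).
tokens = re.findall(r"[\w']+|[.,!?;]", ' '.join(words))

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
compressed = list(dict.fromkeys(tokens))
print(compressed)
```

With the example list this should reproduce the intended output from the question. Note that the character class `[.,!?;]` only covers those five marks; other punctuation would be dropped unless the pattern is extended.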
This is the code to separate the non-alphabetic characters and also remove duplicates. Hope it helps.
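def separate(mylist):
    newlist = []
    test = ''  # alphabetic characters of the current item
    a = ''     # last non-alphabetic character seen in the current item
    for e in mylist:
        for c in e:
            if not c.isalpha():
                a = c
            else:
                test = test + c
        # Append the word, followed by its punctuation mark if it had one.
        if a != '':
            newlist = newlist + [test] + [a]
        else:
            newlist = newlist + [test]
        # Reset the buffers for the next item.
        test = ''
        a = ''
    # Keep only the first occurrence of each entry, preserving order.
    noduplicates = []
    for i in newlist:
        if i not in noduplicates:
            noduplicates = noduplicates + [i]
    return noduplicates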
I'm sure someone else can do better, because this is a bit messy, but at least it works.
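One limitation of the loop above is that it keeps only the last non-alphabetic character of each item and always places it after the word, so an item such as '(well...)' would lose characters. Below is a rough sketch of one way to tidy it up, not the answerer's code. The function name `separate_unique` is hypothetical, and it assumes digits should count as part of a word (`isalnum` rather than `isalpha`) and that every non-alphanumeric, non-space character becomes its own item. An apostrophe inside a word (as in "don't") would also be split off under these assumptions.

```python
def separate_unique(items):
    """Split items into word and punctuation tokens, keeping first occurrences."""
    tokens = []
    for item in items:
        word = ''
        for ch in item:
            if ch.isalnum():
                word += ch
            else:
                # Flush the word collected so far before emitting punctuation.
                if word:
                    tokens.append(word)
                    word = ''
                if not ch.isspace():
                    tokens.append(ch)
        if word:
            tokens.append(word)

    # Order-preserving de-duplication with a set for fast membership checks.
    seen = set()
    unique = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            unique.append(token)
    return unique


print(separate_unique(['you;', 'you', '(well...)']))
```

Using a set for the membership check avoids rescanning the result list for every token, which matters only for long inputs.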