
How do I remove punctuation from an item in a list and save it as a separate item in the list?

I am trying to compress items from one list into another list, and I need to save punctuation as separate items in the list. If I don't, "you" and "you;" are saved as separate items.

For example, the original list is:

['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President.']

and the compressed list is currently:

['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'you', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'former', 'American', 'President.']

but I want it to have punctuation as separate items in the list.

My intended output is:

['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you', ';', 'ask', '!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'former', 'American', 'President', '.']

You can implement this with a regex.

import re
a = ['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President.']
result = re.findall(r"[\w']+|[.,!?;]",' '.join(a))

Output

['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you', ';', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country', '!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President', '.']

Here is a demo to understand more about regex.
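The pattern above returns every token, including repeats, while the intended output in the question also drops duplicates. A minimal sketch of one way to add that step, assuming the first occurrence of each token should be kept (the names `words`, `tokens` and `compressed` are illustrative, not from the original answer):

```python
import re

words = ['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;',
         'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!',
         'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a',
         'former', 'American', 'President.']

# Same pattern as the answer: a run of word characters/apostrophes,
# or a single punctuation mark from the set . , ! ? ;
tokens = re.findall(r"[\w']+|[.,!?;]", ' '.join(words))

# dict.fromkeys keeps only the first occurrence of each key, and dicts
# preserve insertion order (Python 3.7+), so the original order survives.
compressed = list(dict.fromkeys(tokens))
print(compressed)
```

For the list in the question this should print the intended output, with 'you' and ';' as separate items and later repeats such as 'what' and 'country' removed.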

This is the code to separate the non-alphabetic characters and also remove duplicates. Hope it helps.

def separate(mylist):
    newlist = []
    for e in mylist:
        test = ''  # alphabetic characters of the current item
        a = ''     # last non-alphabetic character seen in the current item
        for c in e:
            if not c.isalpha():
                a = c
            else:
                test += c
        newlist.append(test)
        if a != '':
            newlist.append(a)
    # Remove duplicates while keeping the order of first occurrence.
    noduplicates = []
    for i in newlist:
        if i not in noduplicates:
            noduplicates.append(i)
    return noduplicates

I'm sure someone else can do better, because this is a bit messy, but at least it works.
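One limitation of the loop above is that it keeps only the last non-alphabetic character of each item, so "what?!" would lose the "?" and "don't" would be split at the apostrophe. A possible alternative, sketched here under the assumption that only trailing punctuation should be separated, uses `str.rstrip` with `string.punctuation`. The function name `split_trailing_punctuation` is hypothetical and not part of the original answer.

```python
import string

def split_trailing_punctuation(items):
    """Split trailing punctuation off each item and drop duplicate tokens."""
    seen = set()
    result = []
    for item in items:
        # Strip punctuation from the right only, so "don't" stays intact.
        word = item.rstrip(string.punctuation)
        marks = item[len(word):]
        # Emit the word first (if any), then each trailing mark on its own.
        for token in ([word] if word else []) + list(marks):
            if token not in seen:
                seen.add(token)
                result.append(token)
    return result

print(split_trailing_punctuation(["you;", "you", "what?!", "don't", "what"]))
# ['you', ';', 'what', '?', '!', "don't"]
```

The `seen` set makes the duplicate check constant time per token, which matters only for long lists; for the short list in the question either approach should behave the same.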


 