
Python - Strip punctuation from a list of words using re.sub and string.punctuation

I am trying to remove the characters in string.punctuation from a list of words. The issue is that I do not know where to strip the punctuation, since I am dealing with a dictionary within a dictionary. My code is below:

from collections import Counter
import re
import string  # needed for string.punctuation

comments = []
ar_lst = []

# reviews: list of review dicts with 'Content' and 'Author' keys (defined elsewhere)
for review in reviews:
    ar_dict = {}
    ar_dict["Comments"] = review["Content"]
    ar_dict["Author"] = review["Author"]
    ar_lst.append(ar_dict)

for review in ar_lst:
    # TODO: (1) Get the number of words in the current review variable.
    punc = string.punctuation
    comments = review['Comments'].lower()
    author = review['Author']
    unique_words_count = set()
    all_words = comments.split(" ")
    for word in all_words:
        unique_words_count.add(word)
    # (2) Print the author's name and the number of (unique) words in his/her review
    print(f'{author} used {len(unique_words_count)} unique words.')
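To see why the counts come out too high, here is a minimal sketch that mirrors the loop above on one piece of hypothetical review text (the helper name unique_words_naive and the sample string are illustrative only, not from the question's data). Punctuation stays attached to words, so "great" and "great!" land in the set as two different words.

```python
def unique_words_naive(text):
    """Mirror the question's approach: lowercase, then split on single spaces."""
    return set(text.lower().split(" "))

# Hypothetical review text, not taken from the question's data
sample = "Great food. The food was great!"
words = unique_words_naive(sample)
print(len(words))     # 6
print(sorted(words))  # ['food', 'food.', 'great', 'great!', 'the', 'was']
```

Without punctuation there are only 4 distinct words here (great, food, the, was). Note also that split(" ") returns empty strings for consecutive spaces ("a  b".split(" ") gives ['a', '', 'b']), while split() with no argument splits on runs of whitespace.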

The output I am getting is below:

[Image: current output]

But I need the output to look like this:

[Image: desired output]

The word counts are off because I can't figure out where to insert the re.sub() call. I tried putting it in the second for loop like this:

comments = re.sub(punc, '', review['Comments']).lower()

But this did not work. Any help would be greatly appreciated!
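A likely reason that attempt fails: re.sub() interprets its first argument as a regular expression, and string.punctuation is full of regex metacharacters such as $, (, * and an unclosed [, so it is not read as "any one of these characters" and cannot even be compiled. A small check, as a sketch:

```python
import re
import string

# re.sub() treats its first argument as a regex pattern, so string.punctuation
# is parsed as regex syntax rather than as a list of characters to remove
try:
    re.compile(string.punctuation)
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)  # exact message can differ between Python versions

print(compiled)  # False
```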

Also, here is a snippet of what the dictionary looks like:

[Image: sample of the review dictionary]

You can either strip the punctuation out of comments before you split it into words (preferable), or strip it from each word inside the for word in all_words: loop. string.punctuation is just the plain string !"#$%&'..., but what you probably want is a regex character set built from it:

punc = '[%s]' % re.escape(string.punctuation)  # re.escape escapes ']' and the backslash, so the set is not closed early
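Putting this together with the question's loop, here is a minimal sketch of both options from the answer. The reviews sample data, PUNC, and the helpers unique_word_count and unique_word_count_per_word are hypothetical; only the key names Content, Comments and Author come from the question. The first helper strips punctuation from the whole comment before splitting; the second splits first and uses str.strip(string.punctuation) on each word.

```python
import re
import string

# Hypothetical constant: a character set matching any one punctuation character.
# re.escape escapes ']' and the backslash so they are matched literally in the set.
PUNC = "[" + re.escape(string.punctuation) + "]"

def unique_word_count(text):
    """Hypothetical helper: remove punctuation first, then split into words."""
    cleaned = re.sub(PUNC, "", text.lower())
    return len(set(cleaned.split()))

def unique_word_count_per_word(text):
    """Hypothetical helper: split first, then strip punctuation from each word's ends."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return len({w for w in words if w})  # drop tokens that were only punctuation

# Hypothetical sample data in the same shape as the question's reviews
reviews = [
    {"Author": "Ann", "Content": "Great food. The food was great!"},
    {"Author": "Bob", "Content": "Slow service... but tasty, tasty food."},
]

ar_lst = [{"Comments": r["Content"], "Author": r["Author"]} for r in reviews]

for review in ar_lst:
    count = unique_word_count(review["Comments"])
    print(f"{review['Author']} used {count} unique words.")
# Ann used 4 unique words.
# Bob used 5 unique words.
```

The two options differ on punctuation inside a word: the regex version turns "it's" into "its", while str.strip only removes leading and trailing characters, so "it's" keeps its apostrophe. split() with no argument is used so that runs of spaces do not produce empty "words".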
