Python - Strip Punctuation from list of words using re.sub and string.punctuation

Question

I am trying to strip the characters in string.punctuation from a list of words. The issue is that I do not know where to strip the punctuation, as I am dealing with a dictionary within a dictionary. My code is below.

from collections import Counter
import re
import string
comments = []
ar_lst = []

for review in reviews:
    ar_dict = {}
    ar_dict["Comments"] = review["Content"]
    ar_dict["Author"] = review["Author"]
    ar_lst.append(ar_dict)
    
for review in ar_lst:
    # TODO: (1) Get the number of words in the current review variable.
    punc = string.punctuation
    comments = review['Comments'].lower()
    author = review['Author']
    unique_words_count = set()
    all_words = comments.split(" ")
    for word in all_words:
        unique_words_count.add(word)
    # (2) Print the author's name and the number of (unique) words in his/her review
    print(f'{author} used {len(unique_words_count)} unique words.')

The output I am getting is below

But I need the output to look like this

The word count is off because I can't figure out where to insert the re.sub() call. I tried putting it into the second for loop as

comments = re.sub(punc, '', review['Comments']).lower()

But this did not work. Any help would be greatly appreciated!

Also, this is a snippet of what the dictionary looks like

Answer 1

You can either strip the punctuation from comments before you split it into words (preferable), or strip it from each word inside the for word in all_words: loop. string.punctuation is a plain string, !"#$%&'..., and re.sub treats its first argument as a regular expression, so you probably want a character class:

# re.escape makes ], \, ^ and - literal inside the character class
punc = '[%s]' % re.escape(string.punctuation)
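
A minimal sketch of the first option (strip before splitting), assuming each review is a dict with "Author" and "Comments" keys as in the question. The helper name count_unique_words and the sample reviews are made up for illustration; the real reviews data is not shown in the question. The character class is built with re.escape so that every punctuation character is treated literally.

```python
import re
import string

# Character class matching any single punctuation character.
# re.escape backslash-escapes the characters that are special inside [...].
PUNC_PATTERN = re.compile('[%s]' % re.escape(string.punctuation))


def count_unique_words(text):
    """Return the number of distinct words in text, ignoring case and punctuation."""
    cleaned = PUNC_PATTERN.sub('', text).lower()
    # split() with no argument splits on any run of whitespace and drops empty strings
    return len(set(cleaned.split()))


# Hypothetical sample data in the same shape as ar_lst in the question
sample_reviews = [
    {"Author": "alice", "Comments": "Great hotel, great staff!"},
    {"Author": "bob", "Comments": "Clean. Quiet. Clean again..."},
]

for review in sample_reviews:
    n = count_unique_words(review["Comments"])
    print(f'{review["Author"]} used {n} unique words.')
```

Note that deleting punctuation outright turns a word like "don't" into "dont"; if that matters for your counts, you could exclude the apostrophe from the character class.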

solution1
1 2021-03-01 03:22:58
