I am trying to remove the characters in string.punctuation from a list of words. The issue is that I do not know where to strip the punctuation, as I am dealing with a dictionary within a dictionary. My code is below:
from collections import Counter
import re
import string

comments = []
ar_lst = []
for review in reviews:
    ar_dict = {}
    ar_dict["Comments"] = review["Content"]
    ar_dict["Author"] = review["Author"]
    ar_lst.append(ar_dict)

for review in ar_lst:
    # TODO: (1) Get the number of words in the current review variable.
    punc = string.punctuation
    comments = review['Comments'].lower()
    author = review['Author']
    unique_words_count = set()
    all_words = comments.split(" ")
    for word in all_words:
        unique_words_count.add(word)
    # (2) Print the author's name and the number of (unique) words in his/her review
    print(f'{author} used {len(unique_words_count)} unique words.')
The output I am getting is below
But I need the output to look like this
The word count is off because I can't figure out where to insert the re.sub() expression. I tried putting it into the second for loop as:
comments = re.sub(punc, '', review['Comments']).lower()
But this did not work. Any help would be greatly appreciated!
Also, this is a snippet of what the dictionary looks like
You can either strip the punctuation from comments before you split it into words (preferable), or strip it from word inside the loop for word in all_words:. Note that string.punctuation is a plain string, !"#$%&'..., so re.sub() treats it as one regex pattern rather than as a set of characters. What you probably want is a character class:
# re.escape() backslash-escapes ], \ and - so they stay literal inside the class
punc = '[%s]' % re.escape(string.punctuation)
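To show where the substitution fits, here is a minimal sketch of the first option (strip before splitting). The reviews list, the PUNC_RE name, and the unique_word_count helper are made up for illustration; they stand in for the asker's real data and are not part of the original code.

```python
import re
import string

# Hypothetical sample data standing in for the asker's `reviews` list.
reviews = [
    {"Author": "Alice", "Content": "Great phone, great price! Great."},
    {"Author": "Bob", "Content": "Bad... really bad; don't buy."},
]

# Character class matching any single punctuation character.
PUNC_RE = re.compile("[%s]" % re.escape(string.punctuation))


def unique_word_count(text):
    """Count distinct words after lowercasing and removing punctuation."""
    cleaned = PUNC_RE.sub("", text.lower())
    # split() with no argument also drops empty strings from repeated spaces.
    return len(set(cleaned.split()))


counts = {r["Author"]: unique_word_count(r["Content"]) for r in reviews}
for author, n in counts.items():
    print(f"{author} used {n} unique words.")
```

If you would rather avoid regular expressions, text.translate(str.maketrans('', '', string.punctuation)) should remove the same characters. Either way, note that an apostrophe is punctuation too, so "don't" is counted as "dont".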