So, I am trying to figure out the best way to take a list of strings and convert it into a list of words. I also want to strip all the punctuation from the strings. My thought process is to do the following:
This seems like a lot of steps to go from a list of strings to a list of words. Does anyone have a more concise method, or suggestions for improving my process? The end goal is to pass the resulting list of words to the Counter class (collections.Counter) to find the most common word(s).
Below are the current output and the desired output.
list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.'] # current output
list_of_words = ['This', 'is', 'string', 'one', 'This', 'is', 'string', 'two', 'This', 'is', 'string', 'three'] # desired output
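Since the end goal is counting words, here is a minimal sketch of the whole pipeline using only the standard library (re.findall and collections.Counter). It assumes that a "word" is any run of letters, digits, or underscores (the \w+ pattern), so every other character, including punctuation, is treated as a separator and dropped.

```python
import re
from collections import Counter

list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.']

# Assumption: a "word" is a run of word characters (\w+); punctuation and
# whitespace both act as separators and never appear in the result.
list_of_words = [word for s in list_of_strings for word in re.findall(r"\w+", s)]

# Counter tallies each word; most_common(n) returns the n highest counts,
# with ties ordered by first appearance.
counts = Counter(list_of_words)
print(list_of_words)
print(counts.most_common(3))  # [('This', 3), ('is', 3), ('string', 3)]
```

Counting is case-sensitive here, so 'This' and 'this' would be tallied separately; calling s.lower() inside the comprehension merges them.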
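The first for-loop extracts one line at a time; for example, the first iteration handles list_of_strings[0], which is 'This is string one.'.
Then word = line.split() splits the line into words. Called with no argument, split() splits on runs of whitespace.
The second for-loop appends each of the split words to the list_of_words list. Note that this does not remove punctuation, so the last word of each string keeps its period ('one.').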
list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.']
list_of_words = list()
for line in list_of_strings:
    # split() with no argument splits the line on whitespace
    word = line.split()
    # append each split word to the flat result list
    for i in word:
        list_of_words.append(i)
print(list_of_words)
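The loop above leaves the period attached to the last word of each string. A minimal variant, assuming the characters to remove are exactly those in string.punctuation (ASCII punctuation only), deletes them with str.translate before splitting, and uses list.extend in place of the inner loop:

```python
import string

list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.']

# Translation table that deletes every ASCII punctuation character.
remove_punct = str.maketrans('', '', string.punctuation)

list_of_words = []
for line in list_of_strings:
    # Remove punctuation first, then split on runs of whitespace.
    words = line.translate(remove_punct).split()
    # extend() adds every element of words, replacing the inner for-loop.
    list_of_words.extend(words)

print(list_of_words)
```

Because apostrophes are in string.punctuation, this also turns "don't" into "dont"; whether that is acceptable depends on what should count as a word.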
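You can also do it in one line with a nested list comprehension that splits each string and strips periods from both ends of every word: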
list_of_words = [j.strip('.') for i in list_of_strings for j in i.split()]
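strip('.') only removes periods. Here is a sketch of the same comprehension that strips any leading or trailing ASCII punctuation instead, assuming that punctuation inside a word (such as the apostrophe in "Don't") should be kept. The sample strings below are hypothetical, chosen to include other punctuation marks.

```python
import string

# Hypothetical sample input with mixed punctuation.
list_of_strings = ['This is string one.', 'This is string two!', "Don't stop, string three?"]

# Split each string on whitespace, then strip punctuation from the ends of each word only.
list_of_words = [w.strip(string.punctuation)
                 for s in list_of_strings
                 for w in s.split()]
print(list_of_words)
```

A token made only of punctuation (for example a standalone "-") would become an empty string, so add a filter such as `if w.strip(string.punctuation)` if the input can contain those.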
Alternatively, rstrip the trailing '.' from each string (rather than strip, which would also check the start), split it around spaces, and join the lists obtained from splitting with sum():
>>> sum([i.rstrip(".").split(" ") for i in list_of_strings], [])
['This', 'is', 'string', 'one', 'This', 'is', 'string', 'two', 'This', 'is', 'string', 'three']
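sum(..., []) builds a new list at every addition, so it slows down roughly quadratically as the number of strings grows. A sketch using itertools.chain.from_iterable flattens the per-string lists in a single pass instead. Like the sum() version, it assumes the only punctuation is trailing periods; it also uses split() rather than split(" "), so repeated spaces do not produce empty strings.

```python
from itertools import chain

list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.']

# A generator yields one word list per string; chain.from_iterable concatenates them lazily.
list_of_words = list(chain.from_iterable(s.rstrip('.').split() for s in list_of_strings))
print(list_of_words)
```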