
List of strings to list of words

I am trying to figure out the best way to take a list of strings and convert it into a list of words. I also want to strip all punctuation from the strings. My plan is the following:

  1. Make one big string from the list of strings using the .join() method and a list comprehension/map.
  2. Use the string translate method to remove the punctuation.
  3. Use the split method to split the gigantic string back into a list.

This seems like a lot of steps to go from a list of strings to a list of words. Does anyone have a more concise method, or suggestions for improving my process? The end goal is to pass the list of words to the collections.Counter class to find the most common word(s).

Below are the current input and the desired output.

list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.'] # current input
list_of_words = ['This', 'is', 'string', 'one', 'This', 'is', 'string', 'two', 'This', 'is', 'string', 'three'] # desired output
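
For reference, the three steps described above could be sketched roughly as follows. This is only one possible reading of the plan: it assumes that "punctuation" means the ASCII characters in string.punctuation, and it joins with a space so that the last word of one string does not fuse with the first word of the next.

```python
import string
from collections import Counter

list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.']

# Step 1: join the strings with a space so words at the boundaries stay separate.
joined = ' '.join(list_of_strings)

# Step 2: delete every ASCII punctuation character (assumed definition of "punctuation").
no_punct = joined.translate(str.maketrans('', '', string.punctuation))

# Step 3: split on whitespace to get the list of words.
list_of_words = no_punct.split()

# End goal from the question: count the words.
word_counts = Counter(list_of_words)
print(list_of_words)
print(word_counts.most_common(3))
```

Note that translate() removes punctuation everywhere, including inside words, so a word like "don't" would become "dont" with this approach.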

The outer for loop takes one string at a time, e.g.:

list_of_strings[0] is 'This is string one.'

Then words = line.split() splits that line into words on whitespace (the default delimiter).

The inner for loop appends each of the split words to list_of_words. Note that split() alone leaves the trailing period attached (e.g. 'one.'), so punctuation still has to be stripped separately.

list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.']
list_of_words = []

for line in list_of_strings:
    # split() with no arguments splits on any run of whitespace
    words = line.split()

    for word in words:
        list_of_words.append(word)

print(list_of_words)

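You can try this single list comprehension, which splits each string and strips periods from both ends of every word: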

list_of_words = [j.strip('.') for i in list_of_strings for j in i.split()]
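
The comprehension above only handles periods. If other punctuation may appear, one possible generalization is to strip string.punctuation from both ends of each word instead. The helper name split_words below is made up for illustration, and treating string.punctuation as the set of characters to remove is an assumption.

```python
import string

def split_words(lines):
    """Split each line on whitespace and strip ASCII punctuation from word edges.

    Hypothetical helper: tokens that consist only of punctuation are dropped.
    """
    stripped = (
        token.strip(string.punctuation)
        for line in lines
        for token in line.split()
    )
    return [word for word in stripped if word]

print(split_words(['Hello, world!', "don't stop - now."]))
```

Unlike translate(), strip() only touches the ends of each word, so an apostrophe inside "don't" is kept.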

You can also try it like this: rstrip the trailing "." from each string (rather than stripping every word), split on spaces, and then join the resulting lists with sum:

>>> sum([i.rstrip(".").split(" ") for i in list_of_strings], [])
['This', 'is', 'string', 'one', 'This', 'is', 'string', 'two', 'This', 'is', 'string', 'three']
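
One thing to keep in mind is that sum(..., []) builds a new list at every addition, which can get slow for long inputs. A possible alternative, sketched here with the same rstrip-and-split idea, is to flatten the sublists with itertools.chain.from_iterable:

```python
from itertools import chain

list_of_strings = ['This is string one.', 'This is string two.', 'This is string three.']

# Flatten the per-string word lists lazily instead of adding lists together.
list_of_words = list(
    chain.from_iterable(s.rstrip('.').split() for s in list_of_strings)
)
print(list_of_words)
```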
