
Split a list of sentences into a list of words with Python

I have a list of sentences:

s: 'hello everyone', 'how are you',..., 'i am fine'.

I would like to split this list of sentences into a list of words.

So my expected result is:

[['hello', 'everyone'], ['how', 'are', 'you'], .., ['i', 'am', 'fine']]

I tried this:

def split_list(sentence):
    for s in sentence:
        s=s.split()
    return s

But I got one list of words, not a list of lists of words:

['hello', 'everyone', 'how', 'are', 'you', .., 'i', 'am', 'fine']

It's not entirely clear what sentence refers to in your function split_list, but if it is a list of strings like ['hello everyone', 'how are you', 'i am fine'], then you overwrite the same variable s on every iteration and end up returning only the result of the last iteration, i.e. ['i', 'am', 'fine'].
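To see the overwrite in action, here is a minimal sketch that reruns the question's logic on a three-sentence list. The name split_list_buggy is hypothetical and used only for illustration.

```python
def split_list_buggy(sentences):
    # Same logic as in the question: s is rebound on every pass,
    # so only the split of the last sentence survives the loop.
    for s in sentences:
        s = s.split()
    return s

sentences = ['hello everyone', 'how are you', 'i am fine']
print(split_list_buggy(sentences))  # ['i', 'am', 'fine']
```

Nothing is accumulated between iterations, which is why the earlier sentences disappear from the result.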

So you need to make sure that you collect all your results in a list of lists and return that.

You can do that with a list comprehension like so, assuming it is a list of strings as above:

s = ['hello everyone', 'how are you', 'i am fine']

def split_list(sentence):
    # Split each sentence in the list and collect the results in a new list
    return [item.split() for item in sentence]

print(split_list(s))

Or with a normal for loop:

s = ['hello everyone', 'how are you', 'i am fine']

def split_list(sentence):
    result = []
    # Split each sentence in the list and append it to the result list
    for s in sentence:
        result.append(s.split())
    return result

print(split_list(s))

The output will be the same in both cases:

[['hello', 'everyone'], ['how', 'are', 'you'], ['i', 'am', 'fine']]
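Both versions rely on str.split() called with no arguments, which splits on any run of whitespace and ignores leading and trailing whitespace. A small sketch of how that plays out on less tidy input (the list messy is a made-up example, not from the question):

```python
def split_list(sentences):
    # One inner list of words per input sentence
    return [sentence.split() for sentence in sentences]

# Hypothetical input with extra spaces, a tab, and an empty string
messy = ['  hello   everyone ', 'how\tare you', '']
print(split_list(messy))
# [['hello', 'everyone'], ['how', 'are', 'you'], []]
```

Note that an empty sentence produces an empty inner list rather than being dropped, so the result always has one entry per input sentence.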

This can be done with just a list comprehension:

s = ['hello everyone', 'how are you', 'i am fine']
s2 = [c.split() for c in s]
print(s2) # [['hello', 'everyone'], ['how', 'are', 'you'], ['i', 'am', 'fine']]

You have to save the result of each iteration by initializing an empty list before the loop and appending each result inside the loop:

def split_list(sentence):
    L = []
    for s in sentence:
        L.append(s.split())
    return L

Otherwise the function will return only the result of the last iteration.

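# Requires the nltk package and its Punkt tokenizer data (installed via nltk.download())
from nltk import word_tokenize
s = ['hello everyone', 'how are you', 'i am fine']

# Tokenize each sentence into its own list of words
token = [word_tokenize(x) for x in s]
print(token)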

Output:
[['hello', 'everyone'], ['how', 'are', 'you'], ['i', 'am', 'fine']]
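For this sample input a tokenizer and str.split() give the same result, because the sentences contain no punctuation. With punctuation, str.split() leaves it attached to the neighbouring word. The rough sketch below uses only the standard-library re module as a stand-in to show the difference. It is not NLTK's algorithm, and the helper name simple_tokenize and its regular expression are assumptions made for illustration.

```python
import re

def simple_tokenize(sentence):
    # Hypothetical helper: match runs of word characters,
    # or any single character that is neither word nor whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)

s = ['hello, everyone', 'how are you?']

print([x.split() for x in s])
# [['hello,', 'everyone'], ['how', 'are', 'you?']]

print([simple_tokenize(x) for x in s])
# [['hello', ',', 'everyone'], ['how', 'are', 'you', '?']]
```

If the sentences are plain whitespace-separated words, str.split() is enough and avoids the extra dependency.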
