
Splitting a list of sentences into separate words in a list

I have a list which consists of lines:

lines =  ['The query complexity of estimating weighted averages.',
     'New bounds for the query complexity of an algorithm that learns',
     'DFAs with correction equivalence queries.',
     'general procedure to check conjunctive query containment.']

I need to store it in the list as separate words:

lines = ['The', 'query', 'complexity', 'of', 'estimating', 'weighted',
         'averages.', 'New', ......]

How can I obtain it as a list of separate words?

You can use a list comprehension:

>>> lines =  [
...     'The query complexity of estimating weighted averages.',
...     'New bounds for the query complexity of an algorithm that learns',
... ]
>>> [word for line in lines for word in line.split()]
['The', 'query', 'complexity', 'of', 'estimating', 'weighted', 'averages.', 'New', 'bounds', 'for', 'the', 'query', 'complexity', 'of', 'an', 'algorithm', 'that', 'learns']

You can join all the lines and then use split():

" ".join(lines).split()
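For instance, a minimal self-contained check of the join-then-split approach, using the first two sentences from the question:

```python
lines = [
    'The query complexity of estimating weighted averages.',
    'New bounds for the query complexity of an algorithm that learns',
]

# Join everything into one string, then split on any whitespace
words = " ".join(lines).split()
print(words)
```

Note that split() with no arguments splits on runs of any whitespace and drops empty strings, so it also copes with extra spaces between words.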

or you can split each line and chain the results:

from itertools import chain
list(chain(*map(str.split, lines)))
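As a small runnable sketch of the same idea, `chain.from_iterable` does the concatenation without the `*` unpacking:

```python
from itertools import chain

lines = [
    'The query complexity of estimating weighted averages.',
    'New bounds for the query complexity of an algorithm that learns',
]

# chain.from_iterable lazily concatenates the per-line word lists
words = list(chain.from_iterable(map(str.split, lines)))
print(words)
```

`chain.from_iterable` is generally preferred over `chain(*...)` because it consumes the iterable lazily instead of unpacking it all at once.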

It sounds like you want something similar to this, where a string is split on whitespace:

lines[0].split()

The above splits lines[0], the first string in your lines list, on its whitespace; it only handles that one element.
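To apply the same split to every line rather than just the first, one simple sketch is a loop that accumulates the words into a single list (shown here with two of the sentences from the question):

```python
lines = [
    'The query complexity of estimating weighted averages.',
    'DFAs with correction equivalence queries.',
]

words = []
for line in lines:
    words.extend(line.split())  # split each line on whitespace
print(words)
```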

You can do it with NLTK's word_tokenize:

import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

lines =  ['The query complexity of estimating weighted averages.',
 'New bounds for the query complexity of an algorithm that learns',
 'DFAs with correction equivalence queries.',
 'general procedure to check conjunctive query containment.']

joint_words = ' '.join(lines)

separated_words = word_tokenize(joint_words)

print(separated_words)

The output will be:

['The', 'query', 'complexity', 'of', 'estimating', 'weighted', 'averages', '.', 'New', 'bounds', 'for', 'the', 'query', 'complexity', 'of', 'an', 'algorithm', 'that', 'learns', 'DFAs', 'with', 'correction', 'equivalence', 'queries', '.', 'general', 'procedure', 'to', 'check', 'conjunctive', 'query', 'containment', '.']

In addition, if you want to merge the dots (which word_tokenize emits as independent tokens) back onto the preceding word, run the following code:

merged_words = []
for word in separated_words:
    if word == '.' and merged_words:
        # Attach the dot to the previous token instead of keeping it separate
        merged_words[-1] += word
    else:
        merged_words.append(word)

print(merged_words)

The output will be:

['The', 'query', 'complexity', 'of', 'estimating', 'weighted', 'averages.', 'New', 'bounds', 'for', 'the', 'query', 'complexity', 'of', 'an', 'algorithm', 'that', 'learns', 'DFAs', 'with', 'correction', 'equivalence', 'queries.', 'general', 'procedure', 'to', 'check', 'conjunctive', 'query', 'containment.']
