How do I select the first elements of each list in a list of lists?
I'm trying to isolate the first word of each sentence in a series of sentences using Python/NLTK.
I created a throwaway series of sentences (the_text), and while I can split it into tokenized sentences, I can't successfully separate just the first word of each sentence into a list (first_words).
[['Here', 'is', 'some', 'text', '.'], ['There', 'is', 'a', 'a', 'person', 'on', 'the', 'lawn', '.'], ['I', 'am', 'confused', '.'], ['There', 'is', 'more', '.'], ['Here', 'is', 'some', 'more', '.'], ['I', 'do', "n't", 'know', 'anything', '.'], ['I', 'should', 'add', 'more', '.'], ['Look', ',', 'here', 'is', 'more', 'text', '.'], ['How', 'great', 'is', 'that', '?']]
the_text="Here is some text. There is a a person on the lawn. I am confused. "
the_text= (the_text + "There is more. Here is some more. I don't know anything. ")
the_text= (the_text + "I should add more. Look, here is more text. How great is that?")
sents_tok=nltk.sent_tokenize(the_text)
sents_words=[nltk.word_tokenize(sent) for sent in sents_tok]
number_sents=len(sents_words)
print (number_sents)
print(sents_words)
for i in sents_words:
    first_words=[]
    first_words.append(sents_words (i,0))
print(first_words)
Thanks for your help!
There are three problems with your code, and you have to fix all three to make it work:
for i in sents_words:
    first_words=[]
    first_words.append(sents_words (i,0))
First, you're erasing first_words each time through the loop: move first_words=[] outside the loop.
Second, you're mixing function-call syntax (parentheses) with indexing syntax (square brackets): you want sents_words[i][0].
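To make the second point concrete: parentheses attempt a function call, while square brackets index. A minimal sketch, with toy data standing in for sents_words:

```python
# Toy stand-in for the real sents_words.
sents_words = [['Here', 'is', 'text', '.'], ['How', 'great', '?']]

# Square brackets index: first into the outer list, then the inner one.
print(sents_words[0][0])  # Here

# Parentheses try to *call* the list, which raises a TypeError.
try:
    sents_words(0, 0)
except TypeError as e:
    print('TypeError:', e)
```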
Third, for i in sents_words: iterates over the elements of sents_words, not the indices. So you just want i[0]. (Or you could write for i in range(len(sents_words)), but there's no reason to do that.)
So, putting it all together:
first_words=[]
for i in sents_words:
    first_words.append(i[0])
If you know about comprehensions, you may recognize this pattern (start with an empty list, iterate over something, append some expression to the list) as exactly what a list comprehension does:
first_words = [i[0] for i in sents_words]
If you don't, then either now is a good time to learn about comprehensions, or don't worry about this part. :)
>>> sents_words = [['Here', 'is', 'some', 'text', '.'], ['There', 'is', 'a', 'a', 'person', 'on', 'the', 'lawn', '.'], ['I', 'am', 'confused', '.'], ['There', 'is', 'more', '.'], ['Here', 'is', 'some', 'more', '.'], ['I', 'do', "n't", 'know', 'anything', '.'], ['I', 'should', 'add', 'more', '.'], ['Look', ',', 'here', 'is', 'more', 'text', '.'], ['How', 'great', 'is', 'that', '?']]
You can append to a previously initialized list with a loop:
>>> first_words = []
>>> for i in sents_words:
...     first_words.append(i[0])
...
>>> print(*first_words)
Here There I There Here I I Look How
Or a comprehension (substitute parentheses for those square brackets to make a generator):
>>> first_words = [i[0] for i in sents_words]
>>> print(*first_words)
Here There I There Here I I Look How
Or, if you don't need to save it for later use, you can print the items directly:
>>> print(*(i[0] for i in sents_words))
Here There I There Here I I Look How
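The * in those print calls unpacks the iterable into separate positional arguments; a small sketch of how that interacts with print's sep keyword:

```python
words = ['Here', 'There', 'I']

# print(*words) unpacks the list, equivalent to print('Here', 'There', 'I').
print(*words)            # Here There I

# sep controls the string placed between the unpacked arguments.
print(*words, sep=', ')  # Here, There, I
```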
Here's an example of how to access items in lists and in lists of lists:
>>> fruits = ['apple','orange', 'banana']
>>> fruits[0]
'apple'
>>> fruits[1]
'orange'
>>> cars = ['audi', 'ford', 'toyota']
>>> cars[0]
'audi'
>>> cars[1]
'ford'
>>> things = [fruits, cars]
>>> things[0]
['apple', 'orange', 'banana']
>>> things[1]
['audi', 'ford', 'toyota']
>>> things[0][0]
'apple'
>>> things[0][1]
'orange'
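Building on those indexing rules, pulling the first item out of every inner list is just a comprehension over the outer list; a short sketch reusing the fruits/cars data above:

```python
fruits = ['apple', 'orange', 'banana']
cars = ['audi', 'ford', 'toyota']
things = [fruits, cars]

# things[i][0] for every i, written as a comprehension.
first_items = [inner[0] for inner in things]
print(first_items)  # ['apple', 'audi']
```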
For your problem:
>>> from nltk import sent_tokenize, word_tokenize
>>>
>>> the_text="Here is some text. There is a a person on the lawn. I am confused. There is more. Here is some more. I don't know anything. I should add more. Look, here is more text. How great is that?"
>>>
>>> tokenized_text = [word_tokenize(s) for s in sent_tokenize(the_text)]
>>>
>>> first_words = []
>>> # Iterates through the sentences.
... for sent in tokenized_text:
...     print(sent)
...
['Here', 'is', 'some', 'text', '.']
['There', 'is', 'a', 'a', 'person', 'on', 'the', 'lawn', '.']
['I', 'am', 'confused', '.']
['There', 'is', 'more', '.']
['Here', 'is', 'some', 'more', '.']
['I', 'do', "n't", 'know', 'anything', '.']
['I', 'should', 'add', 'more', '.']
['Look', ',', 'here', 'is', 'more', 'text', '.']
['How', 'great', 'is', 'that', '?']
>>> # First words in each sentence.
... for sent in tokenized_text:
...     word0 = sent[0]
...     first_words.append(word0)
...     print(word0)
...
Here
There
I
There
Here
I
I
Look
How
>>> print(first_words)
['Here', 'There', 'I', 'There', 'Here', 'I', 'I', 'Look', 'How']
In one line, with a list comprehension:
# From the_text, you extract the first word directly
first_words = [word_tokenize(s)[0] for s in sent_tokenize(the_text)]
# From tokenized_text
tokenized_text= [word_tokenize(s) for s in sent_tokenize(the_text)]
first_words = [s[0] for s in tokenized_text]
Another option, though it's very similar to abarnert's suggestion:
first_words = []
for i in range(number_sents):
    first_words.append(sents_words[i][0])
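If you do want an index alongside each sentence, enumerate is the more idiomatic spelling of the range(len(...)) pattern; a sketch with stand-in data:

```python
# Toy stand-in for the tokenized sentences.
sents_words = [['Here', 'is', 'text', '.'], ['How', 'great', '?']]

first_words = []
for i, sent in enumerate(sents_words):
    # i is available if you need it; sent is the same as sents_words[i].
    first_words.append(sent[0])
print(first_words)  # ['Here', 'How']
```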