Filter a list of lists based on the first two items of the sublist - natural language processing with NLTK

I have generated a list of trigrams and their frequencies in NLTK with this code:

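import nltk
from nltk.collocations import TrigramCollocationFinder

# docs is the raw text (a string) being analysed
tokens = nltk.wordpunct_tokenize(docs)
trigram_measures = nltk.collocations.TrigramAssocMeasures()
finderT = TrigramCollocationFinder.from_words(tokens)
# scoredT is a list of ((w1, w2, w3), relative_frequency) pairs
scoredT = finderT.score_ngrams(trigram_measures.raw_freq)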
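To see what scoredT holds without installing NLTK, here is a minimal pure-Python sketch that builds a list of the same shape. It assumes that raw_freq is the trigram's count divided by the total number of tokens, and that score_ngrams sorts by descending score; score_trigrams is a hypothetical helper, not part of NLTK.

```python
from collections import Counter


def score_trigrams(tokens):
    """Return [(trigram, relative_frequency), ...], highest score first.

    Hypothetical stand-in for finderT.score_ngrams(trigram_measures.raw_freq);
    it assumes "raw frequency" means trigram count / number of tokens.
    """
    # zip over three shifted views yields every consecutive (w1, w2, w3)
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    total = len(tokens)
    scored = [(trigram, count / total) for trigram, count in trigrams.items()]
    # Highest score first; ties broken alphabetically by the trigram itself.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


tokens = "we are proud of you and we are out to the park".split()
for trigram, score in score_trigrams(tokens)[:3]:
    print(trigram, score)
```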

Given a user-defined 'input' of two words, I want to filter the list scoredT to return those values where the input matches the first two items of the sublist in scoredT.

scoredT looks like this:

[(('out', 'to', 'the'), 2.7147650642313413e-05),
(('proud', 'of', 'you'), 2.7147650642313413e-05)]

So if input were equal to 'out to', I'd like to filter the list to return 'the'.

I tried:

matches = filter(scoredT[0:len(scoredT)][0:1]==input, scoredT)

but I get the following error: TypeError: 'bool' object is not callable

scoredT[0:len(scoredT)][0:1] == input does not look at each trigram. scoredT[0:len(scoredT)] is just a copy of the whole list, and [0:1] then takes a one-element list holding the first (trigram, score) pair. Comparing that list with input produces a single boolean. You then pass that boolean to filter, which requires its first argument to be a function that it can call on each item and that returns a boolean, hence your error. The pythonic way is a list comprehension:
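For reference, filter itself works once it is given a callable. A minimal sketch using the two sample rows from the question; the variable is named prefix instead of input so it does not shadow Python's built-in input() function.

```python
# Sample data in the same shape as scoredT: ((w1, w2, w3), score) pairs.
scoredT = [(('out', 'to', 'the'), 2.7147650642313413e-05),
           (('proud', 'of', 'you'), 2.7147650642313413e-05)]
prefix = ('out', 'to')  # a tuple, so it can compare equal to trigram[:2]

# filter() calls the lambda on each (trigram, score) item and keeps the
# items for which it returns True; list() materialises the lazy result.
matches = list(filter(lambda item: item[0][:2] == prefix, scoredT))
print(matches)  # [(('out', 'to', 'the'), 2.7147650642313413e-05)]
```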

matches = [(trigram, score) for (trigram, score) in scoredT if trigram[:2] == input]
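Since the goal was to get back 'the' rather than the whole pair, a small variation keeps only the third word, best score first. The sketch assumes the same ((w1, w2, w3), score) layout; next_words and build_index are hypothetical helper names, and the ('out', 'to', 'lunch') row and all scores are made up for illustration.

```python
from collections import defaultdict

# Sample data shaped like scoredT; the rows and scores are illustrative only.
scoredT = [(('out', 'to', 'the'), 3e-05),
           (('out', 'to', 'lunch'), 5e-05),
           (('proud', 'of', 'you'), 2e-05)]


def next_words(scored, prefix):
    """Return the third word of every trigram starting with prefix, best score first."""
    hits = [(trigram[2], score) for trigram, score in scored if trigram[:2] == prefix]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in hits]


def build_index(scored):
    """Map each (w1, w2) prefix to its third words, best score first."""
    grouped = defaultdict(list)
    for trigram, score in scored:
        grouped[trigram[:2]].append((score, trigram[2]))
    # sorted() is stable, so words with equal scores keep their original order.
    return {prefix: [word for _, word in sorted(pairs, key=lambda p: -p[0])]
            for prefix, pairs in grouped.items()}


print(next_words(scoredT, ('out', 'to')))  # ['lunch', 'the']
index = build_index(scoredT)
print(index.get(('out', 'to'), []))        # ['lunch', 'the']
```

build_index only pays off when many prefixes are looked up against the same scoredT: it scans the list once, whereas next_words scans it on every query.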

Also, you need to make sure that input is a tuple such as ('out', 'to'): trigram[:2] is a tuple, and a tuple never compares equal to a string or a list.
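A user will usually type the two words as one string, so it has to be turned into a tuple before the comparison. A minimal sketch, assuming the words are separated by whitespace; as_prefix is a hypothetical helper.

```python
def as_prefix(user_text):
    """Turn user input such as 'out to' into the tuple ('out', 'to')."""
    words = tuple(user_text.split())  # split() with no argument splits on any whitespace
    if len(words) != 2:
        raise ValueError(f"expected two words, got {len(words)}: {user_text!r}")
    return words


trigram = ('out', 'to', 'the')
print(trigram[:2] == 'out to')              # False: tuple vs. str
print(trigram[:2] == ['out', 'to'])         # False: tuple vs. list
print(trigram[:2] == as_prefix('out to'))   # True
```

If the input may contain punctuation, tokenizing it with the same nltk.wordpunct_tokenize call used for docs (and wrapping the result in tuple()) keeps both sides split the same way.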

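Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM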