Counting elements in a list of lists in Python

I hope you can help me with this Python function:

def comparapal(lista):  # lista is a list of lists; each inner list has 4 elements
  listaPalabras = []
  for item in lista:
     # keep the item only if its 3rd element is a key of eagles_dict
     if item[2] in eagles_dict:
        # store the 2nd and 3rd elements as a new pair
        listaPalabras.append([item[1], item[2]])

The resulting listaPalabras:

[
   ['bien', 'NP00000'],
   ['gracia', 'NCFP000'],
   ['estar', 'VAIP1S0'],
   ['bien', 'RG'],
   ['huevo', 'NCMS000'],
   ['calcio', 'NCMS000'],
   ['leche', 'NCFS000'],
   ['proteina', 'NCFS000'],
   ['francisco', 'NP00000'],
   ['ya', 'RG'],
   ['ser', 'VSIS3S0'],
   ['cosa', 'NCFS000']
]

My question is: how can I compare the 1st element of each list so that, if the word is the same, their tags (the 2nd element) are compared?

Sorry for being ambiguous. The function has to return a list of lists with 3 elements each: the word, the tag, and the number of occurrences of each word. To count the words, I need to compare each word with the others and, if two or more words are the same, compare their tags to check for a difference. If the tags are different, the words are counted separately.

result -> [['bien', 'NP00000', 1], ['bien', 'RG', 1]] -> the same word twice, but counted separately because the tags differ.

Thanks in advance.

import collections
inlist = [
   ['bien', 'NP00000'],
   ['gracia', 'NCFP000'],
   ['estar', 'VAIP1S0'],
   ['bien', 'RG'],
   ['huevo', 'NCMS000'],
   ['calcio', 'NCMS000'],
   ['leche', 'NCFS000'],
   ['proteina', 'NCFS000'],
   ['francisco', 'NP00000'],
   ['ya', 'RG'],
   ['ser', 'VSIS3S0'],
   ['cosa', 'NCFS000']
]
# Count each (word, tag) pair, then flatten every pair and its count into a triple.
# Python 3 spelling: items() replaces the Python 2 iteritems().
[(a, b, v) for (a, b), v in collections.Counter(map(tuple, inlist)).items()]
#=> [('bien', 'NP00000', 1), ('gracia', 'NCFP000', 1), ('estar', 'VAIP1S0', 1), ('bien', 'RG', 1), ('huevo', 'NCMS000', 1), ('calcio', 'NCMS000', 1), ('leche', 'NCFS000', 1), ('proteina', 'NCFS000', 1), ('francisco', 'NP00000', 1), ('ya', 'RG', 1), ('ser', 'VSIS3S0', 1), ('cosa', 'NCFS000', 1)]

You want to count the number of occurrences of each (word, tag) pair. The Counter expression does that. The list comprehension formats the result as triples.
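Since every pair in the sample data occurs only once, the counting is easier to see with a repeated pair. The following is a minimal sketch, not part of the original answer: the helper name count_pairs and the shortened sample list are made up for illustration, and it returns lists rather than tuples because the question asks for a list of lists.

```python
from collections import Counter

def count_pairs(pairs):
    """Return [word, tag, count] for each distinct (word, tag) pair."""
    # Lists are unhashable, so convert each pair to a tuple before counting.
    counts = Counter(tuple(pair) for pair in pairs)
    return [[word, tag, n] for (word, tag), n in counts.items()]

# Hypothetical sample: 'bien' appears with two different tags,
# and the ('bien', 'RG') pair appears twice.
sample = [
    ['bien', 'NP00000'],
    ['bien', 'RG'],
    ['huevo', 'NCMS000'],
    ['bien', 'RG'],
]
result = count_pairs(sample)
print(result)
```

Here 'bien' tagged NP00000 and 'bien' tagged RG end up in separate entries, each with its own count, which is the behaviour the question describes.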

What specific output do you need? I don't know exactly what you need to do, but if you want to group the items that belong to the same word, you can turn this structure into a dictionary and manipulate it later:

>>> new = {}
>>> for i, j in a:  # a = listaPalabras
...     if new.get(i) is None:
...         new[i] = [j]
...     else:
...         new[i].append(j)

which will give us:

{'francisco': ['NP00000'], 'ser': ['VSIS3S0'], 'cosa': ['NCFS000'], 'ya': ['RG'], 'bien': ['NP00000', 'RG'], 'estar': ['VAIP1S0'], 'calcio': ['NCMS000'], 'leche': ['NCFS000'], 'huevo': ['NCMS000'], 'gracia': ['NCFP000'], 'proteina': ['NCFS000']}

and then later on you can do:

>>> for i in new:
...     if len(new[i]) > 1:
...         print("compare {this} and {that}".format(this=new[i][0], that=new[i][1]))

which will print:

compare NP00000 and RG #for key bien

EDIT: In the first step you can also use defaultdict, as Marcin suggested in the comments. It would look like this:

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for i, j in a:
...     d[i].append(j)  # a missing key gets an empty list automatically

EDIT2 (answer to OP's comment):

result = []
for i in d:
    item = []
    item.append(i)          # the word
    item.extend(d[i])       # every tag seen for that word
    item.append(len(d[i]))  # how many tags were collected
    result.append(item)

This gives us:

[['francisco', 'NP00000', 1], ['ser', 'VSIS3S0', 1], ['cosa', 'NCFS000', 1], ['ya', 'RG', 1], ['bien', 'NP00000', 'RG', 2], ['estar', 'VAIP1S0', 1], ['calcio', 'NCMS000', 1], ['leche', 'NCFS000', 1], ['huevo', 'NCMS000', 1], ['gracia', 'NCFP000', 1], ['proteina', 'NCFS000', 1]]
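Note that the entry for 'bien' above merges both tags into one list with a combined count. If separate counts per tag are wanted while still grouping by word, one possible variation is a defaultdict of Counter objects. This is only a sketch: the names tags_by_word and ambiguous and the shortened input list are hypothetical.

```python
from collections import Counter, defaultdict

def tags_by_word(pairs):
    """Map each word to a Counter of the tags it was seen with."""
    by_word = defaultdict(Counter)
    for word, tag in pairs:
        by_word[word][tag] += 1
    return by_word

# Hypothetical shortened input in the same shape as listaPalabras.
pairs = [
    ['bien', 'NP00000'],
    ['gracia', 'NCFP000'],
    ['bien', 'RG'],
    ['ya', 'RG'],
]
by_word = tags_by_word(pairs)

# Words that occur with more than one distinct tag.
ambiguous = {word: dict(tags) for word, tags in by_word.items() if len(tags) > 1}
print(ambiguous)
```

Each word keeps one count per tag, so words with several tags can be found by checking the size of their Counter.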

A purely list-based solution is possible, of course, but it requires additional looping. If efficiency is important, it might be better to replace listaPalabras with a dict.

def comparapal(lista):
  listaPalabras = []
  for item in lista:
     if item[2] in eagles_dict:
        listaPalabras.append([item[1], item[2]])

  # After sorting, equal words sit next to each other,
  # so each pair only needs to be compared with the previous one.
  last_tt = [None, None]
  for tt in sorted(listaPalabras):
    if tt == last_tt:
      print("Observed %s twice" % tt)
    elif tt[0] == last_tt[0]:
      print("Observed %s and %s" % (tt, last_tt))
    last_tt = tt

This gives you: Observed ['bien', 'RG'] and ['bien', 'NP00000']

If this does not suit your purposes, please clarify your question.
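The dict-based alternative mentioned above could look roughly like the sketch below. It is an assumption-laden illustration rather than the original poster's code: the dictionary is passed in as a tag_dict parameter instead of being read from a global, and the contents of eagles_dict and lista are invented stand-ins, since the real data is not shown in the question.

```python
def comparapal(lista, tag_dict):
    """Count (word, tag) pairs whose tag is a key of tag_dict.

    Returns a list of [word, tag, count] lists.
    """
    counts = {}
    for item in lista:
        word, tag = item[1], item[2]
        if tag in tag_dict:
            # A tuple key keeps the same word with different tags apart.
            counts[(word, tag)] = counts.get((word, tag), 0) + 1
    return [[word, tag, n] for (word, tag), n in counts.items()]

# Hypothetical stand-ins for the real dictionary and the 4-element input lists.
eagles_dict = {'NP00000': 'proper noun', 'RG': 'adverb'}
lista = [
    ['Bien', 'bien', 'NP00000', '1'],
    ['bien', 'bien', 'RG', '0.9'],
    ['bien', 'bien', 'RG', '0.8'],
    ['x', 'x', 'ZZ', '1'],  # tag not in eagles_dict, so it is skipped
]
resultado = comparapal(lista, eagles_dict)
print(resultado)
```

Counting while filtering avoids building the intermediate list and the sort, at the cost of no longer printing the pairwise comparisons.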
