简体   繁体   中英

Counting elements in a list of lists in python

Hope u can help me w/ this python function:

def comparapal(lista):#lista is a list of lists where each list has 4 elements
  listaPalabras=[]
  for item in lista:
     if item[2] in eagles_dict.keys():# filter the list if the 3rd element corresponds to the key in the dictionary
        listaPalabras.append([item[1],item[2]]) #create a new list with elements 2 and 3

The listaPalabras result:

[
   ['bien', 'NP00000'],
   ['gracia', 'NCFP000'],
   ['estar', 'VAIP1S0'],
   ['bien', 'RG'],
   ['huevo', 'NCMS000'],
   ['calcio', 'NCMS000'],
   ['leche', 'NCFS000'],
   ['proteina', 'NCFS000'],
   ['francisco', 'NP00000'],
   ['ya', 'RG'],
   ['ser', 'VSIS3S0'],
   ['cosa', 'NCFS000']
]

My question is: How can I compare the 1st element of each list so that if the word is the same, compare their tags which is the 2nd element.

Sorry for being ambiguous, the fuunction has to return a list of lists w/ 3 elements: the word, the tag and the number of occurrences of each word. But in order to count the words I need to compare the word w/ others and if there exists 2 or more words alike, then compare the tags to chk the difference. If the tags are different then count the words separately.

result -> [['bien', 'NP00000',1],['bien', 'RG',1]] -> two same words but counted separately by the comparison of the tags Thanks in advance:

import collections
inlist = [
   ['bien', 'NP00000'],
   ['gracia', 'NCFP000'],
   ['estar', 'VAIP1S0'],
   ['bien', 'RG'],
   ['huevo', 'NCMS000'],
   ['calcio', 'NCMS000'],
   ['leche', 'NCFS000'],
   ['proteina', 'NCFS000'],
   ['francisco', 'NP00000'],
   ['ya', 'RG'],
   ['ser', 'VSIS3S0'],
   ['cosa', 'NCFS000']
]
[(a,b,v) for (a,b),v in collections.Counter(map(tuple,inlist)).iteritems()]
#=>[('proteina', 'NCFS000', 1), ('francisco', 'NP00000', 1), ('ser', 'VSIS3S0', 1), ('bien', 'NP00000', 1), ('calcio', 'NCMS000', 1), ('estar', 'VAIP1S0', 1), ('huevo', 'NCMS000', 1), ('gracia', 'NCFP000', 1), ('bien', 'RG', 1), ('cosa', 'NCFS000', 1), ('ya', 'RG', 1), ('leche', 'NCFS000', 1)]

You want to count the number of occurrences of each pair. The counter expression does that. The list comprehension formats this as triples.

What specific output do you need? I don't know what exactly do you need to do, but if you want to group items related to same word, you can turn this structure into dictionary and manipulate it later

>>> new = {}
>>> for i,j in a: # <-- a = listaPalabras 
        if new.get(i) == None:
                new[i] = [j]
        else:
                new[i].append(j)

which will give us:

{'francisco': ['NP00000'], 'ser': ['VSIS3S0'], 'cosa': ['NCFS000'], 'ya': ['RG'], 'bien': ['NP00000', 'RG'], 'estar': ['VAIP1S0'], 'calcio': ['NCMS000'], 'leche': ['NCFS000'], 'huevo': ['NCMS000'], 'gracia': ['NCFP000'], 'proteina': ['NCFS000']}

and then later on you can do:

>>> for i in new:
        if len(new[i]) > 1:
                print "compare {this} and {that}".format(this=new[i][0],that=new[i][1])

will print:

compare NP00000 and RG #for key bien

EDIT: In the first step, you can also use defaultdict, as suggested by Marcin in the comment, this would look like this:

>>> d = defaultdict(list)
>>> for i,j in a:
        d.setdefault(i,[]).append(j)

EDIT2 (answer to OP's comment)

for i in d:
    item = []
    item.append(i)
    item.extend(d[i])
    item.append(len(d[i]))
    result.append(item)

This gives us:

[['francisco', 'NP00000', 1], ['ser', 'VSIS3S0', 1], ['cosa', 'NCFS000', 1], ['ya', 'RG', 1], ['bien', 'NP00000', 'RG', 2], ['estar', 'VAIP1S0', 1], ['calcio', 'NCMS000', 1], ['leche', 'NCFS000', 1], ['huevo', 'NCMS000', 1], ['gracia', 'NCFP000', 1], ['proteina', 'NCFS000', 1]]

A purely list-based solution is possible of course, but requires additional looping. If efficiency is important, it might be better to replace listaPalabras with a dict.

def comparapal(lista):
  listaPalabras=[]
  for item in lista:
     if item[2] in eagles_dict.keys():
        listaPalabras.append([item[1],item[2]])

  last_tt = [None, None]
  for tt in sorted(listaPalabras):
    if tt == last_tt:
      print "Observed %s twice" % tt
    elif tt[0] == last_tt[0]:
      print "Observed %s and %s" % (tt, last_tt)
    last_tt = tt

This gives you: Observed ['bien', 'RG'] and ['bien', 'NP00000']

If this does not suit your purposes, please specify your question.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM