
How to create a new layer of sublists based on a common key within each sublist in order to categorize the sublists?

In other words, how do you group the sublists of a list into new sublists so that all sublists sharing the same item at index 1 end up together?

For example, I'd like to turn the following list of sublists into a list of groups, where each group is a new sublist containing all the original sublists that have the same item at index 1. Here, the apple, banana and orange sublists should each go into their own new sublist.

lsta = [['2014W01','apple',21,'apple@gmail.com'],['2014W02','apple',19,'apple@g.com'],['2014W02','banana',51,'b@gmail.com'],['2014W03','apple',100,'apple@gmail.com'],['2014W01','banana',71,'b@yahoo.com'],['2014W02','organge',21,'organge@gmail.com']]

I'd like the three apple sublists to be contained in one new sublist, the two banana sublists in another, and so on.

Desired_List = [[['2014W01','apple',21,'apple@gmail.com'],['2014W02','apple',19,'apple@g.com'],['2014W03','apple',100,'apple@gmail.com']],[['2014W02','banana',51,'b@gmail.com'],['2014W01','banana',71,'b@yahoo.com']],[['2014W02','organge',21,'organge@gmail.com']]]

Bonus points if you can tell me how to categorize on multiple keys (e.g., separating not only by fruit type but also by week).

In [43]: import itertools as IT

In [44]: import operator

In [46]: [list(grp) for key, grp in IT.groupby(sorted(lsta, key=operator.itemgetter(1)), key=operator.itemgetter(1))]
Out[46]: 
[[['2014W01', 'apple', 21, 'apple@gmail.com'],
  ['2014W02', 'apple', 19, 'apple@g.com'],
  ['2014W03', 'apple', 100, 'apple@gmail.com']],
 [['2014W02', 'banana', 51, 'b@gmail.com'],
  ['2014W01', 'banana', 71, 'b@yahoo.com']],
 [['2014W02', 'organge', 21, 'organge@gmail.com']]]
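For the bonus question, the same groupby approach can be extended to more than one key. The following is only a sketch, assuming the week is at index 0 and the fruit at index 1 as in lsta; the names fruit_week, by_fruit_week and nested are made up for illustration. Note that itertools.groupby only merges adjacent items, which is why the list is sorted by the same key first.

```python
import itertools
import operator

lsta = [
    ['2014W01', 'apple', 21, 'apple@gmail.com'],
    ['2014W02', 'apple', 19, 'apple@g.com'],
    ['2014W02', 'banana', 51, 'b@gmail.com'],
    ['2014W03', 'apple', 100, 'apple@gmail.com'],
    ['2014W01', 'banana', 71, 'b@yahoo.com'],
    ['2014W02', 'organge', 21, 'organge@gmail.com'],
]

# Composite key (fruit, week): itemgetter with two indices returns a tuple.
fruit_week = operator.itemgetter(1, 0)
rows = sorted(lsta, key=fruit_week)

# Flat variant: one group per distinct (fruit, week) pair.
by_fruit_week = [list(grp) for _, grp in itertools.groupby(rows, key=fruit_week)]

# Nested variant: an outer list per fruit, an inner list per week.
nested = [
    [list(week_grp)
     for _, week_grp in itertools.groupby(fruit_grp, key=operator.itemgetter(0))]
    for _, fruit_grp in itertools.groupby(rows, key=operator.itemgetter(1))
]
```

With the sample data every (fruit, week) pair occurs only once, so each innermost group holds a single sublist.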

Normally I'd use itertools.groupby for this, but just for fun, here's a method that does all the heavy lifting manually:

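def transform(lista):
    # Bucket the sublists by their key (index 1), temporarily removing the key.
    # Note: the input sublists are modified in place, then restored below.
    d = {}
    for subl in lista:
        k = subl.pop(1)
        if k not in d:
            d[k] = []
        d[k].append(subl)
    # Rebuild the result, putting each key back at index 1.
    answer = []
    for k, lists in d.items():
        temp = []
        for l in lists:
            l.insert(1, k)
            temp.append(l)
        answer.append(temp)
    return answer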

Output:

In [56]: transform(lsta)
Out[56]: 
[[['2014W02', 'organge', 21, 'organge@gmail.com']],
 [['2014W01', 'apple', 21, 'apple@gmail.com'],
  ['2014W02', 'apple', 19, 'apple@g.com'],
  ['2014W03', 'apple', 100, 'apple@gmail.com']],
 [['2014W02', 'banana', 51, 'b@gmail.com'],
  ['2014W01', 'banana', 71, 'b@yahoo.com']]]
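The group order in the output above follows the dict's iteration order, which was arbitrary in the Python version used there. On Python 3.7 and later, dicts keep insertion order, so groups come out in order of first appearance. A shorter variant that leaves the input sublists untouched might look like the sketch below; group_by_index and its index parameter are hypothetical names, not part of the answer above.

```python
lsta = [
    ['2014W01', 'apple', 21, 'apple@gmail.com'],
    ['2014W02', 'apple', 19, 'apple@g.com'],
    ['2014W02', 'banana', 51, 'b@gmail.com'],
    ['2014W03', 'apple', 100, 'apple@gmail.com'],
    ['2014W01', 'banana', 71, 'b@yahoo.com'],
    ['2014W02', 'organge', 21, 'organge@gmail.com'],
]

def group_by_index(rows, index=1):
    """Group rows by the item at `index` without modifying the rows."""
    groups = {}
    for row in rows:
        # setdefault creates the bucket the first time a key is seen.
        groups.setdefault(row[index], []).append(row)
    # On Python 3.7+ the values come back in order of first appearance.
    return list(groups.values())

grouped = group_by_index(lsta)
```

Passing index=0 would group the same data by week instead of by fruit.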

I'll take a bit of a different tack. You probably want your group-by field to be the key in a dict. The value can just be a list of whatever you want to call each sublist here. I'll call each one a FruitPerson.

from collections import defaultdict, namedtuple

FruitPerson = namedtuple('FruitPerson','id age email')

d = defaultdict(list)

for sublist in lsta:
    d[sublist[1]].append(FruitPerson(sublist[0],*sublist[2:]))

Then, for example:

d['apple']
Out[19]: 
[FruitPerson(id='2014W01', age=21, email='apple@gmail.com'),
 FruitPerson(id='2014W02', age=19, email='apple@g.com'),
 FruitPerson(id='2014W03', age=100, email='apple@gmail.com')]

d['apple'][0]
Out[20]: FruitPerson(id='2014W01', age=21, email='apple@gmail.com')

d['apple'][0].id
Out[21]: '2014W01'

Edit: okay, the multiple-categorization bonus-point question. You just need to nest your dictionaries. The syntax gets a little goofy because the argument to defaultdict has to be a callable; you can do this with either lambda or functools.partial:

FruitPerson = namedtuple('FruitPerson','age email') #just removed 'id' field
d = defaultdict(lambda: defaultdict(list))

for sublist in lsta:
    d[sublist[1]][sublist[0]].append(FruitPerson(*sublist[2:]))

d['apple']
Out[37]: defaultdict(<type 'list'>, {'2014W03': [FruitPerson(age=100, email='apple@gmail.com')], '2014W02': [FruitPerson(age=19, email='apple@g.com')], '2014W01': [FruitPerson(age=21, email='apple@gmail.com')]})

d['apple']['2014W01']
Out[38]: [FruitPerson(age=21, email='apple@gmail.com')]

d['apple']['2014W01'][0].email
Out[40]: 'apple@gmail.com'
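The session above uses the lambda form. For comparison, here is a sketch of the functools.partial form mentioned earlier, using the same two-field FruitPerson; it should build the same nested structure. One practical difference, as far as I know, is that a defaultdict with a lambda factory cannot be pickled, while one built with partial can.

```python
from collections import defaultdict, namedtuple
from functools import partial

lsta = [
    ['2014W01', 'apple', 21, 'apple@gmail.com'],
    ['2014W02', 'apple', 19, 'apple@g.com'],
    ['2014W02', 'banana', 51, 'b@gmail.com'],
    ['2014W03', 'apple', 100, 'apple@gmail.com'],
    ['2014W01', 'banana', 71, 'b@yahoo.com'],
    ['2014W02', 'organge', 21, 'organge@gmail.com'],
]

FruitPerson = namedtuple('FruitPerson', 'age email')

# partial(defaultdict, list) is a callable that returns defaultdict(list),
# so every missing fruit key gets its own week -> list mapping.
d = defaultdict(partial(defaultdict, list))

for sublist in lsta:
    d[sublist[1]][sublist[0]].append(FruitPerson(*sublist[2:]))

first_apple_email = d['apple']['2014W01'][0].email
```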

Though honestly, at this point you should consider moving up to a real relational database that can understand queries of the form SELECT whatever FROM whatever WHERE something.
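As a rough illustration of the database route, the standard-library sqlite3 module can hold the same rows in an in-memory database. This is only a sketch: the table name fruit and the column names week, fruit, age and email are assumptions made up for this example (age follows the FruitPerson field name used above).

```python
import sqlite3

lsta = [
    ['2014W01', 'apple', 21, 'apple@gmail.com'],
    ['2014W02', 'apple', 19, 'apple@g.com'],
    ['2014W02', 'banana', 51, 'b@gmail.com'],
    ['2014W03', 'apple', 100, 'apple@gmail.com'],
    ['2014W01', 'banana', 71, 'b@yahoo.com'],
    ['2014W02', 'organge', 21, 'organge@gmail.com'],
]

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE fruit (week TEXT, fruit TEXT, age INTEGER, email TEXT)')
conn.executemany('INSERT INTO fruit VALUES (?, ?, ?, ?)', lsta)

# All apple rows, ordered by week.
apples = conn.execute(
    'SELECT week, age, email FROM fruit WHERE fruit = ? ORDER BY week',
    ('apple',),
).fetchall()

# Number of rows per fruit.
counts = conn.execute(
    'SELECT fruit, COUNT(*) FROM fruit GROUP BY fruit ORDER BY fruit'
).fetchall()

conn.close()
```

Grouping by both fruit and week would then be a matter of changing the GROUP BY clause rather than restructuring nested dicts.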
