
Sorting and Grouping Nested Lists in Python

I have the following data structure (a list of lists):

[
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

I would like to be able to:

  1. Use a function to reorder the list so that I can group by each item in the list. For example, I'd like to be able to group by the second column (so that all the 21s are together).

  2. Use a function to only display certain values from each inner list. For example, I'd like to reduce this list to only the rows whose 4th field is '2somename'.

So the list would look like this:

[
     ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
     ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

For the first question, start by sorting the list on the second field using itemgetter from the operator module:

x = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

from operator import itemgetter

x.sort(key=itemgetter(1))
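If the rows inside each group should also come out in a predictable order, itemgetter accepts several indices and returns a tuple, so the sort can break ties on another column. A minimal sketch, assuming the fifth column always uses the 'YYYY-MM-DD HH:MM:SS' format shown in the question (which sorts correctly as a plain string):

```python
from operator import itemgetter

x = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# Sort by column 2 first, then by the timestamp in column 5 within each group.
x.sort(key=itemgetter(1, 4))

for row in x:
    print(row)
```

Because the key is a tuple, rows are compared on column 2 and only fall back to the timestamp when column 2 is equal.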

Then you can use the groupby function from itertools. groupby only groups consecutive items that share a key, which is why the list has to be sorted first:

from itertools import groupby
y = groupby(x, itemgetter(1))

Now y is an iterator that yields (key, group) tuples, where each group is itself an iterator over the consecutive rows sharing that key. These tuples are easier to show in code than to explain:

for elt, items in groupby(x, itemgetter(1)):
    print(elt, items)
    for i in items:
        print(i)

Which prints:

21 <itertools._grouper object at 0x511a0>
['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
22 <itertools._grouper object at 0x51170>
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
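Each group that groupby hands out is a one-shot iterator, and it is exhausted as soon as groupby moves on to the next key, so it is often convenient to materialize the groups right away. A minimal sketch that collects them into a dict mapping each column-2 value to its list of rows (the name grouped is just illustrative):

```python
from itertools import groupby
from operator import itemgetter

x = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

key = itemgetter(1)
# groupby() only merges *adjacent* rows with equal keys, so sort first.
x.sort(key=key)

# Consume each group iterator immediately, storing it as a list.
grouped = {k: list(rows) for k, rows in groupby(x, key)}

for k, rows in grouped.items():
    print(k, len(rows))
```

With the question's data this prints one line per key: 21 with 3 rows and 22 with 2 rows.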

For the second part, use a list comprehension, as already mentioned in other answers here:

from pprint import pprint as pp
pp([y for y in x if y[3] == '2somename'])

Which prints:

[['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]

If you assigned it to a variable "a"...

Python 2.x:

#1:

a.sort(lambda x,y: cmp(x[1], y[1]))

#2:

filter(lambda x: x[3]=="2somename", a)

Python 3:

#1:

a.sort(key=lambda x: x[1])
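A Python 3 counterpart of #2 would need one change: in Python 3, filter() returns a lazy iterator instead of a list, so a minimal sketch wraps it in list() (a list comprehension works just as well):

```python
a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# 1: sort on the second column
a.sort(key=lambda x: x[1])

# 2: keep only rows whose 4th column is '2somename';
#    list() materializes the lazy filter object
b = list(filter(lambda x: x[3] == "2somename", a))
print(b)
```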

If I understand your question correctly, the following code should do the job:

l = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

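from functools import cmp_to_key

def compareField(field):
    # Build an old-style comparator that compares two rows on one field
    def c(l1, l2):
        # Python 3 has no cmp(); this returns -1, 0 or 1 just like cmp() did
        return (l1[field] > l2[field]) - (l1[field] < l2[field])
    return c

# Use compareField(1) as the ordering criterion, i.e. sort only with
# respect to the 2nd field (cmp_to_key adapts the comparator for sort())
l.sort(key=cmp_to_key(compareField(1)))
for row in l:
    print(row)

print()
# Select only those sublists for which 4th field == '2somename'
l2somename = [row for row in l if row[3] == '2somename']
for row in l2somename:
    print(row)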

Output:

['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']

['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']

Use a function to reorder the list so that I can group by each item in the list. For example, I'd like to be able to group by the second column (so that all the 21s are together).

Lists have a built-in sort method, and you can pass it a key function that extracts the sort key from each element.

>>> import pprint
>>> l.sort(key = lambda ll: ll[1])
>>> pprint.pprint(l)
[['4', '21', '1', '14', '2008-10-24 15:42:58'],
 ['5', '21', '3', '19', '2008-10-24 15:45:45'],
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]
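Note that every field here is a string, so this sort is lexicographic: it works for the question's data because every value in column 2 has two digits, but a value such as '100' would sort before '21'. If the column is meant to be numeric, a minimal sketch converts it in the key function (this assumes every value in that column parses as an integer; the three sample values below are hypothetical):

```python
import pprint

l = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# Hypothetical values, only to show the difference between the two orders.
values = ['21', '100', '9']
print(sorted(values))           # ['100', '21', '9']: compared as text
print(sorted(values, key=int))  # ['9', '21', '100']: compared as integers

# Sort the real rows numerically on column 2.
l.sort(key=lambda ll: int(ll[1]))
pprint.pprint(l)
```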

Use a function to only display certain values from each inner list. For example, I'd like to reduce this list to only the rows whose 4th field is '2somename'.

This looks like a job for list comprehensions:

>>> [ll[3] for ll in l]
['14', '19', '1somename', '2somename', '2somename']
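If "display certain values" means keeping several columns from each row rather than one, operator.itemgetter with more than one index returns a tuple per row, and an if clause in the comprehension adds the '2somename' filter. A minimal sketch, assuming you want the first and fourth columns (that choice and the names pick and picked are illustrative only):

```python
from operator import itemgetter

l = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

pick = itemgetter(0, 3)  # first and fourth columns

# Project every row onto the chosen columns.
picked = [pick(ll) for ll in l]
print(picked)

# Project only the rows whose 4th column is '2somename'.
picked_2some = [pick(ll) for ll in l if ll[3] == '2somename']
print(picked_2some)
```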

If you'll be doing a lot of sorting and filtering, you may like some helper functions.

m = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

# Sort and filter helpers.
sort_on   = lambda pos:     lambda x: x[pos]
filter_on = lambda pos,val: lambda l: l[pos] == val

# Sort by second column
m = sorted(m, key=sort_on(1))

# Filter on 4th column, where value = '2somename'
# (in Python 3, filter() returns a lazy iterator, so wrap it in list())
m = list(filter(filter_on(3, '2somename'), m))
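PEP 8 discourages binding lambdas to names, and the standard library already covers half of this: operator.itemgetter(pos) does the same job as sort_on(pos). A minimal sketch of the helpers in that style, where filter_on is rewritten as a plain function factory:

```python
from operator import itemgetter

def filter_on(pos, val):
    """Return a predicate that is True when a row's column `pos` equals `val`."""
    def predicate(row):
        return row[pos] == val
    return predicate

m = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# Sort by the second column; itemgetter(1) plays the role of sort_on(1).
m = sorted(m, key=itemgetter(1))

# Keep rows whose 4th column is '2somename'; list() materializes the iterator.
m = list(filter(filter_on(3, '2somename'), m))
print(m)
```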

For part (2), with x being your list, I think you want:

[y for y in x if y[3] == '2somename']

This returns a list of only those rows whose fourth value is '2somename'. That said, it seems Kamil is giving you the best advice by suggesting SQL.

It looks a lot like you're trying to use a list as a database.

Python ships SQLite bindings (the sqlite3 module) in its standard library. If you don't need persistence, it's really easy to create an in-memory SQLite database (see "How do I create a sqllite3 in-memory database?").

Then you can use SQL statements to do all this sorting and filtering without having to reinvent the wheel.
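A minimal sketch of that approach with the standard-library sqlite3 module. The table name t and the column names c1 to c4 and ts are assumptions, since the question does not name its fields:

```python
import sqlite3

rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# ":memory:" creates a database that lives only as long as the connection.
conn = sqlite3.connect(":memory:")
# Hypothetical table and column names.
conn.execute("CREATE TABLE t (c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, ts TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)

# Request 1: rows ordered by the second column, oldest first within a group.
ordered = conn.execute("SELECT * FROM t ORDER BY c2, ts").fetchall()

# Request 2: only rows whose fourth column is '2somename'.
matches = conn.execute(
    "SELECT * FROM t WHERE c4 = ? ORDER BY ts", ("2somename",)
).fetchall()

# Bonus: how many rows share each value of the second column.
counts = conn.execute(
    "SELECT c2, COUNT(*) FROM t GROUP BY c2 ORDER BY c2"
).fetchall()

conn.close()
print(ordered)
print(matches)
print(counts)
```

fetchall() returns a list of tuples, so convert with list(row) if you need lists back.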

You're simply creating indexes on your structure, right?

>>> from collections import defaultdict
>>> def indexOn( things, pos ):
...     inx= defaultdict(list)
...     for t in things:
...             inx[t[pos]].append(t)
...     return inx
... 
>>> a=[
...  ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
...  ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
...  ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
...  ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
...  ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
... ]

Here's your first request, grouped by position 1.

>>> import pprint
>>> pprint.pprint( dict(indexOn(a,1)) )
{'21': [['4', '21', '1', '14', '2008-10-24 15:42:58'],
        ['5', '21', '3', '19', '2008-10-24 15:45:45'],
        ['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
 '22': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
        ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]}

Here's your second request, grouped by position 3.

>>> dict(indexOn(a,3))
{'19': [['5', '21', '3', '19', '2008-10-24 15:45:45']], '14': [['4', '21', '1', '14', '2008-10-24 15:42:58']], '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'], ['7', '22', '3', '2somename', '2008-10-24 15:45:51']], '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']]}
>>> pprint.pprint(_)
{'14': [['4', '21', '1', '14', '2008-10-24 15:42:58']],
 '19': [['5', '21', '3', '19', '2008-10-24 15:45:45']],
 '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
 '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
               ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]} 
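With such an index, the filtered list from the question is a single lookup. One caveat: indexing a defaultdict with a missing key silently inserts an empty list, so .get() is safer for pure lookups. A minimal self-contained sketch (indexOn is repeated from above; the key 'nosuchname' is hypothetical):

```python
from collections import defaultdict

def indexOn(things, pos):
    # Map each value found at position `pos` to the list of rows holding it.
    inx = defaultdict(list)
    for t in things:
        inx[t[pos]].append(t)
    return inx

a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

by_col4 = indexOn(a, 3)
wanted = by_col4['2somename']            # rows in their original order
missing = by_col4.get('nosuchname', [])  # .get() does not add a new key
print(wanted)
print(missing, 'nosuchname' in by_col4)
```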

You can also use a for loop to pick out the rows of the nested list that share a value in a given column. The code would be:

l = [['3', '21', '1', '14', '2008-10-24 15:42:58'], 
['4', '22', '4','2somename','2008-10-24 15:22:03'], 
['5', '21', '3', '19', '2008-10-24 15:45:45'], 
['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
['7', '35', '3','2somename', '2008-10-24 15:45:51']]
col = int(input("Enter the column to search(1-5):"))
val = input("Enter the element to group by:")
matches = []  # every row whose chosen column equals val
print('Searching...')
for x in l:
    # Columns are entered 1-based, list indices are 0-based
    if x[col-1] == val:
        matches.append(x)
        print(x)
if not matches:
    print('No search result. Please Try Again!!')

The output would look like this:

Enter the column to search(1-5):4
Enter the element to group by:2somename
Searching...
['4', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '35', '3', '2somename', '2008-10-24 15:45:51']

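Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM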