Convert Nested List to Set using a Recursive Function

I need to create a recursive function (for simplicity) that takes any nested list and returns a set of its unique elements.

To tackle this, I decided to first create a function that takes a list and converts it to a set:

ranList = [2, 2, 4, 5, 3, 1, 3]

def eue(ranList):
    newList = set(ranList)
    return print(newList)

Easy enough, it works. Now to create a function that takes a nested list and yields its elements as one flat sequence (a recursive function I found by searching this site), plus a function that calls it and returns a set containing the unique elements:

lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]

from collections.abc import Iterable  # plain collections.Iterable was removed in Python 3.10

def flatten(lis):
    for item in lis:
        if isinstance(item, Iterable) and not isinstance(item, str):
            for x in flatten(item):
                yield x
        else:        
            yield item

def eue(lis):
    newSet = set(flatten(lis))
    print(newSet)

Now, calling eue() solves my original problem. But I want to make it much simpler.

How would I go about combining these functions into a single function that reduces the computing time it takes to run?

Thanks.

You can use itertools.chain to link all the iterators together:

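from itertools import chain
from collections.abc import Iterable  # plain collections.Iterable was removed in Python 3.10

def isIter(obj):
    # strings are iterable, but are treated as single elements here
    return isinstance(obj, Iterable) and not isinstance(obj, str)

def flatten(it):
    # wrap each non-iterable item in a 1-tuple so chain can treat everything uniformly
    seqs = (flatten(item) if isIter(item) else (item,) for item in it)
    return chain(*seqs)

def enu1(it):
    return set(flatten(it))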

This does mean that for every non-sequence element you have to build a one-item tuple (item,) for chain to work properly; I'm not sure how much of an effect that has on performance.

You can reduce this to a single function (other than isIter) by converting to a set:
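A related detail: chain(*seqs) unpacks the whole generator into an argument tuple before chaining anything. A possible variant is chain.from_iterable, which pulls the sub-iterators one at a time. The sketch below assumes Python 3.3+ (for collections.abc); is_iter and flatten_lazy are names made up for this illustration, and scalars are still wrapped in one-item tuples, so it is not guaranteed to be any faster:

```python
from collections.abc import Iterable
from itertools import chain

def is_iter(obj):
    # Strings are iterable but are treated as single elements here.
    return isinstance(obj, Iterable) and not isinstance(obj, str)

def flatten_lazy(it):
    # Hypothetical variant: chain.from_iterable consumes the generator lazily
    # instead of unpacking it into positional arguments.
    return chain.from_iterable(
        flatten_lazy(item) if is_iter(item) else (item,) for item in it
    )

lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]
result = set(flatten_lazy(lis))
print(result)
```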

def enu2_A(it):
    seqs = (enu2_A(item) if isIter(item) else (item,) for item in it)
    return set(chain(*seqs))

But again, this creates a new set object for every recursive call to enu2_A; maybe add an option that controls the conversion?

def enu2(it,return_set = True):
    seqs = (enu2(item,False) if isIter(item) else (item,) for item in it)
    if return_set:
        return set(chain(*seqs))
    else:
        return chain(*seqs)

But combining it into a single function really doesn't give much of a speed boost at all:

import timeit

a = timeit.timeit("enu1(lis)","from __main__ import enu1,lis",number=10000)
b = timeit.timeit("enu2(lis)","from __main__ import enu2,lis",number=10000)

print(a,b)
print(a/b)  # ratio; more than 1 means a took longer

Output:

0.3400449920009123 0.32908301999850664
1.0333106582115827

So combining into a single function is about 3% faster. I'm guessing that isn't the speed-up you were expecting. Your code is already very efficient and much more Pythonic than mine, so I wouldn't change it.

EDIT: I just benchmarked my enu2 against your enu, and your method is about 16% faster than the one I provided. Leave it as it is; you can't do much better short of moving to Python 2 and using compiler.ast.flatten or another C-level equivalent:
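Another way to get a single function without building a set per recursive call is to pass one accumulator set down the recursion. This is only a sketch: unique_elements and its seen parameter are hypothetical names, not something from the question, and whether it beats the generator version would need to be measured:

```python
from collections.abc import Iterable

def unique_elements(nested, seen=None):
    # `seen` is the one set shared by every level of the recursion.
    if seen is None:
        seen = set()
    for item in nested:
        if isinstance(item, Iterable) and not isinstance(item, str):
            unique_elements(item, seen)  # recurse, filling the same set
        else:
            seen.add(item)
    return seen

lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]
unique = unique_elements(lis)
print(unique)
```

The default of None (rather than set()) keeps separate top-level calls from sharing one set.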

from collections import Iterable

def flatten(lis):
    for item in lis:
        if isinstance(item, Iterable) and not isinstance(item, str):
            for x in flatten(item):
                yield x
        else:        
            yield item

def enu(obj):
    return set(flatten(obj))

lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]

import timeit

a = timeit.timeit("enu(lis)","from __main__ import enu,lis",number=10000)

b = timeit.timeit("set(ast.flatten(lis))","from __main__ import lis ; from compiler import ast",number=10000)

print(a,b)
print(a/b)

Output:

0.3324121500045294 0.28561264199379366
1.1638565705076622

So compiler.ast.flatten came out about 16% faster in this run (the ratio of 1.16 above). However, according to the docs, the compiler package has been deprecated since 2.6:

Deprecated since version 2.6: The compiler package has been removed in Python 3.

So it is quite possible that writing flatten as a C extension would give even better results, but if you want to stay in pure Python, what you have is about as good as you can get.
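Since compiler is not available on Python 3, here is a sketch of how the pure-Python timing could be re-run there. It assumes Python 3.5+ (for the globals argument of timeit.repeat) and uses yield from in place of the inner loop; the number and repeat values are arbitrary choices, and no particular timings are implied:

```python
import timeit
from collections.abc import Iterable

def flatten(lis):
    for item in lis:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from flatten(item)  # same as looping and yielding each x
        else:
            yield item

def enu(obj):
    return set(flatten(obj))

lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]

# timeit.repeat returns one total time per repetition; the minimum is
# usually the least noisy figure to compare.
times = timeit.repeat(
    "enu(lis)",
    globals={"enu": enu, "lis": lis},
    number=1000,
    repeat=5,
)
best = min(times)
print(best)
```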
