Convert Nested List to Set using a Recursive Function
I need to create a recursive function (for simplicity) that takes any nested list and returns a set of unique elements.
To tackle this, I decided to first create a function that takes a list and converts it to a set:
ranList = [2, 2, 4, 5, 3, 1, 3]
def eue(ranList):
    newList = set(ranList)
    return print(newList)
Easy enough, it works. Now I need a function that takes a nested list and returns a flat sequence (this is a recursive function I found by searching this site), and a second function that calls it and returns a set containing the unique elements:
lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]
# Iterable lives in collections.abc; the old collections alias was removed in Python 3.10
from collections.abc import Iterable
def flatten(lis):
    for item in lis:
        if isinstance(item, Iterable) and not isinstance(item, str):
            for x in flatten(item):
                yield x
        else:
            yield item
def eue(lis):
    newSet = set(flatten(lis))
    print(newSet)
Now, calling eue() solves my original question. But I want to make it much simpler.
How would I go about combining these functions into a single function that reduces the computing time it takes to run?
Thanks.
You can use itertools.chain to link all the iterators together:
from itertools import chain
from collections.abc import Iterable
def isIter(obj):
    return isinstance(obj, Iterable) and not isinstance(obj, str)
def flatten(it):
    # wrap scalars in a one-item tuple so chain can iterate over everything
    seqs = (flatten(item) if isIter(item) else (item,) for item in it)
    return chain(*seqs)
def enu1(it):
    return set(flatten(it))
Note that for every non-sequence element you have to build a one-item tuple, (item,), for chain to work properly. I'm not sure how much of an effect that has on performance.
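To see why the one-item tuple is needed, here is a small sketch (the sample values are made up for illustration). chain iterates over every argument it is given, so a wrapped scalar passes through while a bare integer fails as soon as the chain is consumed:

```python
from itertools import chain

# chain() iterates over each of its arguments, so scalars must be wrapped.
wrapped = list(chain(['c', 'd'], (2,), (4,)))
print(wrapped)

# A bare int is accepted when the chain is built, but raises TypeError
# once chain tries to iterate over it.
try:
    list(chain(['c', 'd'], 2))
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)
```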
You can reduce this to a single function (other than isIter) by converting to a set:
def enu2_A(it):
    seqs = (enu2_A(item) if isIter(item) else (item,) for item in it)
    return set(chain(*seqs))
But again, this creates a new set object for every recursive call to enu2_A. Maybe add an option that controls the conversion?
def enu2(it, return_set=True):
    seqs = (enu2(item, False) if isIter(item) else (item,) for item in it)
    if return_set:
        return set(chain(*seqs))
    else:
        return chain(*seqs)
But combining it into a single function really doesn't give much of a speed boost at all:
import timeit
a = timeit.timeit("enu1(lis)","from __main__ import enu1,lis",number=10000)
b = timeit.timeit("enu2(lis)","from __main__ import enu2,lis",number=10000)
print(a,b)
print(a/b)  # ratio; more than 1 means a took longer
output:
0.3400449920009123 0.32908301999850664
1.0333106582115827
So combining into a single function is about 3% faster. I'm guessing that isn't the speed-up you were expecting. Your code is very efficient as it is, and much more Pythonic than mine, so I wouldn't change it.
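If you are on Python 3.5 or later, timeit also accepts a globals argument, which avoids the "from __main__ import ..." setup string. A minimal sketch, assuming a generator-based flatten like the one in the question; time_call is a hypothetical helper name, and the timings will differ on your machine:

```python
import timeit
from collections.abc import Iterable


def flatten(lis):
    """Yield the leaves of an arbitrarily nested iterable (strings stay whole)."""
    for item in lis:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from flatten(item)
        else:
            yield item


def time_call(stmt, number=10000):
    # Hypothetical helper: the minimum over a few repeats is less noisy
    # than a single measurement.
    return min(timeit.repeat(stmt, globals=globals(), number=number, repeat=3))


lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]
elapsed = time_call("set(flatten(lis))")
print(elapsed)
```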
EDIT: I just benchmarked my enu2 against your enu. Your method is about 16% faster than the one I provided, so leave it as it is; you can't do much better other than moving to Python 2 and using compiler.ast.flatten or another C-level equivalent:
# Python 2 only: the compiler package does not exist in Python 3
from collections import Iterable
def flatten(lis):
    for item in lis:
        if isinstance(item, Iterable) and not isinstance(item, str):
            for x in flatten(item):
                yield x
        else:
            yield item
def enu(obj):
    return set(flatten(obj))
lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]
import timeit
a = timeit.timeit("enu(lis)","from __main__ import enu,lis",number=10000)
b = timeit.timeit("set(ast.flatten(lis))","from __main__ import lis ; from compiler import ast",number=10000)
print(a,b)
print(a/b)
output:
0.3324121500045294 0.28561264199379366
1.1638565705076622
So doing the operation in C sped up the process by about 16% (a ratio of roughly 1.16 in the output above). However, the compiler package has been deprecated since 2.6, according to the docs:
Deprecated since version 2.6: The compiler package has been removed in Python 3.
So it is quite possible that if you wrote flatten in a C extension you could get even better results, but if you want to stay in pure Python, your code is about as good as it gets.
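For completeness, one way to fold everything into a single function without building a new set on every recursive call is to pass one accumulator set down the recursion. This is only a sketch: unique_elements and its seen parameter are names invented here, not part of the code above, and I have not benchmarked it against enu.

```python
from collections.abc import Iterable


def unique_elements(nested, seen=None):
    """Collect the unique leaves of a nested iterable into a single set."""
    if seen is None:
        seen = set()  # created once, by the outermost call only
    for item in nested:
        if isinstance(item, Iterable) and not isinstance(item, str):
            unique_elements(item, seen)  # recurse, sharing the same set
        else:
            seen.add(item)
    return seen


lis = [['c', 'd'], 2, 2, 4, 5, 3, 1, 3, ['c']]
result = unique_elements(lis)
print(result)
```

Like the original flatten, this recurses once per nesting level, so very deeply nested input can still hit Python's recursion limit.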