Fastest way to extract elements from a list that do not match a condition in Python
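I am looking for the fastest way to extract all tuples from a list that satisfy a condition.
Example: from a list of tuples such as [(0,0,4), (1,0,3), (1,2,1), (4,0,0)], I need to extract all tuples with a value greater than 3 in the first position, then all tuples with a value greater than 2 in the second position, and then all tuples with a value greater than 1 in the last position. For this example that means extracting (4,0,0) (first condition), nothing (second condition), and (0,0,4), (1,0,3) (last condition). This example is tiny; I need to do this on lists of many thousands of tuples.
Based on the code I put together from your answers, here are the timings in seconds:
my_naive1, like proposed by Emil Vikström?  13.0360000134
my_naive2  110.727999926
Tim Pietzcker  9.8329999446
Don  12.5640001297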
import itertools, operator, time, copy
from operator import itemgetter

def combinations_with_replacement_counts(n, r):  # (A, N) in our example: N individuals/balls in A genotypes/boxes
    size = n + r - 1
    for indices in itertools.combinations(range(size), n-1):
        #print indices
        starts = [0] + [index+1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))

xp = list(combinations_with_replacement_counts(3,20))  # a very small case

# my_naive1: explicit loop with append
a1=time.time()
temp=[]
for n in xp:
    for n1 in xp:
        for i in xp:
            if i[0] <= min(n1[0],n[0]) or i[1] <= min(n1[1],n[1]) or i[2] <= min(n1[2],n[2]):
                temp.append(i)

# my_naive2: deep copy, then remove the unwanted items
# (note: from here on the code uses min(n[0],n[0]), which is just n[0]; n1 is unused)
a2=time.time()
for n in xp:
    for n1 in xp:
        xp_copy = copy.deepcopy(xp)
        for i in xp:
            if i[0] > min(n[0],n[0]) or i[1] > min(n[1],n[1]) or i[2] > min(n[2],n[2]):
                xp_copy.remove(i)

# Tim Pietzcker: list comprehension
a3=time.time()
for n in xp:
    for n1 in xp:
        output = [t for t in xp if t[0]<=min(n[0],n[0]) or t[1]<=min(n[1],n[1]) or t[2]<=min(n[2],n[2])]

# Don: sort in descending order, stop at the first non-matching item
a4=time.time()
for n in xp:
    for n1 in xp:
        l1 = sorted(xp, key=itemgetter(0), reverse=True)
        l1_filtered = []
        for item in l1:
            if item[0] <= min(n[0],n[0]):
                break
            l1_filtered.append(item)
        l2 = sorted(l1_filtered, key=itemgetter(1), reverse=True)
        l2_filtered = []
        for item in l2:
            if item[1] <= min(n[1],n[1]):
                break
            l2_filtered.append(item)
        l3 = sorted(l2_filtered, key=itemgetter(2), reverse=True)
        l3_filtered = []
        for item in l3:
            if item[2] <= min(n[2],n[2]):
                break
            l3_filtered.append(item)
a5=time.time()

# Python 2 print statements, as used for the timings above
print "soluce my_naive1, like proposed by Emil Vikström?",a2-a1
print "soluce my_naive2",a3-a2
print "soluce Tim Pietzcker",a4-a3
print "soluce Don",a5-a4
>>> l = [(0,0,4), (1,0,3), (1,2,1), (4,0,0)]
>>> output = [t for t in l if t[0]>3 or t[1]>2 or t[2]>1]
>>> output
[(0, 0, 4), (1, 0, 3), (4, 0, 0)]
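This is fast because t[1]>2 is evaluated only if t[0]>3 is False (and likewise for the third condition). So for your example list, only 10 of the 12 possible comparisons are needed.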
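To make the short-circuit behaviour of `or` concrete, here is a minimal sketch that counts how many comparisons are actually evaluated. The helper name `count_comparisons` is hypothetical and exists only for illustration; the inner loop with `break` is an assumption-free stand-in for the chained `or`.

```python
def count_comparisons(tuples, limits=(3, 2, 1)):
    """Count the comparisons that `t[0]>3 or t[1]>2 or t[2]>1` evaluates."""
    evaluated = 0
    matches = []
    for t in tuples:
        for position, limit in enumerate(limits):
            evaluated += 1
            if t[position] > limit:
                matches.append(t)
                break  # mirrors `or`: the remaining comparisons are skipped
    return matches, evaluated

l = [(0, 0, 4), (1, 0, 3), (1, 2, 1), (4, 0, 0)]
matches, evaluated = count_comparisons(l)
# matches == [(0, 0, 4), (1, 0, 3), (4, 0, 0)]
# evaluated == 10, out of 4 * 3 = 12 possible comparisons
```

Only tuples that match on an early field save work; a tuple that fails every condition still costs three comparisons.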
If you use a generator expression instead, you may save time and memory (depending on what you do with the filtered data):
>>> l = [(0,0,4), (1,0,3), (1,2,1), (4,0,0)]
>>> for item in (t for t in l if t[0]>3 or t[1]>2 or t[2]>1):
...     pass  # do something with that item
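As a rough illustration of where the generator form can pay off, the sketch below consumes the matches lazily, so no intermediate list is built. The wrapper function `matching` is a hypothetical name introduced here, not part of the answer.

```python
def matching(tuples):
    """Lazily yield the tuples that pass any of the three conditions."""
    return (t for t in tuples if t[0] > 3 or t[1] > 2 or t[2] > 1)

l = [(0, 0, 4), (1, 0, 3), (1, 2, 1), (4, 0, 0)]

# Stops scanning as soon as the first match is found.
first_match = next(matching(l), None)

# Counts the matches without ever holding them all in memory.
total = sum(1 for _ in matching(l))
```

If you need the filtered data several times, a list is probably the better choice, since a generator can only be consumed once.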
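Have three lists, one per condition, then iterate over the input with a for loop, sorting each tuple into the correct target list. This runs in O(n) (linear) time, which is the fastest asymptotic running time possible for this problem. It also loops over the list only once.
If you don't care about the order of the resulting items, I suggest searching in sorted lists and breaking on the first non-matching item: this skips the tail of the list.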
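A minimal sketch of the single-pass idea, using the thresholds from the question's example. The function name `partition_by_conditions` and its `thresholds` parameter are assumptions made for illustration.

```python
def partition_by_conditions(tuples, thresholds=(3, 2, 1)):
    """Single pass over the input: one output list per tuple position."""
    buckets = [[] for _ in thresholds]
    for t in tuples:
        for position, limit in enumerate(thresholds):
            if t[position] > limit:
                buckets[position].append(t)
    return buckets

data = [(0, 0, 4), (1, 0, 3), (1, 2, 1), (4, 0, 0)]
first, second, last = partition_by_conditions(data)
# first  == [(4, 0, 0)]
# second == []
# last   == [(0, 0, 4), (1, 0, 3)]
```

Note that in this sketch a tuple that satisfies several conditions is appended to several lists.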
from operator import itemgetter
l = [(..., ..., ...), (...)]
# Sort in descending order on the first field so that all matches come first.
l1 = sorted(l, key=itemgetter(0), reverse=True)
l1_filtered = []
for item in l1:
    if item[0] <= 3:
        break  # every remaining item fails the condition too
    l1_filtered.append(item)
l2 = sorted(l, key=itemgetter(1), reverse=True)
...
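Sorting costs O(n log n), so this approach is most likely to help when the same list is queried with many different thresholds. The sketch below is a variant of the idea rather than the code above: it sorts once per field in ascending order and uses the standard `bisect` module to find the cut point instead of a loop with `break`. The helper names `build_index` and `greater_than` are hypothetical.

```python
from bisect import bisect_right
from operator import itemgetter

def build_index(items, position):
    """Sort once (ascending) on one field; return (keys, sorted_items)."""
    ordered = sorted(items, key=itemgetter(position))
    keys = [t[position] for t in ordered]
    return keys, ordered

def greater_than(index, threshold):
    """Return all items whose indexed field is strictly above threshold."""
    keys, ordered = index
    cut = bisect_right(keys, threshold)  # first position with key > threshold
    return ordered[cut:]

data = [(0, 0, 4), (1, 0, 3), (1, 2, 1), (4, 0, 0)]
by_first = build_index(data, 0)   # built once, reusable for any threshold
by_last = build_index(data, 2)

first_hits = greater_than(by_first, 3)  # [(4, 0, 0)]
last_hits = greater_than(by_last, 1)    # [(1, 0, 3), (0, 0, 4)]
```

As with the loop above, the results come back in sorted order rather than input order.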