Python 2.7: Remove subdomains from list
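I have a list of about 1,300,000 items. For example: ['.a', '.b.a', '.c.b', '.f.c.b'].
I want to remove the subdomains (for example, '.b.a' and '.f.c.b' in the list above).
I'm a newbie and I'm trying to understand speed. Below is my attempt, which seems slow. Any suggestions?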
# create separate lists, perhaps that is faster
a1 = []
b2 = []
c3 = []
d4 = []
e5 = []
f6 = []
for i in dupesgone:
    j = i.count('.')
    if j == 1:
        a1.append(i)
    elif j == 2:
        b2.append(i)
    elif j == 3:
        c3.append(i)
    elif j == 4:
        d4.append(i)
    elif j == 5:
        e5.append(i)
    else:
        f6.append(i)
for a in a1:
    la = -len(a)
    for b in b2:
        if a == b[la:]:
            b2.remove(b)
    for c in c3:
        if a == c[la:]:
            c3.remove(c)
    for d in d4:
        if a == d[la:]:
            d4.remove(d)
--snip--
# how about this, is this faster
[b2.remove(b) for b in b2 for a in a1 if a == b[-len(a):]]
[c3.remove(c) for c in c3 for a in a1 if a == c[-len(a):]]
[d4.remove(d) for d in d4 for a in a1 if a == d[-len(a):]]
[e5.remove(e) for e in e5 for a in a1 if a == e[-len(a):]]
[f6.remove(f) for f in f6 for a in a1 if a == f[-len(a):]]
Should I create a dictionary? Would that be faster?
Thanks for your help.
In general, it is faster to build a new list than to remove the non-matching items from an existing one:
dupesgone = [domain for domain in dupesgone if domain.count(".") == 1]
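To illustrate the build-a-new-list idea on the sample from the question, here is a minimal sketch that also keeps entries such as '.c.b', whose parent is not in the list. It is an illustration under assumptions, not the answerer's code: the helper names has_parent_in and drop_subdomains are hypothetical, and a set stands in for the dictionary the question asks about. Building a new list also sidesteps a pitfall in the question's loops, where calling remove() on a list while iterating over it causes elements to be skipped.

```python
def has_parent_in(domain, known):
    """Return True if any proper parent suffix of `domain` is in `known`."""
    # For '.f.c.b' the proper parents checked are '.c.b' and then '.b'.
    pos = domain.find('.', 1)
    while pos != -1:
        if domain[pos:] in known:
            return True
        pos = domain.find('.', pos + 1)
    return False


def drop_subdomains(domains):
    # Hypothetical helper: a set gives average O(1) membership tests,
    # so each domain costs one lookup per label instead of a list scan.
    known = set(domains)
    return [d for d in domains if not has_parent_in(d, known)]


sample = ['.a', '.b.a', '.c.b', '.f.c.b']
result = drop_subdomains(sample)
print(result)  # ['.a', '.c.b']
```

This keeps the original order of the list and assumes every entry starts with a dot, as in the question's example.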
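Actually, I think the fastest algorithm is to reverse every string and sort the reversed strings, so that each subdomain ends up directly after its parent domain and can be dropped in a single pass.
Here is an untested sketch of the code: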
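def reverse(s):
    return s[::-1]

def remove_subdomains(domains):
    # After reversing, a parent domain is a prefix of its subdomains,
    # so sorting places every subdomain right after its parent.
    r = sorted(map(reverse, domains))
    ci = None  # reversed form of the most recently kept domain
    out = []
    for ni in r:
        if not ci or not ni.startswith(ci):
            out.append(ni)
            ci = ni
    return [reverse(s) for s in out]

# dupesgone is the list from the question
dupesgone = remove_subdomains(dupesgone)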
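To see why sorting the reversed strings works, it may help to trace the question's sample list by hand. The following is only a small worked example; the variable names sample, reversed_sorted, and kept are made up for illustration.

```python
sample = ['.a', '.b.a', '.c.b', '.f.c.b']

# Reverse each string, then sort. The leading dot becomes a trailing dot,
# which marks the label boundary, so 'ab.' (from '.ba') is not treated
# as a child of 'a.' (from '.a').
reversed_sorted = sorted(s[::-1] for s in sample)
print(reversed_sorted)  # ['a.', 'a.b.', 'b.c.', 'b.c.f.']

kept = []
for item in reversed_sorted:
    # Keep an item only if it does not extend the last kept item.
    if not kept or not item.startswith(kept[-1]):
        kept.append(item)

result = [s[::-1] for s in kept]
print(result)  # ['.a', '.c.b']
```

The sort dominates the cost at O(n log n) string comparisons, and the scan afterwards is a single pass, which is why this scales far better than comparing every pair of entries.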