大數據集優化

Question

我在這里發布了一個代碼供審查。 然而，到目前為止它沒有收到正確的響應，我認為這是由於代碼的冗長。 在這里，我將切入正題。 假設我們有以下列表：

t0=[('Albania','Angola','Germany','UK'),('UK','France','Italy'),('Austria','Bahamas','Brazil','Chile'),('Germany','UK'),('US')]
t1=[('Angola', 'UK'), ('Germany', 'UK'), ('UK', 'France'), ('UK', 'Italy'), ('France', 'Italy'), ('Austria', 'Bahamas')]
t2=[('Angola:UK'), ('Germany:UK'), ('UK:France'), ('UK:Italy'), ('France:Italy'), ('Austria:Bahamas')]

目標是針對t1每一對，我們通過t0 ，如果找到該對，我們將其替換為相應的t3元素，我們可以使用以下方法執行此操作：

result = []
for v1, v2 in zip(t1, t2):
    out = []
    for i in t0:
        common = set(v1).intersection(i)
        if set(v1) == common:
            out.append(tuple(list(set(i) - common) + [v2]))
        else:
            out.append(tuple(i))
    result.append(out)

pprint(result, width=100)

這使：

[[('Albania', 'Germany', 'Angola:UK'),
  ('UK', 'France', 'Italy'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany', 'UK'),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany:UK'),
  ('UK', 'France', 'Italy'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany:UK',),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany', 'UK'),
  ('Italy', 'UK:France'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany', 'UK'),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany', 'UK'),
  ('France', 'UK:Italy'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany', 'UK'),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany', 'UK'),
  ('UK', 'France:Italy'),
  ('Austria', 'Bahamas', 'Brazil', 'Chile'),
  ('Germany', 'UK'),
  ('U', 'S')],
 [('Albania', 'Angola', 'Germany', 'UK'),
  ('UK', 'France', 'Italy'),
  ('Brazil', 'Chile', 'Austria:Bahamas'),
  ('Germany', 'UK'),
  ('U', 'S')]]

此列表的長度為 6，這表明t1和t2中有 6 個元素，每個子列表有 5 個元素，對應於t0的元素數。 就目前而言，代碼很快，但在我的情況下，我有t0長度為 ~48000 和 t1 的長度為 ~30000。 運行時間幾乎是永遠的我想知道如何用更快的方法執行相同的操作？

Answer 1

您可以使用雙重列表理解。 代碼運行速度大約快 3.47 倍（13.3 µs 與 46.2 µs）。

t0=[('Albania','Angola','Germany','UK'),('UK','France','Italy'),('Austria','Bahamas','Brazil','Chile'),('Germany','UK'),('US')]
t1=[('Angola', 'UK'), ('Germany', 'UK'), ('UK', 'France'), ('UK', 'Italy'), ('France', 'Italy'), ('Austria', 'Bahamas')]
t2=[('Angola:UK'), ('Germany:UK'), ('UK:France'), ('UK:Italy'), ('France:Italy'), ('Austria:Bahamas')]

# We transform the lists of tuple to lists of sets for easier and faster computations
# We transform the lists of tuple to lists of sets for easier and faster computations
t0 = [set(x) for x in t0]
t1 = [set(x) for x in t1]

# We define a function that removes list of elements and adds an element
# from a set 
def add_remove(set_, to_remove, to_add):
    result_temp = set_.copy()
    for element in to_remove:
        result_temp.remove(element)
    result_temp.add(to_add)
    return result_temp

# We do the computation using a double list comprehension
result = [[add_remove(y, x, z) if x.issubset(y) else y for y in t0] 
          for x, z in zip(t1, t2)]

大數據集優化

問題描述

1 個解決方案

解決方案1
1 已采納 2019-07-11 11:56:06

大數據集優化

問題描述

1 個解決方案

解決方案1 1 已采納 2019-07-11 11:56:06

解決方案1
1 已采納 2019-07-11 11:56:06