Can I use numba to speed up this loop?

Question

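I came across numba, a great library for speeding up Python code. I would like to know whether there is any way to convert this code to NumPy code so that it can take advantage of numba. My intention is, for each combination of OS_name and client cookie id, to find the differences in each column and record in a dictionary all the columns that show at least one difference.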
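A minimal sketch of the stated goal, using plain pandas `groupby` instead of numba (a swapped-in technique, not what the question asks for). The idea is that a column "shows a difference" inside a (client_cookie_id, OS_name) pair exactly when it holds more than one distinct value in that pair. That is the same as some row differing from the pair's first row. The function name `differing_columns` and its `keys` parameter are hypothetical. The sketch also assumes values should be compared as strings, as the question does with `applymap(str)`.

```python
import pandas as pd


def differing_columns(df, keys=("client_cookie_id", "OS_name")):
    """Hypothetical helper: map each key pair to the columns whose values
    are not all identical within that group."""
    keys = list(keys)
    value_cols = [c for c in df.columns if c not in keys]
    # Compare values as strings (like applymap(str)); NaN becomes 'nan'
    as_str = df[value_cols].astype(str)
    # Number of distinct values of each column inside each group
    counts = as_str.groupby([df[k] for k in keys]).nunique()
    result = {}
    for group_key, n_unique in counts.iterrows():
        cols = n_unique[n_unique > 1].index.tolist()
        if cols:  # only record groups with at least one differing column
            result[group_key] = cols
    return result
```

On the sample data used in the answer below, the result should compare equal to `{(111, 333): ['data1'], (222, 555): ['data1', 'data2']}`.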

我試着做：

from numba import jit

@jit(nopython = True)
def gigi():
    from tqdm.notebook import trange, tqdm
    df = df.applymap(str)

    df2 = df.copy()
    del df2['client_cookie_id']

    s = []
    d = {}

    for c in tqdm(range(0, len(df.client_cookie_id.unique().tolist()))):

        cid = df.client_cookie_id.unique().tolist()[c]

        for OS in df.OS_name.unique().tolist():

            ### take the indexes of all the occurrences of a single client_cookie_id
            t = df[(df['client_cookie_id'] == cid) & (df['OS_name'] == OS)].index.tolist()

            if len(t) >= 2:

                A = t[0]

                for i in t[1:]:

                    B = i

                    list1 = list(df2.loc[A])
                    list2 = list(df2.loc[B])

                    common = list(dict.fromkeys([l1 for l1 in list1 if l1 in list2]))
                    remaining = list(filter(lambda i: i not in common, list1+list2))

                    t1 = []
                    for i in range(0, len(remaining)):
                        t1.append(remaining[i].split('___')[0])

                    used = set()
                    unique = [x for x in t1 if x not in used and (used.add(x) or True)]
                    unique

                    for i in range(0, len(unique)):
                        s.append(unique[i])

                s = [x for x in s if x not in used and (used.add(x) or True)]

            d[cid] = s

        else:
            continue

    return d

gigi()

d = gigi()

but I get the following error:

AssertionError: Failed in nopython mode pipeline (step: inline calls to locally defined closures)
key already in dictionary: '$phi28.0'

Can someone help me? Thanks.

Answer 1

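This doesn't solve your whole problem, but it does show a faster way to scan the rows. Note that I only print the mismatches here; I don't collect them. I'm not sure exactly what output you want: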

import pandas as pd

data = { 
        'client_cookie_id': [ 111, 111, 111, 222, 222, 222 ],
        'OS_name': [ 333, 333, 444, 555, 555, 666 ],
        'data1': [ 21, 22, 23, 24, 25, 26 ],
        'data2': [ 31, 31, 31, 32, 33, 33 ]
    }


def gigi(df):
    df = df.applymap(str)
    df = df.sort_values( by=['client_cookie_id', 'OS_name'] )

    last = None
    for index, row in df.iterrows():
        if last is not None and row['client_cookie_id'] == last['client_cookie_id'] and row['OS_name'] == last['OS_name']:
            # Compare the other columns.
            for name,b,c in zip(row.index, row, last):
                if name not in ('client_cookie_id', 'OS_name') and b != c:
                    print("Difference in", name, 
                        "with", row['client_cookie_id'], '/', 
                        row['OS_name'], ": ", b, c )
        else:
            last = row

df = pd.DataFrame(data)
gigi(df)

Output:

Difference in data1 with 111 / 333 :  22 21
Difference in data1 with 222 / 555 :  25 24
Difference in data2 with 222 / 555 :  33 32
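The loop above prints each mismatch. For the dictionary described in the question, here is a minimal sketch of the same scan that collects column names per (client_cookie_id, OS_name) pair instead. `collect_differences` is a hypothetical name. Like the answer, it converts values to strings and compares every row with the first row of its group.

```python
import pandas as pd


def collect_differences(df, keys=("client_cookie_id", "OS_name")):
    """Hypothetical variant of the answer's gigi(): record differing
    column names per key pair instead of printing them."""
    keys = list(keys)
    # Same preprocessing as the answer: compare as strings, sort by keys
    df = df.astype(str).sort_values(by=keys)
    value_cols = [c for c in df.columns if c not in keys]
    result = {}
    last = None  # first row of the current group
    for _, row in df.iterrows():
        if last is not None and all(row[k] == last[k] for k in keys):
            for name in value_cols:
                if row[name] != last[name]:
                    cols = result.setdefault(tuple(row[k] for k in keys), [])
                    if name not in cols:  # record each column once
                        cols.append(name)
        else:
            last = row  # new group starts here
    return result
```

With the sample `data` above, `collect_differences(pd.DataFrame(data))` should give `{('111', '333'): ['data1'], ('222', '555'): ['data1', 'data2']}`. The keys are strings because the whole frame is converted first. `iterrows` is still a Python-level loop, so on large frames a groupby-based approach will usually be faster.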

Answer 1 · score 0 · 2021-11-22 18:54:11
