Can I use numba to speed up this for loop?

I came across numba, a fantastic library for speeding up Python code. I was wondering whether there is a way to convert this code into NumPy code so that numba can be applied to it. My intention is, for each combination of OS_name and client_cookie_id, to find which columns differ between rows, and to record in a dictionary every column that shows at least one difference.

I tried:

@jit(nopython = True)
def gigi():
    from tqdm.notebook import trange, tqdm
    df = df.applymap(str)

    df2 = df.copy()
    del df2['client_cookie_id']

    s = []
    d = {}

    for c in tqdm(range(0, len(df.client_cookie_id.unique().tolist()))):

        cid = df.client_cookie_id.unique().tolist()[c]

        for OS in df.OS_name.unique().tolist():

            ### take the indexes of all the occurrences of a single client_cookie_id

            t = df[(df['client_cookie_id'] == cid) & (df['OS_name'] == OS)].index.tolist()

            if len(t) >= 2:

                A = t[0]

                for i in t[1:]:

                    B = i

                    list1 = list(df2.loc[A])
                    list2 = list(df2.loc[B])

                    common = list(dict.fromkeys([l1 for l1 in list1 if l1 in list2]))
                    remaining = list(filter(lambda i: i not in common, list1+list2))

                    t1 = []

                    for i in range(0, len(remaining)):

                        t1.append(remaining[i].split('___')[0])

                    used = set()
                    unique = [x for x in t1 if x not in used and (used.add(x) or True)]
                    unique

                    for i in range(0, len(unique)):

                        s.append(unique[i])

                s = [x for x in s if x not in used and (used.add(x) or True)]

            d[cid] = s

        else:

            continue

    return d

gigi()

d = gigi()

But I receive the following error:

AssertionError: Failed in nopython mode pipeline (step: inline calls to locally defined closures)
key already in dictionary: '$phi28.0'

Can someone help me? Thanks.

This doesn't solve your whole problem, but it does show a much quicker way to scan through the rows. Note that I'm only printing the mismatches here; I'm not gathering them, since I'm not sure what exact output you wanted:

import pandas as pd

data = { 
        'client_cookie_id': [ 111, 111, 111, 222, 222, 222 ],
        'OS_name': [ 333, 333, 444, 555, 555, 666 ],
        'data1': [ 21, 22, 23, 24, 25, 26 ],
        'data2': [ 31, 31, 31, 32, 33, 33 ]
    }


def gigi(df):
    df = df.astype(str)  # same effect as applymap(str), which recent pandas versions deprecate
    df = df.sort_values( by=['client_cookie_id', 'OS_name'] )

    last = None
    for index, row in df.iterrows():
        if last is not None and row['client_cookie_id'] == last['client_cookie_id'] and row['OS_name'] == last['OS_name']:
            # Compare the other columns.
            for name,b,c in zip(row.index, row, last):
                if name not in ('client_cookie_id', 'OS_name') and b != c:
                    print("Difference in", name, 
                        "with", row['client_cookie_id'], '/', 
                        row['OS_name'], ": ", b, c )
        else:
            last = row

df = pd.DataFrame(data)
gigi(df)

Output:

Difference in data1 with 111 / 333 :  22 21
Difference in data1 with 222 / 555 :  25 24
Difference in data2 with 222 / 555 :  33 32
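
If the goal is to gather the differing columns into a dictionary rather than print them, one possible sketch is to count the distinct values per column inside each group. This swaps the row-by-row scan for pandas' groupby and nunique. The function name differing_columns and the choice to key the dictionary by the (client_cookie_id, OS_name) tuple are assumptions for illustration, since the question does not pin down the exact output shape:

```python
import pandas as pd


def differing_columns(df, keys=("client_cookie_id", "OS_name")):
    """Map each key combination to the columns holding more than one distinct value.

    Hypothetical helper: the name and the tuple-keyed result are assumptions,
    not something specified in the question.
    """
    keys = list(keys)
    value_cols = [col for col in df.columns if col not in keys]
    # Compare everything as strings, as the original code does.
    as_text = df.astype(str)
    # Number of distinct values per column within each group.
    counts = as_text.groupby(keys)[value_cols].nunique()
    result = {}
    for group_key, row in counts.iterrows():
        changed = [col for col, n in row.items() if n > 1]
        if changed:
            result[group_key] = changed
    return result


data = {
    'client_cookie_id': [111, 111, 111, 222, 222, 222],
    'OS_name': [333, 333, 444, 555, 555, 666],
    'data1': [21, 22, 23, 24, 25, 26],
    'data2': [31, 31, 31, 32, 33, 33],
}

print(differing_columns(pd.DataFrame(data)))
# {('111', '333'): ['data1'], ('222', '555'): ['data1', 'data2']}
```

A column has more than one distinct value in a group exactly when at least one row differs from the first row of that group, so this reports the same columns as the loop above. Groups with a single row never appear in the result.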
