Can I use numba to speed up this for loop?
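I came across numba, a great library for speeding up Python code. I would like to know whether there is a way to convert this code to NumPy code so that it can take advantage of numba. My goal is, for each combination of OS_name and client_cookie_id, to find the differences in each column and record in a dictionary all the columns that show at least one difference.
I tried: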
@jit(nopython = True)
def gigi():
    from tqdm.notebook import trange, tqdm
    df = df.applymap(str)
    df2 = df.copy()
    del df2['client_cookie_id']
    s = []
    d = {}
    for c in tqdm(range(0, len(df.client_cookie_id.unique().tolist()))):
        cid = df.client_cookie_id.unique().tolist()[c]
        for OS in df.OS_name.unique().tolist():
            ### take the indexes of all the occurrences of a single client_cookie_id
            t = df[(df['client_cookie_id'] == cid) & (df['OS_name'] == OS)].index.tolist()
            if len(t) >= 2:
                A = t[0]
                for i in t[1:]:
                    B = i
                    list1 = list(df2.loc[A])
                    list2 = list(df2.loc[B])
                    common = list(dict.fromkeys([l1 for l1 in list1 if l1 in list2]))
                    remaining = list(filter(lambda i: i not in common, list1+list2))
                    t1 = []
                    for i in range(0, len(remaining)):
                        t1.append(remaining[i].split('___')[0])
                    used = set()
                    unique = [x for x in t1 if x not in used and (used.add(x) or True)]
                    unique
                    for i in range(0, len(unique)):
                        s.append(unique[i])
                s = [x for x in s if x not in used and (used.add(x) or True)]
                d[cid] = s
            else:
                continue
    return d
gigi()
d = gigi()
But I get the following error:
AssertionError: Failed in nopython mode pipeline (step: inline calls to locally defined closures)
key already in dictionary: '$phi28.0'
Can anyone help me? Thanks.
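This doesn't solve your whole problem, but it does show a faster way to scan the rows. Note that I only print the mismatches here; I don't collect them. I'm not sure exactly what output you want: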
import pandas as pd

data = {
    'client_cookie_id': [ 111, 111, 111, 222, 222, 222 ],
    'OS_name': [ 333, 333, 444, 555, 555, 666 ],
    'data1': [ 21, 22, 23, 24, 25, 26 ],
    'data2': [ 31, 31, 31, 32, 33, 33 ]
}

def gigi(df):
    df = df.applymap(str)
    # Sorting puts rows with the same (client_cookie_id, OS_name) next to each other.
    df = df.sort_values( by=['client_cookie_id', 'OS_name'] )
    last = None
    for index, row in df.iterrows():
        if last is not None and row['client_cookie_id'] == last['client_cookie_id'] and row['OS_name'] == last['OS_name']:
            # Compare the other columns.
            for name,b,c in zip(row.index, row, last):
                if name not in ('client_cookie_id', 'OS_name') and b != c:
                    print("Difference in", name,
                          "with", row['client_cookie_id'], '/',
                          row['OS_name'], ": ", b, c )
        else:
            # First row of a new group: later rows are compared against it.
            last = row

df = pd.DataFrame(data)
gigi(df)
Output:
Difference in data1 with 111 / 333 : 22 21
Difference in data1 with 222 / 555 : 25 24
Difference in data2 with 222 / 555 : 33 32
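If the goal is to collect the differing columns in a dictionary rather than print them, one possible approach is to let pandas count the distinct values per column within each group. The sketch below is only an illustration under a few assumptions: the function name collect_differences is made up for this example, the dictionary is keyed by the (client_cookie_id, OS_name) pair rather than by client_cookie_id alone as in the question, and a column counts as "different" when a group holds more than one distinct value in it (missing values are ignored by nunique). It does not use numba at all, since numba's nopython mode does not support pandas DataFrames.

```python
import pandas as pd

data = {
    'client_cookie_id': [111, 111, 111, 222, 222, 222],
    'OS_name': [333, 333, 444, 555, 555, 666],
    'data1': [21, 22, 23, 24, 25, 26],
    'data2': [31, 31, 31, 32, 33, 33],
}

def collect_differences(df, keys=('client_cookie_id', 'OS_name')):
    """Map each key combination to the columns with more than one distinct value.

    Hypothetical helper for illustration; not part of pandas or numba.
    """
    keys = list(keys)
    # One row per group, one column per data column: number of distinct values.
    counts = df.groupby(keys).nunique()
    result = {}
    for key, row in counts.iterrows():
        # Skip the key columns in case this pandas version keeps them in the result.
        changed = [col for col, n in row.items() if col not in keys and n > 1]
        if changed:
            result[key] = changed
    return result

d = collect_differences(pd.DataFrame(data))
print(d)
```

For the sample data this should report data1 for 111 / 333 and both data1 and data2 for 222 / 555, which matches the printed differences above. Groups with a single row never appear in the dictionary, because there is nothing to compare them against.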