Intersection and difference of two sets
Given two sets a and b that both contain integers, I would like to create another set c that contains all integers that are in both a and b and, additionally, each integer that is in a xor b with probability 1/2, e.g.:
a={1,2,3,4}, b={1,2,5}
The result of function(a,b) could be c={1,2,5} or c={1,2,3,4,5} or c={1,2,3,5} or c={1,2,3,4} ....
This is a bottleneck in my code and is done iteratively many times. Currently my code is:
import random

def function(a, b):
    c = a & b
    c_temp = list(a ^ b)
    for x in range(len(c_temp)):
        if random.random() < 0.5:
            c.add(c_temp[x])
    return c
Could this be done faster? Thanks!
I believe so! Try the code below, which takes the loop out and lets the random module select from the xor set; I expected this to be faster (the timings further down show otherwise). I used the binomial distribution to determine how many elements should be selected, which is the correct way to do this when each element is considered independently with p=0.5.
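# random selection
import numpy as np
import random

def f2(a, b):
    c = a & b
    xor_stuff = a ^ b
    # Number of elements to keep, drawn from Binomial(n, 0.5)
    k = np.random.binomial(len(xor_stuff), p=0.5)
    # random.sample needs a sequence (sets are rejected since Python 3.11)
    xor_selected = random.sample(list(xor_stuff), k)
    c.update(xor_selected)
    return c

a = {1, 2, 3, 4, 5, 6}
b = {4, 5, 6, 7, 8, 9}
for trial in range(5):
    print(f2(a, b))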
{3, 4, 5, 6}
{1, 4, 5, 6, 7}
{2, 4, 5, 6, 7, 8, 9}
{1, 2, 4, 5, 6, 9}
{1, 2, 4, 5, 6}
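The claim that a binomial count followed by sampling without replacement matches independent coin flips can be checked empirically. The sketch below is an illustration only: it uses the standard library alone, so the count of set bits in `random.getrandbits(n)` stands in for `np.random.binomial(n, 0.5)`, and the helper names (`binomial_half`, `pick_subset`, `inclusion_frequencies`) are hypothetical, not part of the answer above.

```python
import random
from collections import Counter

def binomial_half(n):
    """Draw from Binomial(n, 0.5) by counting the set bits among n random bits."""
    return bin(random.getrandbits(n)).count("1") if n else 0

def pick_subset(items):
    """Sample a Binomial(n, 0.5)-sized subset without replacement."""
    pool = list(items)
    return random.sample(pool, binomial_half(len(pool)))

def inclusion_frequencies(items, trials=20_000, seed=0):
    """Estimate how often each item ends up in the selected subset."""
    random.seed(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(pick_subset(items))
    return {x: counts[x] / trials for x in items}

freqs = inclusion_frequencies({1, 2, 3, 7, 8, 9})
for item, freq in sorted(freqs.items()):
    print(item, round(freq, 3))
```

Each printed frequency should land close to 0.5, which is what per-element selection with p=0.5 requires.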
# original
def f1(a, b):
    c = a & b
    c_temp = list(a ^ b)
    for x in range(len(c_temp)):
        if random.random() < 0.5:
            c.add(c_temp[x])
    return c

def f2(a, b):
    c = a & b
    xor_stuff = a ^ b
    # Python 3.11+ requires a sequence here; the original passed the set directly
    xor_selected = random.sample(list(xor_stuff), np.random.binomial(len(xor_stuff), p=0.5))
    c.update(xor_selected)
    return c

def f3(a, b):
    c = a & b
    st = list(a ^ b)
    # Boolean mask: keep each element with probability 0.5
    c.update(np.array(st)[np.random.random(len(st)) > 0.5])
    return c

def f4(a, b):
    c = a & b
    for x in a ^ b:
        if random.random() < 0.5:
            c.add(x)
    return c

test_size = 1000
a2 = {random.randint(0, 10_000_000) for t in range(test_size)}
b2 = {random.randint(0, 10_000_000) for t in range(test_size)}
(Sadly, mine is the slowest. Surprised. :( )
In [25]: %timeit f1(a2, b2)
391 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [26]: %timeit f2(a2, b2)
644 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [27]: %timeit f3(a2, b2)
365 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [28]: %timeit f4(a2, b2)
342 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
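The `%timeit` lines above need IPython. For anyone who wants to repeat the comparison in a plain script, here is a minimal sketch using the standard `timeit` module. It covers only the two pure-Python variants (the index loop and the direct loop), the function names and the `time_candidates` helper are made up for this example, and the numbers it prints will differ from the ones quoted above depending on machine and Python version.

```python
import random
import timeit

def with_index_loop(a, b):
    # Same approach as f1: materialise a list and index into it
    c = a & b
    c_temp = list(a ^ b)
    for i in range(len(c_temp)):
        if random.random() < 0.5:
            c.add(c_temp[i])
    return c

def with_direct_loop(a, b):
    # Same approach as f4: iterate over the xor set directly
    c = a & b
    for x in a ^ b:
        if random.random() < 0.5:
            c.add(x)
    return c

def time_candidates(candidates, a, b, number=100, repeat=3):
    """Return the best observed per-call time in seconds for each candidate."""
    results = {}
    for func in candidates:
        timings = timeit.repeat(lambda: func(a, b), number=number, repeat=repeat)
        results[func.__name__] = min(timings) / number
    return results

test_size = 1000
a2 = {random.randint(0, 10_000_000) for _ in range(test_size)}
b2 = {random.randint(0, 10_000_000) for _ in range(test_size)}

for name, seconds in time_candidates([with_index_loop, with_direct_loop], a2, b2).items():
    print(f"{name}: {seconds * 1e6:.0f} us per call")
```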
The list is unnecessary, and range-len iteration is slower than direct iteration. You can iterate over a ^ b directly:
def function(a, b):
    c = a & b
    for x in a ^ b:
        if random.random() < 0.5:
            c.add(x)
    return c
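A closely related variant hands the filtering to `set.update` with a generator expression instead of calling `c.add` in an explicit loop. This is only a sketch: it was not timed in this thread, so whether it is any faster is an open question, and the name `function_update` is made up for the example.

```python
import random

def function_update(a, b):
    c = a & b
    # Keep each element of the symmetric difference with probability 0.5
    c.update(x for x in a ^ b if random.random() < 0.5)
    return c

a = {1, 2, 3, 4}
b = {1, 2, 5}
c = function_update(a, b)
# The result always contains a & b and never leaves a | b
assert (a & b) <= c <= (a | b)
print(c)
```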
I think generating a uniform continuous random variable for a binary choice is a bit wasteful. So here is a suggestion using random.getrandbits:
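import random
import itertools

def pp(a, b):
    out = a & b
    ab = a ^ b
    if ab:
        # One random bit per element of a ^ b. bin() yields '0b...'; once
        # reversed, the 'b' and '0' of the prefix never equal "1", so they select nothing.
        bitfield = map("1".__eq__, reversed(bin(random.getrandbits(len(ab)))))
        out.update(itertools.compress(ab, bitfield))
    return out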
Alternatively, and perhaps clearer:
bitfield = map("1".__eq__,f"{random.getrandbits(len(ab)):0{len(ab)}b}")
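To see how the two bitfield constructions pair bits with elements, it can help to replace the random integer with a fixed one. The sketch below does that on a small list; the helper names `select_lsb_first` and `select_msb_first` are made up for illustration. The two forms read the bits in opposite orders, which does not matter when the bits are random, but it does mean a given integer selects different elements in each.

```python
import itertools

def select_lsb_first(items, bits):
    """Pair items with the bits of `bits`, least significant bit first."""
    # reversed(bin(bits)) ends with 'b' and '0'; neither equals "1"
    bitfield = map("1".__eq__, reversed(bin(bits)))
    return list(itertools.compress(items, bitfield))

def select_msb_first(items, bits):
    """Pair items with a zero-padded binary string, most significant bit first."""
    bitfield = map("1".__eq__, f"{bits:0{len(items)}b}")
    return list(itertools.compress(items, bitfield))

items = [10, 20, 30]
print(select_lsb_first(items, 0b101))  # [10, 30]
print(select_lsb_first(items, 0b001))  # [10]
print(select_msb_first(items, 0b101))  # [10, 30]
print(select_msb_first(items, 0b001))  # [30]
```

In the first form, a short bit string simply runs out early and `itertools.compress` stops, so the remaining elements are left unselected, exactly as if their bits were zero.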