
Intersection and difference of two sets

Given two sets a and b that both contain integers, I would like to create another set c that contains every integer that is in both a and b and, additionally, each integer that is in a xor b with probability 1/2, e.g.:

a={1,2,3,4}, b={1,2,5}
The result of function(a,b) could be c={1,2,5} or c={1,2,3,4,5} or c={1,2,3,5} or c={1,2,3,4} ....

This is a bottleneck in my code and is done iteratively many times. Currently my code is:

import random

def function(a, b):
    c = a & b
    c_temp = list(a ^ b)

    for x in range(len(c_temp)):
        if random.random() < 0.5:
            c.add(c_temp[x])
    return c

Could this be done faster? Thanks!

I believe so!

Try the code below, which takes the loop out and lets the random module select from the xor set; I expected this to be faster. I used the binomial distribution to determine how many elements should be selected, which is the correct way to do this when each element is considered independently with p=0.5.

#random selection

import numpy as np
import random


def f2(a, b):
    c = a & b
    xor_stuff = a^b
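    # random.sample needs a sequence: passing a set is deprecated since Python 3.9 and raises TypeError on 3.11+
    xor_selected = random.sample(list(xor_stuff), np.random.binomial(len(xor_stuff), p=0.5))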
    c.update(xor_selected)
    return c

a = {1, 2, 3, 4, 5, 6}
b =          {4, 5, 6, 7, 8, 9}

for trial in range(5):
    print(f2(a,b))

Yields:

{3, 4, 5, 6}
{1, 4, 5, 6, 7}
{2, 4, 5, 6, 7, 8, 9}
{1, 2, 4, 5, 6, 9}
{1, 2, 4, 5, 6}
[Finished in 0.2s]
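
One way to check the claim about the binomial distribution is to count how often each element ends up in the result: elements of a & b should always appear, and elements of a ^ b about half the time. The sketch below is an illustration, not the code above verbatim: the names f2_stdlib and inclusion_frequencies are mine, and the count is drawn as the number of set bits in len(a ^ b) fair random bits (which is Binomial(n, 1/2) distributed) instead of np.random.binomial, so that it runs with the standard library only.

```python
import random
from collections import Counter


def f2_stdlib(a, b):
    c = a & b
    xor_stuff = list(a ^ b)  # random.sample needs a sequence, not a set
    n = len(xor_stuff)
    # The number of 1-bits among n fair random bits is Binomial(n, 1/2).
    k = bin(random.getrandbits(n)).count("1") if n else 0
    c.update(random.sample(xor_stuff, k))
    return c


def inclusion_frequencies(a, b, trials=20_000):
    """Fraction of trials in which each element of a | b is in the result."""
    counts = Counter()
    for _ in range(trials):
        counts.update(f2_stdlib(a, b))
    return {x: counts[x] / trials for x in sorted(a | b)}


freqs = inclusion_frequencies({1, 2, 3, 4, 5, 6}, {4, 5, 6, 7, 8, 9})
for x, freq in freqs.items():
    print(x, round(freq, 3))
```

With enough trials, 4, 5 and 6 should print 1.0 and the remaining elements should hover around 0.5.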

---- Some speed testing of solutions. ----

4 variants:

import random
import numpy as np

# original
def f1(a, b):
    c = a & b
    c_temp = list(a ^ b)

    for x in range(len(c_temp)):
        if random.random() < 0.5:
            c.add(c_temp[x])
    return c


def f2(a, b):
    c = a & b
    xor_stuff = a^b
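    # random.sample needs a sequence: passing a set is deprecated since Python 3.9 and raises TypeError on 3.11+
    xor_selected = random.sample(list(xor_stuff), np.random.binomial(len(xor_stuff), p=0.5))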
    c.update(xor_selected)
    return c

def f3(a, b):
    c = a & b
    st = list(a ^ b)
    c.update(np.array(st)[np.random.random(len(st)) > 0.5])
    return c

def f4(a, b):
    c = a & b

    for x in a ^ b:
        if random.random() < 0.5:
            c.add(x)
    return c

test_size = 1000
a2 = {random.randint(0, 10_000_000) for t in range(test_size)}
b2 = {random.randint(0, 10_000_000) for t in range(test_size)}

Results...

(Sadly, mine is the slowest. Surprised... :( )

In [25]: %timeit f1(a2, b2)                                                     
391 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [26]: %timeit f2(a2, b2)                                                     
644 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [27]: %timeit f3(a2, b2)                                                     
365 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [28]: %timeit f4(a2, b2)                                                     
342 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
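
The numbers above come from IPython's %timeit. To repeat the comparison in a plain script, something like the following sketch with the standard timeit module should work. The helper time_us and its number/repeat defaults are my own choices, and it reports the best run rather than the mean ± std. dev. that %timeit prints, so the figures will not match exactly. Only f1 and f4 are included to keep it dependency-free; f2 and f3 can be added to the loop in the same way.

```python
import random
import timeit


def f1(a, b):
    c = a & b
    c_temp = list(a ^ b)
    for x in range(len(c_temp)):
        if random.random() < 0.5:
            c.add(c_temp[x])
    return c


def f4(a, b):
    c = a & b
    for x in a ^ b:
        if random.random() < 0.5:
            c.add(x)
    return c


def time_us(func, a, b, number=200, repeat=5):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: func(a, b), number=number, repeat=repeat)
    return min(runs) / number * 1e6


test_size = 1000
a2 = {random.randint(0, 10_000_000) for _ in range(test_size)}
b2 = {random.randint(0, 10_000_000) for _ in range(test_size)}

for func in (f1, f4):
    print(f"{func.__name__}: {time_us(func, a2, b2):.0f} µs per call")
```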

The list is unnecessary, and range-len iteration is slower than direct iteration. You can iterate over a ^ b directly:

def function(a, b):
    c = a & b

    for x in a ^ b:
        if random.random() < 0.5:
            c.add(x)
    return c
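
The same direct iteration can also be spelled as a set comprehension merged into the intersection. This is only a sketch of an alternative spelling: the name function_comprehension is mine, and whether it beats the explicit loop on your data is something to measure rather than assume.

```python
import random


def function_comprehension(a, b):
    c = a & b
    # Keep each element of the symmetric difference with probability 1/2.
    c |= {x for x in a ^ b if random.random() < 0.5}
    return c


a = {1, 2, 3, 4}
b = {1, 2, 5}
result = function_comprehension(a, b)
print(result)
```

Whatever the random draws are, the result always contains a & b and is always contained in a | b.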

I think generating a uniform continuous random variable for a binary choice is a bit wasteful. So here is a suggestion using random.getrandbits:

import random
import itertools

def pp(a,b):
    out = a&b
    ab = a^b
    if ab:
        bitfield = map("1".__eq__,reversed(bin(random.getrandbits(len(ab)))))
        out.update(itertools.compress(ab,bitfield))
    return out

Alternatively, and perhaps clearer:

        bitfield = map("1".__eq__,f"{random.getrandbits(len(ab)):0{len(ab)}b}")
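
If the round trip through a string feels indirect, the random bits can also be tested arithmetically. The following is a minimal sketch under that assumption; the name pp_shift is hypothetical, and it is not claimed to be faster than the versions above. For large sets each shift has to touch the whole integer, so it is worth timing before adopting.

```python
import random


def pp_shift(a, b):
    out = a & b
    ab = a ^ b
    if ab:
        # One fair random bit per element of the symmetric difference.
        bits = random.getrandbits(len(ab))
        for i, x in enumerate(ab):
            if (bits >> i) & 1:  # bit i decides whether x is kept
                out.add(x)
    return out


a = {1, 2, 3, 4}
b = {1, 2, 5}
result = pp_shift(a, b)
print(result)
```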
