
Is Python bitwise shift really slow?

I must be overlooking something, but I really don't see why the Python code is so slow...

I am counting the unique elements in an array whose elements are in the range [−1,000,000..1,000,000], using a bit vector to do so. The Java code, which uses BitSet, is about 50 times faster than the Python code, which takes 9 seconds.

Is this maybe because, when I initialise bitvector = 0, Python doesn't reserve enough memory and the bitvector needs to be copied as it grows?

Python:

def solution(array):
    bitvector = 0
    count = 0
    for element in array:
        # transform -1,000,000 to 0 etc
        element_transformed = element + 1000000
        if bitvector >> element_transformed & 1 == 0:
            count += 1
            bitvector = bitvector | 1 << element_transformed

    return count

Test:

import unittest
import random

from .file1 import solution

class MySolutionTests(unittest.TestCase):
    def test_solution_random_all_unique(self):
        a = random.sample(range(-1000000, 1000001), 100000)
        self.assertEqual(100000, solution(a))

In Java:

package mypackage;

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;


public class MyClass {

    public static int solution(List<Integer> array) {
        BitSet bitvector = new BitSet();
        int count = 0;

        for(int i = 0; i < array.size(); i++) {
            int elementTransformed = array.get(i) + 1000000;
            if(bitvector.get(elementTransformed) != true) {
                count++;
                bitvector.set(elementTransformed, true);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // TODO code application logic here
    }
}

Test:

package mypackage;

import java.util.ArrayList;
import java.util.Collections;
import org.junit.Test;
import static org.junit.Assert.*;

public class MyClassTest {

    public MyClassTest() {
    }

    @Test
    public void testSolutionLong_RandomAllUnique() {
        ArrayList array = new ArrayList();
        for(int i = -1000000; i < 1000000; i++) {
            array.add(i);
        }
        Collections.shuffle(array);
        assertEquals(100000, MyClass.solution(array.subList(0, 100000)));

    }  
}

Just trying to reply directly to the question you posed: why Python takes 9 seconds while Java is 50 times faster is not a simple question to answer. For a good overview, see the earlier discussion "Is Python slower than Java/C#?".

The way I like to look at it is that Java is an object-oriented language, while Python is object-based.

When it comes to bitwise operations, Java uses primitive data types, which are arguably faster because there are no boxing/unboxing operations or wrappers acting as a layer of abstraction. So, looking at your code, at every iteration Python re-wraps the integer as an integer object, while Java does not.

But again, I wouldn't take it for granted that Java is always faster than Python. It depends on which library you are using and which problem you are trying to solve!
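One concrete reading of the per-iteration cost described above: Python integers are immutable, so `bitvector | 1 << element_transformed` builds a new integer object on every pass instead of flipping one bit in place. The sketch below is not from the original answers. It assumes a `bytearray` as mutable bit storage, and the name `count_unique` and its `offset` and `size` parameters are hypothetical, chosen to match the range in the question.

```python
def count_unique(array, offset=1_000_000, size=2_000_001):
    """Count distinct values using a mutable bytearray as the bit vector.

    `offset` and `size` are assumptions that match the range in the
    question: values from -1,000,000 to 1,000,000 inclusive.
    """
    bits = bytearray((size + 7) // 8)  # one bit per possible value, allocated once
    count = 0
    for element in array:
        index = element + offset       # map -1,000,000 to bit 0, and so on
        byte = index >> 3              # which byte holds this bit
        mask = 1 << (index & 7)        # which bit inside that byte
        if not bits[byte] & mask:      # bit still clear: first occurrence
            bits[byte] |= mask         # updates a single byte in place
            count += 1
    return count
```

Calling `count_unique(a)` on the test data from the question should return the same count as the original `solution(a)`, while each iteration only touches one byte instead of rebuilding the whole bit vector.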

The Pythonic way to do this is:

def solution(array):
    return len(set(array))

It is much faster, though it will probably use more memory.

The set solution ran in about 100 ms for 10**6 samples from a 2*10**6 range. I didn't even time the bit array because it took seconds.
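A timing like the one above could be reproduced roughly as follows. This is only a sketch: the helper name `time_solution` and its parameters are hypothetical, and the numbers will vary with hardware and Python version.

```python
import random
import timeit


def solution(array):
    return len(set(array))


def time_solution(n_samples=10**6, half_range=10**6, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` single runs."""
    # Unique samples drawn from [-half_range, half_range], as in the question.
    data = random.sample(range(-half_range, half_range + 1), n_samples)
    return min(timeit.repeat(lambda: solution(data), number=1, repeat=repeats))


if __name__ == "__main__":
    print(f"set-based solution: {time_solution() * 1000:.0f} ms")
```

Taking the minimum of several runs is a common way to reduce noise from other processes on the machine.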

For lists on the order of 10**6 elements, the trade-off is worth it. Using sys.getsizeof, I measured the intermediate set as using 4.2 times the memory of the list. The equivalent int bit array uses about 1/30 the memory of the list. This is on a 64-bit Linux system.
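The memory comparison above can be approximated with a short script. This is a sketch under assumptions: `shallow_sizes` is a hypothetical helper, and `sys.getsizeof` reports only the container's own size, not the integer objects it refers to, so the exact ratios depend on the interpreter and platform.

```python
import random
import sys


def shallow_sizes(n_samples=10**6, half_range=10**6):
    """Return shallow sizes in bytes of (list, set, int bit vector)."""
    values = random.sample(range(-half_range, half_range + 1), n_samples)
    as_set = set(values)
    # An int used as a bit vector is as large as its highest set bit requires,
    # so an int with only the top bit of the range set has a comparable size.
    as_int = 1 << (2 * half_range)
    return sys.getsizeof(values), sys.getsizeof(as_set), sys.getsizeof(as_int)


if __name__ == "__main__":
    list_size, set_size, int_size = shallow_sizes(10**5, 10**5)
    print(f"set / list: {set_size / list_size:.1f}")
    print(f"list / int: {list_size / int_size:.1f}")
```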
