Is Python bitwise shift really slow?

I must be overlooking something, but I really don't see why the Python code is so slow...

The task is to count the unique elements in an array whose elements are in the range [−1,000,000..1,000,000], using a bit vector. The Java code, which uses BitSet, is about 50 times faster than the Python code, which takes 9 seconds.

Is this maybe because, when I initialise bitvector = 0, Python doesn't reserve enough memory and the bit vector has to be copied as it grows?

Python:

def solution(array):
    bitvector = 0
    count = 0
    for element in array:
        # transform -1,000,000 to 0 etc
        element_transformed = element + 1000000
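        # test the bit for this value; parentheses make the precedence explicit
        if (bitvector >> element_transformed) & 1 == 0:
            count += 1
            # set the bit for this value
            bitvector = bitvector | (1 << element_transformed)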

    return count

Test:

import unittest
import random

from .file1 import solution

class MySolutionTests(unittest.TestCase):
    def test_solution_random_all_unique(self):
        a = random.sample(range(-1000000, 1000001), 100000)
        self.assertEqual(100000, solution(a))

In Java:

package mypackage;

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;


public class MyClass {

    public static int solution(List<Integer> array) {
        BitSet bitvector = new BitSet();
        int count = 0;

        for(int i = 0; i < array.size(); i++) {
            int elementTransformed = array.get(i) + 1000000;
            if(bitvector.get(elementTransformed) != true) {
                count++;
                bitvector.set(elementTransformed, true);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // TODO code application logic here
    }
}

Test:

package mypackage;

import java.util.ArrayList;
import java.util.Collections;
import org.junit.Test;
import static org.junit.Assert.*;

public class MyClassTest {

    public MyClassTest() {
    }

    @Test
    public void testSolutionLong_RandomAllUnique() {
        ArrayList<Integer> array = new ArrayList<>();
        // include the upper bound so the range matches [-1,000,000..1,000,000]
        for(int i = -1000000; i <= 1000000; i++) {
            array.add(i);
        }
        Collections.shuffle(array);
        assertEquals(100000, MyClass.solution(array.subList(0, 100000)));

    }  
}

To reply directly to the question you posed: there is no simple answer to why Python takes 9 seconds and Java is 50 times faster. The earlier discussion "Is Python slower than Java/C#?" gives good background.

The way I like to look at it is that Java is an object-oriented language, while Python is object-based.

For a bitwise operation, Java uses primitive data types, which are arguably faster because there are no boxing and unboxing operations and no wrappers as a layer of abstraction. Looking at your code, Python re-wraps the integer as an int object on every iteration, while Java does not. Because Python ints are immutable, each shift and each OR on the bit vector also builds a new integer, which here can be up to about two million bits long.
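
One way to check how much of the cost comes from rebuilding the integer on every iteration is to keep the bits in a buffer that is allocated once and changed in place. The following is only a minimal sketch under that assumption, not the asker's code: the name solution_bytearray and the parameters offset and max_value are made up for illustration, and bytearray is swapped in for the int bit vector.

```python
def solution_bytearray(array, offset=1_000_000, max_value=1_000_000):
    # One bit per possible value in [-offset, max_value].
    # The buffer is allocated once and mutated in place.
    size_bits = offset + max_value + 1
    bits = bytearray((size_bits + 7) // 8)
    count = 0
    for element in array:
        index = element + offset        # maps -offset to bit 0
        byte_index = index >> 3         # which byte holds the bit
        mask = 1 << (index & 7)         # which bit inside that byte
        if not bits[byte_index] & mask:
            # first time this value is seen: set its bit and count it
            bits[byte_index] |= mask
            count += 1
    return count
```

Each iteration here touches a single byte, so the work per element no longer depends on how large the bit vector is.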

But again, I wouldn't take for granted that Java is always faster than Python. It depends on which library you are using and which problem you are trying to solve!

The pythonic way to do this is

def solution(array):
    return len(set(array))

It is much faster, though it will probably use more memory.

The set solution ran in about 100 ms for 10**6 samples from a 2*10**6 range. I didn't even time the bit array because it took seconds.
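
A timing like this can be reproduced with a small harness. This is a rough sketch rather than the exact measurement used above: the helper names count_unique_with_set and time_call are made up, and the result will vary with the machine and Python version.

```python
import random
import time


def count_unique_with_set(array):
    return len(set(array))


def time_call(func, array):
    # Returns the function result together with the elapsed wall time in seconds.
    start = time.perf_counter()
    result = func(array)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # 10**6 distinct samples from a range of about 2*10**6 values
    sample = random.sample(range(-1_000_000, 1_000_001), 10**6)
    result, elapsed = time_call(count_unique_with_set, sample)
    print(f"{result} unique values in {elapsed * 1000:.0f} ms")
```

The same time_call helper can be pointed at the bit vector version, ideally with a smaller sample first.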

For lists on the order of 10**6 elements, it is worth the trade-off. Using sys.getsizeof, I measured the intermediate set as using 4.2 times the memory of the list. The equivalent int bit array uses about 1/30 the memory of the list. This is on a 64-bit Linux system.
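
The comparison can be sketched as follows. The function name container_sizes and its parameters are made up for illustration. Two assumptions are built in: sys.getsizeof reports only the container itself, not the int objects it refers to, and an int's size depends only on its highest set bit, so a single bit at the top position stands in for a filled bit vector. The exact ratios depend on the platform and Python version.

```python
import random
import sys


def container_sizes(n=10**6, low=-1_000_000, high=1_000_000):
    values = random.sample(range(low, high + 1), n)
    # Stand-in for the filled bit vector: only the highest bit is set,
    # which gives the same size as an int with every bit in the range set.
    bitvector = 1 << (high - low)
    return {
        "list": sys.getsizeof(values),
        "set": sys.getsizeof(set(values)),
        "int_bitvector": sys.getsizeof(bitvector),
    }


if __name__ == "__main__":
    sizes = container_sizes()
    for name, size in sizes.items():
        print(f"{name}: {size} bytes, {size / sizes['list']:.3f} x list")
```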
