I must be overlooking something, but I really don't see why the Python code is so slow...
I am counting the unique elements in an array whose elements are in the range [−1,000,000..1,000,000], using a bitvector to do so. The Java code, which uses BitSet, is about 50 times faster than the Python code, which takes 9 seconds.
Is this maybe because, when I initialise bitvector = 0, Python doesn't reserve enough memory and the bitvector has to be copied as it grows?
Python:
def solution(array):
    bitvector = 0
    count = 0
    for element in array:
        # transform -1,000,000 to 0 etc
        element_transformed = element + 1000000
        if bitvector >> element_transformed & 1 == 0:
            count += 1
            bitvector = bitvector | 1 << element_transformed
    return count
Test:
import unittest
import random
from .file1 import solution

class MySolutionTests(unittest.TestCase):
    def test_solution_random_all_unique(self):
        a = random.sample(range(-1000000, 1000001), 100000)
        self.assertEqual(100000, solution(a))
In Java:
package mypackage;

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class MyClass {
    public static int solution(List<Integer> array) {
        BitSet bitvector = new BitSet();
        int count = 0;
        for(int i = 0; i < array.size(); i++) {
            int elementTransformed = array.get(i) + 1000000;
            if(bitvector.get(elementTransformed) != true) {
                count++;
                bitvector.set(elementTransformed, true);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // TODO code application logic here
    }
}
Test:
package mypackage;

import java.util.ArrayList;
import java.util.Collections;
import org.junit.Test;
import static org.junit.Assert.*;

public class MyClassTest {
    public MyClassTest() {
    }

    @Test
    public void testSolutionLong_RandomAllUnique() {
        // Typed list so that subList() yields a List<Integer> without unchecked warnings
        ArrayList<Integer> array = new ArrayList<>();
        for(int i = -1000000; i < 1000000; i++) {
            array.add(i);
        }
        Collections.shuffle(array);
        assertEquals(100000, MyClass.solution(array.subList(0, 100000)));
    }
}
Just trying to reply directly to the question you posed. It is not simple to explain why Python takes 9 seconds and Java is 50 times faster. An earlier discussion, "Is Python slower than Java/C#?", gives some good insight.
The way I like to look at it is that Java is an object-oriented language, while Python is object-based.
For bitwise operations, Java can work on primitive data types, which are arguably faster because they avoid boxing and unboxing and have no wrapper objects as a layer of abstraction. Looking at your code, on every iteration Python wraps the result of each bitwise operation in a new int object, while Java's BitSet does not.
But again, I wouldn't take for granted that Java is always faster than Python. It depends on which library you are using and which problem you are trying to solve!
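One likely contributor to the slowdown, offered here as an addition rather than something stated in the answer above: Python integers are immutable, so `bitvector | 1 << element_transformed` and `bitvector >> element_transformed` each build a new integer that can be up to about 2,000,000 bits long on every iteration. A minimal sketch that avoids this, assuming a preallocated `bytearray` is acceptable as the bit store (the name `count_unique` and its `low`/`high` parameters are hypothetical, not from the question):

```python
def count_unique(array, low=-1_000_000, high=1_000_000):
    """Count distinct values in array, all assumed to lie in [low, high]."""
    size = high - low + 1
    bits = bytearray((size + 7) // 8)  # one bit per possible value, all zero
    count = 0
    for element in array:
        index = element - low          # shift so the smallest value maps to 0
        byte, mask = index >> 3, 1 << (index & 7)
        if not bits[byte] & mask:      # bit not set yet: first occurrence
            count += 1
            bits[byte] |= mask         # updates a single byte in place
    return count
```

Each iteration here touches one byte of a fixed-size buffer instead of producing a fresh large integer, which is closer to what `BitSet` does in the Java version.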
The Pythonic way to do this is:
def solution(array):
    return len(set(array))
It is much faster, though it will probably use more memory.
The set solution ran in about 100 ms for 10**6 samples from a 2*10**6 range. I didn't even time the bit array because it took seconds.
When talking about lists on the order of 10**6, it is worth the trade-off. Using sys.getsizeof, I measured the intermediate set as using 4.2 times the memory of the list. The equivalent int bit array has about 1/30 the memory of the list. This is on a 64-bit Linux system.
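The memory comparison above can be reproduced roughly as follows. This is only a sketch under stated assumptions: the helper name `container_sizes` and its parameters are hypothetical, the exact numbers depend on the Python build, and `sys.getsizeof` reports only the container itself, not the int objects it refers to.

```python
import random
import sys

def container_sizes(n_samples=100_000, low=-1_000_000, high=1_000_000):
    """Return sys.getsizeof() in bytes for a list, a set and an int bit vector."""
    values = random.sample(range(low, high + 1), n_samples)
    as_set = set(values)
    # An int with only its highest possible bit set occupies the same
    # space as a bit vector that covers the whole [low, high] range.
    as_int_bits = 1 << (high - low)
    return {
        "list": sys.getsizeof(values),
        "set": sys.getsizeof(as_set),
        "int": sys.getsizeof(as_int_bits),
    }

if __name__ == "__main__":
    for name, size in container_sizes().items():
        print(f"{name}: {size} bytes")
```

Passing `n_samples=10**6` matches the sample count used for the measurements quoted above.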