Is Python bitwise shift really slow?
I must be overlooking something, but I really don't see why the Python code is so slow.
I am counting the unique elements in an array whose elements are in the range [−1,000,000..1,000,000], using a bitvector to do it. The Java code, which uses BitSet, is about 50 times faster than the Python code, which takes 9 seconds.
Is this maybe because, when I initialise bitvector = 0, Python doesn't reserve enough memory and the bitvector needs to be copied as it grows?
Python:
    def solution(array):
        bitvector = 0
        count = 0
        for element in array:
            # transform -1,000,000 to 0 etc
            element_transformed = element + 1000000
            if (bitvector >> element_transformed) & 1 == 0:
                count += 1
                bitvector = bitvector | (1 << element_transformed)
        return count
Test:
    import unittest
    import random
    from .file1 import solution

    class MySolutionTests(unittest.TestCase):
        def test_solution_random_all_unique(self):
            a = random.sample(range(-1000000, 1000001), 100000)
            self.assertEqual(100000, solution(a))
In Java:
    package mypackage;

    import java.util.BitSet;
    import java.util.List;

    public class MyClass {
        public static int solution(List<Integer> array) {
            BitSet bitvector = new BitSet();
            int count = 0;
            for (int i = 0; i < array.size(); i++) {
                // transform -1,000,000 to 0 etc
                int elementTransformed = array.get(i) + 1000000;
                if (!bitvector.get(elementTransformed)) {
                    count++;
                    bitvector.set(elementTransformed, true);
                }
            }
            return count;
        }

        public static void main(String[] args) {
            // TODO code application logic here
        }
    }
Test:
    package mypackage;

    import java.util.ArrayList;
    import java.util.Collections;
    import org.junit.Test;
    import static org.junit.Assert.*;

    public class MyClassTest {
        public MyClassTest() {
        }

        @Test
        public void testSolutionLong_RandomAllUnique() {
            // typed list, so it matches solution(List<Integer>) without unchecked warnings
            ArrayList<Integer> array = new ArrayList<>();
            // inclusive upper bound, matching the stated range [-1,000,000..1,000,000]
            for (int i = -1000000; i <= 1000000; i++) {
                array.add(i);
            }
            Collections.shuffle(array);
            assertEquals(100000, MyClass.solution(array.subList(0, 100000)));
        }
    }
Just trying to reply directly to the question you posed: why Python takes 9 seconds while Java is 50 times faster is not a simple question to answer. For good insight, see the earlier discussion "Is Python slower than Java/C#?".
The way I like to look at it is that Java is an object-oriented language, while Python is object-based.
For a bitwise operation, Java uses primitive data types, which are arguably faster because there are no boxing/unboxing operations and no wrappers acting as a layer of abstraction. So in your code, at every iteration Python re-wraps the integer as an object of type integer, while Java does not.
But again, I wouldn't take for granted that Java is always faster than Python. It depends on which library you are using and which problem you are trying to solve!
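To make the re-wrapping point concrete: Python ints are immutable, so bitvector >> element_transformed, 1 << element_transformed and the | that combines them each build a new integer object, and with indices up to 2,000,000 those objects can be a couple of million bits long. One way to avoid that, sketched below, is a fixed-size bytearray that is updated one byte at a time in place. This is only a sketch, not a measured fix; the names count_unique_bytearray, offset and size are made up for illustration.

```python
def count_unique_bytearray(array, offset=1_000_000, size=2_000_001):
    """Count distinct values using a fixed-size, mutable bit array.

    offset and size are illustrative parameters: offset maps the smallest
    allowed value to bit 0, size is the number of possible values.
    """
    bits = bytearray((size + 7) // 8)  # one bit per possible value, all zero
    count = 0
    for element in array:
        index = element + offset       # transform -1,000,000 to 0 etc
        byte_index = index >> 3        # which byte holds this bit
        bit_mask = 1 << (index & 7)    # small int, at most 8 bits wide
        if not bits[byte_index] & bit_mask:
            count += 1
            bits[byte_index] |= bit_mask  # modifies a single byte in place
    return count
```

Every shift and mask here works on small integers, so the cost per element no longer depends on how large the bit array is.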
The pythonic way to do this is
    def solution(array):
        return len(set(array))
It is much faster, though it will probably use more memory.
The set solution ran in about 100 ms for 10**6 samples from a 2*10**6 range. I didn't even time the bit array because it took seconds.
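A timing like this can be reproduced roughly as follows. This is a minimal sketch using the standard timeit module; time_solution and its parameters are names invented here, and the result will vary with the machine and Python version.

```python
import random
import timeit


def solution(array):
    return len(set(array))


def time_solution(n_samples=10**6, value_range=2 * 10**6, repeats=3):
    """Return the best wall-clock time in seconds for one call to solution.

    n_samples, value_range and repeats are illustrative defaults taken
    from the sizes mentioned in the text.
    """
    half = value_range // 2
    # unique random values, so the expected count equals n_samples
    data = random.sample(range(-half, half), n_samples)
    timings = timeit.repeat(lambda: solution(data), number=1, repeat=repeats)
    return min(timings)


if __name__ == "__main__":
    print(f"best of 3: {time_solution() * 1000:.0f} ms")
```

Taking the minimum of a few runs reduces noise from other processes; building the sample is kept outside the timed call.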
When talking about lists on the order of 10**6, it is worth the trade-off. Using sys.getsizeof, I measured the intermediate set as using 4.2 times the memory of the list. The equivalent int bit array uses about 1/30 the memory of the list. This is on a 64-bit Linux system.
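The comparison can be sketched like this. It is an approximation of the measurement, not the exact script used: measure_sizes is a made-up name, and the int is built through a bytearray and int.from_bytes only to keep the sketch fast. Note that sys.getsizeof reports the container itself, not the integer objects it refers to.

```python
import random
import sys


def measure_sizes(n_samples=10**6, value_range=2 * 10**6):
    """Return sys.getsizeof for a list, the set built from it, and an
    int used as a bit array over the same values (sizes in bytes)."""
    half = value_range // 2
    data = random.sample(range(-half, half), n_samples)

    # Build the bit array in a bytearray first, then convert it to an int;
    # the resulting int has the same bits set as the shift-and-or version.
    bits = bytearray((value_range + 7) // 8)
    for element in data:
        index = element + half
        bits[index >> 3] |= 1 << (index & 7)
    bitvector = int.from_bytes(bits, "little")

    return {
        "list": sys.getsizeof(data),       # container only, not the elements
        "set": sys.getsizeof(set(data)),   # container only, not the elements
        "int": sys.getsizeof(bitvector),
    }


if __name__ == "__main__":
    sizes = measure_sizes()
    print(sizes)
    print("set / list:", sizes["set"] / sizes["list"])
    print("list / int:", sizes["list"] / sizes["int"])
```

The exact ratios depend on the Python build and on how the set's hash table happens to be sized for the given number of elements.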