
Python implementation of C++ algorithm using L1 CPU cache

Asked 2017-10-31 18:42:43. Tags: python, c++, python-2.7, sieve-of-eratosthenes

I want to write a Python implementation of the sieve of Eratosthenes as a segmented sieve that makes use of the CPU's L1 cache.

I have my own version on GitHub here: https://github.com/nick599/PythonMathsAlgorithms/blob/master/segmented_soe_v6.py , which does not take the CPU's L1 cache size into account.

I found the following site, http://primesieve.org/segmented_sieve.html , which gives a C++ implementation that uses the L1 cache size. By its description it should be much faster than my algorithm (mine takes several minutes to generate the primes up to 10^7, and hangs at 10^8 because of memory usage).

I am developing on Linux Mint v17, Python version 2.74. Update: my CPU is an Intel i7.

I am fairly new to Python.

I want to know:

How could I start implementing a Python version of this C++ algorithm?
What would I need to consider?
Are there things in the C++ implementation that can't be coded in Python 2.74?
What about multithreading?
What about hyperthreading?
What about Python's GIL?

I am looking for answers that address the spirit of all my questions above. Hints and tips are welcome.

1 Answer

I'm not sure you can make enough assumptions about how Python uses memory to ensure that it uses the L1 cache efficiently. Also, a sieve of 10^8 entries should need only about half a gigabyte even at a few bytes per entry, so your current implementation must be allocating its elements quite inefficiently as it stands.

If you are only going to store a single flag in each location, you may be better off allocating one large byte string and indexing into it as your sieve storage, rather than using an array of integers. (In Python, str objects are immutable, so the mutable bytearray type is the one to use for this.) The same approach works for the segments of a segmented sieve, and if you are lucky, they might be small enough to stay in the L1 cache.

C has good support for bit indexing and manipulation, and Python provides the same bitwise operators (&, |, ^, ~, <<, >>), so you can manipulate each bit independently. You can apply them to character values as well, either via ord() or directly on the integer elements of a bytearray.
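To make the byte-string suggestion concrete, here is a minimal sketch of a segmented sieve that keeps one byte flag per number in a bytearray. The function names simple_sieve and segmented_sieve and the segment_size parameter are made up for this illustration. The default of 32768 bytes is an assumption (32 KiB is a common L1 data cache size, so check your own CPU). As noted above, CPython gives no guarantee that a segment actually stays in L1. The code is written to run on both Python 2.7 and Python 3.

```python
def simple_sieve(limit):
    """Return all primes <= limit using one byte flag per number."""
    if limit < 2:
        return []
    flags = bytearray(b"\x01") * (limit + 1)
    flags[0] = flags[1] = 0
    p = 2
    while p * p <= limit:
        if flags[p]:
            # Clear every multiple of p from p*p upward in one slice assignment.
            count = (limit - p * p) // p + 1
            flags[p * p::p] = b"\x00" * count
        p += 1
    return [i for i in range(limit + 1) if flags[i]]


def segmented_sieve(limit, segment_size=32768):
    """Return all primes <= limit, sieving one fixed-size segment at a time.

    segment_size is the number of one-byte flags per segment; the default
    is an assumed L1 data cache size, not a measured one.
    """
    if limit < 2:
        return []
    # Integer square root, corrected for floating-point rounding.
    root = int(limit ** 0.5)
    while root * root > limit:
        root -= 1
    while (root + 1) * (root + 1) <= limit:
        root += 1
    base_primes = simple_sieve(root)

    primes = []
    low = 2
    while low <= limit:
        high = min(low + segment_size - 1, limit)
        size = high - low + 1
        segment = bytearray(b"\x01") * size  # flag i stands for low + i
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside the segment, never below p*p,
            # so the base primes themselves are not crossed off.
            start = max(p * p, ((low + p - 1) // p) * p)
            if start > high:
                continue
            count = (high - start) // p + 1
            segment[start - low::p] = b"\x00" * count
        primes.extend(low + i for i in range(size) if segment[i])
        low = high + 1
    return primes
```

Calling segmented_sieve(10**7) only ever holds one segment of flags plus the list of primes found so far, so the memory needed for flags no longer grows with the limit. Packing eight flags into each byte with the bitwise operators would fit eight times as many numbers into a segment of the same size, at the cost of more work per flag in pure Python.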