
Search haystack for several equal length needles (Python)

I am looking for a way to search a large string for a large number of equal length substrings.

My current method is basically this:

offset = 0
found = []

# Scan the haystack in aligned 8-character chunks.
while offset * 8 < len(haystack):
    current_chunk = haystack[offset * 8:offset * 8 + 8]
    if current_chunk in needles:
        found.append(current_chunk)
    offset += 1

This is painfully slow. Is there a better Python way of doing this?

More Pythonic, much faster:

found = []
for needle in needles:
    if needle in haystack:
        found.append(needle)
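
If the needles really are all the same length, the question's aligned chunk scan can also be made much faster on its own by keeping the needles in a set, since an average membership test against a set is O(1) rather than O(n) for a list. A minimal sketch of that variant, reusing the haystack and needles names from the question:

needle_set = set(needles)  # set membership is O(1) on average
# Same aligned 8-character scan as the question, as a comprehension.
found = [haystack[i:i + 8]
         for i in range(0, len(haystack), 8)
         if haystack[i:i + 8] in needle_set]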

Edit: with some limited testing, here are the results:

This algorithm: 0.000135183334351

Your algorithm: 0.984048128128

Much faster.
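
For what it's worth, timings like these can be reproduced with the standard timeit module. A sketch under assumed data sizes (an 80,000-character haystack and 1,000 random 8-character needles, both invented here for illustration):

import random
import string
import timeit

# Hypothetical test data, sized only for illustration.
haystack = ''.join(random.choice(string.ascii_lowercase) for _ in range(80000))
needles = [''.join(random.choice(string.ascii_lowercase) for _ in range(8))
           for _ in range(1000)]

def needle_loop():
    return [needle for needle in needles if needle in haystack]

print(timeit.timeit(needle_loop, number=10))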

I think that you can break it up across multiple cores and parallelize your search. Something along the lines of:

from functools import partial
from multiprocessing import Pool

text = "Your very long string"
needles = {"chunk001", "chunk002"}  # your set of 8-character substrings

def chunks(l, n):
    """Generator that chops a given sequence into pieces of length n."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def searchHaystack(haystack, needles):
    """Scan one text fragment in aligned 8-character chunks."""
    found = []
    for offset in range(0, len(haystack), 8):
        current_chunk = haystack[offset:offset + 8]
        if current_chunk in needles:
            found.append(current_chunk)
    return found  # return the matches, not the needles

if __name__ == '__main__':
    # Build a pool of 8 processes
    with Pool(processes=8) as pool:
        # Fragment the string data into 8 chunks
        partitioned_text = list(chunks(text, len(text) // 8))

        # Generate all the needles found; pool.map takes one iterable,
        # so bind the needles argument with functools.partial
        results = pool.map(partial(searchHaystack, needles=needles),
                           partitioned_text)

    # Flatten the per-fragment lists into a single result list
    all_the_needles = [needle for found in results for needle in found]
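
One caveat with this sketch: the aligned scan inside searchHaystack only lines up with the original string if each fragment's length is a multiple of 8, so len(text) // 8 should be rounded down to a multiple of 8 before chunking. A hypothetical helper (not part of the original answer) could be:

def aligned_fragment_length(total_len, workers, chunk_size=8):
    """Largest fragment length near total_len // workers that is a
    multiple of chunk_size, so every fragment stays chunk-aligned."""
    approx = total_len // workers
    return max(chunk_size, approx - approx % chunk_size)

partitioned_text = list(chunks(text, aligned_fragment_length(len(text), 8)))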
