
Hashing n-grams by cyclic polynomials - Java implementation

I'm working on a problem that involves the Rabin–Karp string search algorithm. The algorithm needs a rolling hash to be faster than naive search. This article describes how to implement a rolling hash. I implemented the "Rabin-Karp rolling hash" without problems and found a few implementations, but the article also mentions computational complexity and says that hashing n-grams by cyclic polynomials is preferred. It links to a BuzHash implementation of this technique, but I wonder how it can be used to build an n-gram hash on top of it. I want to have something like this:

CPHash cp = new CPHash("efghijk");
cp.shiftRight('l'); // now we have the hash of "fghijkl"
cp.shiftLeft('d');  // applied to "efghijk" instead, this would give the hash of "defghij"

for Java.

For people who run into string search problems (like me), here are some articles that I found useful: 1, 2, 3

I recently published an Apache-licensed Java library that implements several rolling hash functions, including Cyclic and Rabin-Karp:

http://code.google.com/p/rollinghashjava/

https://github.com/lemire/rollinghashjava
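
For readers who want to see the idea without pulling in a library, here is a minimal sketch of cyclic polynomial (BuzHash-style) hashing shaped like the CPHash interface from the question. It is not the API of the library linked above. The class and method names come from the question; the random lookup table, its fixed seed, the 32-bit hash width, and the value() accessor are assumptions made for illustration. Each character maps to a random 32-bit value, and the n-gram hash is the XOR of those values, each rotated left by its distance from the end of the window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

/** Hypothetical CPHash: a sketch of cyclic polynomial hashing, not a library API. */
final class CPHash {
    // One random 32-bit value per char. The fixed seed (an arbitrary choice)
    // keeps hashes reproducible between runs.
    private static final int[] TABLE = new int[1 << 16];
    static {
        Random rnd = new Random(42L);
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = rnd.nextInt();
        }
    }

    private final Deque<Character> window = new ArrayDeque<>(); // current n-gram
    private final int n;                                        // n-gram length
    private int hash;

    CPHash(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("n-gram must not be empty");
        }
        n = s.length();
        for (char c : s.toCharArray()) {
            // After the loop, the i-th char (0-based) is rotated by n - 1 - i bits.
            hash = Integer.rotateLeft(hash, 1) ^ TABLE[c];
            window.addLast(c);
        }
    }

    /** Drops the first char and appends {@code in}, e.g. "efghijk" to "fghijkl". */
    int shiftRight(char in) {
        char out = window.removeFirst();
        window.addLast(in);
        // Rotating the whole hash moves 'out' to rotation n; XOR it away, then add 'in'.
        hash = Integer.rotateLeft(hash, 1) ^ Integer.rotateLeft(TABLE[out], n) ^ TABLE[in];
        return hash;
    }

    /** Drops the last char and prepends {@code in}, e.g. "efghijk" to "defghij". */
    int shiftLeft(char in) {
        char out = window.removeLast();
        window.addFirst(in);
        // Remove 'out' (rotation 0), undo one rotation, add 'in' at rotation n - 1.
        hash = Integer.rotateRight(hash ^ TABLE[out], 1) ^ Integer.rotateLeft(TABLE[in], n - 1);
        return hash;
    }

    int value() {
        return hash;
    }

    public static void main(String[] args) {
        CPHash right = new CPHash("efghijk");
        System.out.println(right.shiftRight('l') == new CPHash("fghijkl").value());
        CPHash left = new CPHash("efghijk");
        System.out.println(left.shiftLeft('d') == new CPHash("defghij").value());
    }
}
```

Both shifts take constant time because they only rotate and XOR the current value, which is the property the article refers to when comparing it with the multiplication-based Rabin-Karp hash. The deque is kept only so the class knows which character leaves the window; a caller that already holds the text could pass the outgoing character explicitly instead.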
