Binary Search for a line in a text file using Javascript

Is there a way to do a disk-based binary search for a particular key in a text file in Javascript? The text file is too big to be loaded into memory, but it is sorted by the key values. In particular, I am looking for a way to mimic Perl's Search::Dict functionality in Javascript.

For example, if I have a file foo.txt:

a 1
b 10
c 5
z 4

look(c,foo.txt) should return the line 'c 5' by doing a binary search rather than traversing the file linearly.

Not really. Binary searches are only possible when you can identify where records begin. You appear to have variable-length records so, unless you create an array of line start offsets, it's not going to work.

As Nikhil rightly points out in a comment, one method would be to bisect the file by byte offset, based on the file size, and then find the closest line beginning from there. That would still be relatively efficient (i.e., much better than a sequential search).
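The line-start-offset array mentioned above could look roughly like the following in Node.js. This is only a sketch under some assumptions: Node's built-in `fs` module is available, lines end in `\n`, the key is everything before the first space, and the file is sorted in the same order that JavaScript's `<` uses for strings. The names `buildLineOffsets`, `readIndexedLine` and `lookWithIndex` are made up for illustration. Note that building the index still costs one sequential pass over the file, so it only pays off if the index is reused across many lookups.

```javascript
const fs = require('node:fs');

// One sequential pass: record the byte offset at which every line starts.
// The last entry is a sentinel holding the file size.
function buildLineOffsets(fd) {
  const offsets = [];
  const chunk = Buffer.alloc(64 * 1024);
  let pos = 0;
  let atLineStart = true;
  for (;;) {
    const n = fs.readSync(fd, chunk, 0, chunk.length, pos);
    if (n === 0) break; // end of file
    for (let i = 0; i < n; i++) {
      if (atLineStart) {
        offsets.push(pos + i);
        atLineStart = false;
      }
      if (chunk[i] === 0x0a) atLineStart = true; // '\n'
    }
    pos += n;
  }
  offsets.push(pos); // sentinel: end of file
  return offsets;
}

// Read line number i (0-based) using the offset table, without its newline.
function readIndexedLine(fd, offsets, i) {
  const len = offsets[i + 1] - offsets[i];
  const buf = Buffer.alloc(len);
  fs.readSync(fd, buf, 0, len, offsets[i]);
  return buf.toString('utf8').replace(/\r?\n$/, '');
}

// Ordinary binary search over line numbers; only the probed lines are read.
function lookWithIndex(target, filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const offsets = buildLineOffsets(fd);
    const lineCount = offsets.length - 1;
    let lo = 0;
    let hi = lineCount;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      const key = readIndexedLine(fd, offsets, mid).split(' ', 1)[0];
      if (key < target) lo = mid + 1;
      else hi = mid;
    }
    if (lo < lineCount) {
      const line = readIndexedLine(fd, offsets, lo);
      if (line.split(' ', 1)[0] === target) return line;
    }
    return null; // key not present
  } finally {
    fs.closeSync(fd);
  }
}
```

With the foo.txt from the question, `lookWithIndex('c', 'foo.txt')` returns the string `'c 5'`, and a key that is not in the file gives `null`.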

I don't know Javascript, but if you can do random seeks, you can do a binary search by seeking to the midpoint of your current block (in bytes) and then marching forward until you've consumed a newline, as long as you "know" that each key immediately follows a newline.

There will be cases where you need to march backward, though, so you might do your seeks with knowledge of the file buffering so that back-steps are not expensive.

I suppose this could be a bit hairier if you're not dealing with ASCII files.
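For what it's worth, the seek-to-midpoint idea can be sketched in Node.js without any index at all. This is a rough sketch, not a port of Search::Dict, and it rests on assumptions: Node's built-in `fs` module, `\n` line endings, a key that is everything before the first space, and a file sorted in the order JavaScript's `<` gives for strings. The names `readLineAt` and `look` are just illustrative. Instead of marching backward, it seeks to one byte before the midpoint and reads forward past the next newline, which lands on the first line start at or after the midpoint. For UTF-8 files the newline scan is still safe, because the byte 0x0A never occurs inside a multi-byte character, though the sort order of non-ASCII keys would need more care.

```javascript
const fs = require('node:fs');

// Read from byte offset `start` up to the next '\n' (or end of file).
// Returns { text, next }, where `next` is the offset just past the newline,
// or null if `start` is at or beyond the end of the file.
function readLineAt(fd, start, size) {
  if (start >= size) return null;
  const chunk = Buffer.alloc(4096);
  const parts = [];
  let pos = start;
  while (pos < size) {
    const n = fs.readSync(fd, chunk, 0, chunk.length, pos);
    if (n === 0) break;
    const nl = chunk.subarray(0, n).indexOf(0x0a);
    if (nl !== -1) {
      parts.push(Buffer.from(chunk.subarray(0, nl)));
      return { text: Buffer.concat(parts).toString('utf8'), next: pos + nl + 1 };
    }
    parts.push(Buffer.from(chunk.subarray(0, n)));
    pos += n;
  }
  // Last line of a file that has no trailing newline.
  return { text: Buffer.concat(parts).toString('utf8'), next: size };
}

// Bisect on byte offsets. `lo` is always the start of a line, and every line
// before `lo` has a key smaller than `target`.
function look(target, filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    let lo = 0;
    let hi = size;
    while (lo < hi) {
      const mid = Math.floor((lo + hi) / 2);
      // First line start at or after `mid`: skip the rest of the line
      // that contains byte mid - 1.
      const start = mid === 0 ? 0 : readLineAt(fd, mid - 1, size).next;
      const line = start < hi ? readLineAt(fd, start, size) : null;
      if (line !== null && line.text.split(' ', 1)[0] < target) {
        lo = line.next; // this line and everything before it is too small
      } else {
        hi = mid; // the answer starts at or before `start`
      }
    }
    const found = readLineAt(fd, lo, size);
    if (found !== null && found.text.split(' ', 1)[0] === target) {
      return found.text;
    }
    return null; // key not present
  } finally {
    fs.closeSync(fd);
  }
}
```

With the foo.txt from the question, `look('c', 'foo.txt')` returns `'c 5'`. Each probe reads at most a couple of 4 KB chunks, so the number of reads grows with the logarithm of the file size rather than with the number of lines.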
