Index (type) for searching (large) unsorted array

I have a program that loads a (sometimes) large CSV file into an array. The data cannot be sorted, and I do not know whether the data is text or numbers; that is up to the customers.

An example could be:

1;JOHN;DOE
2;JANE;DOE;
3;BOBBY;NOTABLES

but it could also be strings:

MB9384HJ;TEST1
B9284918;TEST2

The number of lines could be up to a few million.

I would like to search for a specific value in one column (which column is known ahead of time; this is my "key index column"). Assume the values in it are unique. The goal is to find which row contains the value.

Currently the code traverses rows 1..n and compares each one. This obviously gets slower the closer the match is to the end.

I am considering these options:

  • an in-memory SQLite database with the key index value and record number
  • a TStringDictionary with (key, record number) as the pairs
  • a hashed string list

My idea is: instead of traversing the array, I query the index for the key (the client provides the item to search for, so it must be random-access). Then I immediately get the row number in the array, and I can fetch the data.

Which of these (or another, if any) would be the better path to follow?

SQLite is probably overkill if you just want to search for the key. It would become interesting if you filled a SQLite table with the CSV and had to run complex queries, not only on the keys but also on the other columns.

A hashed string list is probably faster, but hash collisions are a concern.

A dictionary is probably the best solution in your specific case. It is also easy, since the Delphi RTL provides the required generic class.
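As a minimal sketch of the dictionary approach, assuming a Delphi version with generics and the RTL's TDictionary (the unit is `System.Generics.Collections` in XE2 and later, plain `Generics.Collections` before that): the `KeyColumn` array and its sample values are hypothetical stand-ins for the key column of the loaded CSV, not part of the original program.

```pascal
program KeyIndexSketch;

{$APPTYPE CONSOLE}

uses
  System.Generics.Collections;

var
  // Hypothetical stand-in for the key column of the loaded CSV rows.
  KeyColumn: TArray<string>;
  // Maps each key value to the row number where it was found.
  RowByKey: TDictionary<string, Integer>;
  Row: Integer;
begin
  KeyColumn := TArray<string>.Create('MB9384HJ', 'B9284918', 'X0000001');

  // Passing the expected size up front avoids repeated rehashing while filling.
  RowByKey := TDictionary<string, Integer>.Create(Length(KeyColumn));
  try
    // Build the index once, right after loading the file.
    for Row := 0 to High(KeyColumn) do
      // AddOrSetValue keeps the last row if a key unexpectedly repeats;
      // Add would raise an exception on a duplicate instead.
      RowByKey.AddOrSetValue(KeyColumn[Row], Row);

    // Look up a key without scanning the array.
    if RowByKey.TryGetValue('B9284918', Row) then
      Writeln('Found at row ', Row)
    else
      Writeln('Key not present');
  finally
    RowByKey.Free;
  end;
end.
```

Because the keys are stored as strings, it does not matter whether the customer's key column holds numbers or text. Note that string keys are compared case-sensitively by default, so 'abc' and 'ABC' would be treated as different keys.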

Although newer Delphi versions (2009+) have TDictionary built in, here is one possible solution for older Delphi versions.

This uses Delphi Fundamentals 5, which can be compiled even for D6.

uses
  flcDataStructs;
//...
var
  thedict : TIntegerDictionary;
  i : integer;

begin
  // maps each key string to its row number in dataarray
  thedict := TIntegerDictionary.Create;
  thedict.DuplicatesAction := ddIgnore;  // in case there are duplicates in my key column

  for i := 0 to length(dataarray)-1 do
    begin
      thedict.Add(dataarray[i], i);
    end;
end;

// to use:
//    rownumber := thedict['stringToSearch'];
