
How to efficiently extract all values in a large master table that start with a specified string

I'm designing the UI for a Lua program, one element of which requires the user to either select an existing value from a master table or create a new value in that table.

I would normally use an IUP list with EDITBOX = "YES".

However, the number of items that the user can select may run into many hundreds or possibly thousands, and populating the list in IUP (and selecting from it) is unacceptably slow. I cannot control the number of items in the table.

My current thinking is to create a list with an editbox, but without any values. As the user types into the editbox (after perhaps 2-3 characters) the list would populate with the subset of table items that start with the characters typed. The user could then select an item from the list or keep typing to narrow the options or create a new item.

For this to work, I need to be able to create a new table with the items from the master table that start with the entered characters.

One option would be to iterate through the master table using the Penlight 'startswith' function to create the new table:

local stringx = require "pl.stringx" -- Penlight string extensions
local subtable = {} -- empty result table
local startstring = "xyz" -- will actually be set by the IUP control
for _, v in ipairs(mastertable) do -- mastertable: the existing array of strings
    if stringx.startswith(v, startstring) then
        table.insert(subtable, v)
    end
end

However, I'm worried about the performance of doing that if the master table is huge. Is there a more efficient way to code this, or a different way I could implement the UI?

There are various approaches you can take to improve the big-O performance of your prefix search, at the cost of increased code complexity. That said, given the size of your dataset (thousands of items) and the intended use (triggered by user interaction, rather than e.g. game logic that needs to run every frame), I think a simple linear search over the options is almost certainly going to be fast enough.

To test this theory, I timed the following code:

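-- Load the system word list into an array.
local dict = {}
for word in io.lines('/usr/share/dict/words') do
  table.insert(dict, word)
end
local matched = {}
-- (...) is the first command-line argument; "^" anchors the pattern at the
-- start of the word. The argument is treated as a Lua pattern, not plain text.
local search = "^" .. (...)
for _,word in ipairs(dict) do
  if word:match(search) then
    table.insert(matched, word)
  end
end
print('Found '..#matched..' words.')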

I ran it under /usr/bin/time -v with both Lua 5.2 and LuaJIT.

Note that this is fairly pessimistic compared to your code:

  • no attempt made to localize library functions that are repeatedly called, or to append with t[#t + 1] = v instead of table.insert
  • timing includes not just the search but also the cost of loading the dictionary into memory in the first place
  • string.match is almost certainly slower than stringx.startswith
  • dictionary contains ~100k entries rather than the "hundreds to thousands" you expect in your application
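For comparison, a less pessimistic version of the same linear scan might look like the sketch below. It is only an illustration: filter_by_prefix is a hypothetical helper name, the sample data is made up, and it compares with string.sub rather than Penlight's startswith so that it has no dependencies.

```lua
-- Hypothetical helper: linear prefix filter without patterns or Penlight.
local sub = string.sub  -- localize the library function used in the loop

local function filter_by_prefix(items, prefix)
  local matched, count = {}, 0
  local n = #prefix
  for i = 1, #items do
    local item = items[i]
    -- Plain comparison: characters such as "." or "%" in the prefix
    -- are matched literally, unlike with string.match.
    if sub(item, 1, n) == prefix then
      count = count + 1
      matched[count] = item  -- append by index instead of table.insert
    end
  end
  return matched
end

-- Made-up sample data for illustration.
local sample = { "xyz1", "abc", "xyzzy", "x.z", "xy" }
local result = filter_by_prefix(sample, "xyz")
print(#result)  -- 2
```

Because the comparison is plain text, a prefix typed by the user does not need escaping before it is used.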

Even with all those caveats, each run takes 50-100 ms in Lua 5.2 and 30-50 ms in LuaJIT, measured over 50 runs.

If I use os.clock() to time only the search, it consistently takes about 10 ms in Lua 5.2 and 3-4 ms in LuaJIT.
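A minimal sketch of timing just the search with os.clock() is shown here. It is not the exact harness used for the numbers above: the data set is synthetic (100,000 generated strings instead of the system word list) and the prefix is arbitrary. Note that os.clock() reports CPU time used by the program, not wall-clock time.

```lua
-- Build a synthetic data set; size and contents are made up for illustration.
local items = {}
for i = 1, 100000 do
  items[i] = string.format("item%06d", i)
end

local prefix = "item0999"  -- arbitrary prefix for the test

-- Time only the search, not the set-up above.
local start = os.clock()
local matched = {}
for _, item in ipairs(items) do
  if item:sub(1, #prefix) == prefix then
    matched[#matched + 1] = item
  end
end
local elapsed = os.clock() - start

print(string.format("Found %d items in %.1f ms", #matched, elapsed * 1000))
```

The measured time will vary with the machine and the interpreter, so it is worth running this against a realistic copy of your own master table.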

Now, this is on a fairly fast laptop (Core i7), but also non-optimized code running on a dataset 10-100x larger than you expect to process; given that, I suspect that the naïve approach of just looping over the entries calling startswith will be plenty fast for your purposes, and result in code that's significantly simpler and easier to debug.
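If the linear scan ever did become a bottleneck, one of the more complex approaches mentioned at the start would be to keep the master table sorted and binary-search for the first candidate. The following is a rough sketch of that idea, not something I timed: lower_bound and sorted_prefix_matches are hypothetical helper names, the data is made up, and it assumes the default C locale so that strings sharing a prefix sit next to each other after table.sort.

```lua
-- Sketch: prefix lookup on a sorted array via binary search.
-- Assumes plain byte-order string comparison (the default C locale).

-- Return the first index i such that list[i] >= key, or #list + 1 if none.
local function lower_bound(list, key)
  local lo, hi = 1, #list + 1
  while lo < hi do
    local mid = math.floor((lo + hi) / 2)
    if list[mid] < key then
      lo = mid + 1
    else
      hi = mid
    end
  end
  return lo
end

-- Collect every entry of the sorted array that starts with prefix.
local function sorted_prefix_matches(sorted, prefix)
  local result = {}
  local n = #prefix
  local i = lower_bound(sorted, prefix)
  -- Matching entries are contiguous, so stop at the first non-match.
  while sorted[i] and sorted[i]:sub(1, n) == prefix do
    result[#result + 1] = sorted[i]
    i = i + 1
  end
  return result
end

-- Made-up sample data; the table must be sorted once up front.
local master = { "pear", "apple", "apricot", "banana", "apply" }
table.sort(master)
local hits = sorted_prefix_matches(master, "app")
print(table.concat(hits, ", "))  -- apple, apply
```

The trade-off is that new items have to be inserted in sorted position (or the table re-sorted), which is exactly the kind of extra complexity the simple loop avoids.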

