
Ruby set to store large number of words and look up

I am analyzing strings to check whether they contain names of places. These strings can contain letters, numbers, and random characters, so we extract contiguous sequences of letters and then check whether each sequence exists in a dictionary of places.

The dictionary of places has about 45,000 names; the shortest are 2-3 characters and the longest is 24 characters.

My initial thought is to store them in a Ruby Set and use include? to check whether PLACES_SET contains each sequence.
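A minimal sketch of that approach, for illustration only. The three sample names, the `places_in` helper, and the choice to match case-insensitively against lowercase entries are assumptions and not part of the original question.

```ruby
require "set"

# Hypothetical sample data; the real dictionary has roughly 45,000 names.
PLACES_SET = Set.new(%w[paris london rome]).freeze

# Returns the place names found in +text+.
# Assumption: names are stored in lowercase and matching ignores case.
def places_in(text)
  text.scan(/[A-Za-z]+/)          # contiguous runs of letters
      .map(&:downcase)
      .select { |word| PLACES_SET.include?(word) }
end

places_in("order#42 shipped to PARIS via rome-7")
# => ["paris", "rome"]
```

Set lookups are hash-based, so each `include?` call takes roughly constant time regardless of how many names the set holds.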

This method that checks for place names is called from inside an Active Job that runs quite frequently.

The entire Ruby set file is about 908 KB.

  1. What are the memory implications of loading such a large set from a job? Are there options for deferred loading? Or will manual garbage collection help?

  2. Are there other alternatives I should consider, such as database storage? (This adds query performance overhead.)

  1. As @sergio has observed, the question is less about memory (1MB is not that large these days; most smartphones could handle it). It is more about how often you load it vs. how frequently you perform a lookup against it once it is loaded.
  2. If the list of places is volatile or needs to be maintained without redeploying your application, then some kind of DBMS might be suitable, and if you are worried about performance you can always put a distributed cache such as Redis in front of the DB.

The global set looks like a good option, and it will be easily understood by subsequent maintainers.
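One way to get a process-wide set that is also loaded lazily is to memoize it behind a module method. This is a sketch under stated assumptions: the `Places` module, its `path` setting, and the one-name-per-line file format are hypothetical and do not come from the question.

```ruby
require "set"

# Hypothetical module: builds the set on first use and keeps it
# for the life of the worker process.
module Places
  LOAD_LOCK = Mutex.new

  class << self
    # Assumption: a plain-text file with one place name per line.
    attr_accessor :path

    def set
      return @set if @set

      # The lock stops two threads from building the set at the same time.
      LOAD_LOCK.synchronize do
        @set ||= File.foreach(path, chomp: true).map(&:downcase).to_set.freeze
      end
    end

    def include?(word)
      set.include?(word.downcase)
    end
  end
end
```

With something like this, a job would call `Places.include?(sequence)`. If the job workers are long-running processes, the file is read once per process rather than once per job, which is the load-versus-lookup trade-off mentioned above.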

My advice on performance is to keep it simple, and only optimise for performance when you actually have a performance problem. Otherwise, you risk optimising the wrong thing and making your solution unnecessarily complex.
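If you do want numbers before deciding, a rough measurement with Ruby's standard `Benchmark` module is usually enough. The sketch below uses synthetic names as a stand-in for the real dictionary, so the timings it produces are only indicative.

```ruby
require "set"
require "benchmark"

# Synthetic stand-in for the real dictionary: 45,000 distinct strings.
words = (0...45_000).map { |i| "place#{i}" }

places = nil
build_seconds = Benchmark.realtime { places = words.to_set }

# Half of these lookups hit and half miss, since only 45,000 names exist.
lookup_seconds = Benchmark.realtime do
  90_000.times { |i| places.include?("place#{i}") }
end

puts format("build: %.4fs, 90,000 lookups: %.4fs", build_seconds, lookup_seconds)
```

Comparing the build time against the per-job lookup time shows whether loading the set is actually worth optimising.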

