
Efficient Search Algorithm For Specific Data Fields

I have been assigned to write algorithms for filtering and searching.

Task: Filter: search and list objects that fulfill specified attribute(s)

Say the whole system is a student registration record system.

I have data as shown below. I need to filter and search by these attributes, for example by gender, student name, date of birth, and so on.

Student Name, Gender, Date Of Birth, Mobile No

Is there a specific efficient algorithm or method for each of these fields?

For example, strings and integers each have their own kind of efficient search algorithm, right?

Here's what I am going to do. I am going to code a binary search algorithm for searching/filtering based on these fields above.

That's it. But yeah that's easy to be honest.

But I am just curious: what is the proper and appropriate coding approach for an efficient search/filter algorithm for each of these fields? What would you guys do?

Obviously I will not be using a sequential search, as this will involve a huge amount of data, and iterating over every record would hurt performance.

A sequential search will only be used where the data set is small.

Searching is a very broad topic, and the right approach depends entirely on your use case. When building an efficient search algorithm, take the following factors into consideration:

  • What is the size of your data? Is it fixed, or does it change periodically?
  • How often are you going to insert, modify, or delete data?
  • Is your data sorted or unsorted?
  • Do you need prefix-based search, such as autosearch, autocomplete, or longest-prefix search?

    Now let's think about the solution/approach

    1. If your data set is small and unsorted, you can try linear search, which has O(n) time complexity, where "n" is the size of your data/array.

    2. If your data is already sorted (which is not always the case), you can use binary search, whose complexity is O(log n). If your data is not sorted, sorting it first costs O(n log n). In Java, for example, Arrays.sort() uses a quicksort variant for primitive arrays and a merge sort variant for object arrays, both of which are typically O(n log n).

    3. If fast retrieval is the main objective, consider a HashMap or a hash table. The elements of a HashMap are indexed by hash code, so the time to look up an element is close to constant, O(1) on average (if your hash function is well implemented).

    4. Prefix-based search: since you mentioned searching by name, you also have the option of using the "trie" data structure.
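Options 1 to 3 above could look roughly like the following. This is only a minimal sketch in Python, not a definitive implementation: the sample records, the tuple layout, and the names `filter_linear`, `find_by_dob_range`, `by_gender` and `by_mobile` are all made up for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (name, gender, date_of_birth, mobile_no)
students = [
    ("Alice Tan", "F", date(2001, 5, 17), "0123456789"),
    ("Bob Lee", "M", date(2000, 1, 3), "0198765432"),
    ("Carol Ng", "F", date(2001, 5, 17), "0171112222"),
    ("David Lim", "M", date(1999, 11, 30), "0165558888"),
]

# Option 1: linear search, O(n), works on unsorted data with no preparation.
def filter_linear(records, predicate):
    return [r for r in records if predicate(r)]

# Option 2: sort once by date of birth (O(n log n)),
# then every lookup is a binary search (O(log n)).
by_dob = sorted(students, key=lambda r: r[2])
dob_keys = [r[2] for r in by_dob]

def find_by_dob_range(start, end):
    """Return all records with start <= date_of_birth <= end."""
    lo = bisect_left(dob_keys, start)
    hi = bisect_right(dob_keys, end)
    return by_dob[lo:hi]

# Option 3: hash indexes, average O(1) lookup per key.
by_gender = defaultdict(list)          # one key maps to many records
for r in students:
    by_gender[r[1]].append(r)
by_mobile = {r[3]: r for r in students}  # assumes mobile numbers are unique
```

The sorted list and the hash indexes have to be kept up to date when records are inserted or deleted, so they pay off mainly when reads are much more frequent than writes. A low-cardinality field such as gender gains little from binary search, since each value matches a large share of the records anyway.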

Tries are an excellent option if you perform insert/delete/update operations frequently. Looking up an element in a trie is O(k), where "k" is the length of the string being searched for.

Since you have registration data where inserts, updates, and deletions are common, the trie data structure is a good option to consider.
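To make the idea concrete, here is a minimal trie sketch in Python for prefix search on student names. The class names `TrieNode` and `Trie`, the method `starts_with`, and the sample names are assumptions for illustration only; a real system would store record IDs or objects rather than plain strings.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps one character to the next TrieNode
        self.records = []    # records whose key ends exactly at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, record):
        """Insert a record under key in O(k), k = len(key)."""
        node = self.root
        for ch in key.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.records.append(record)

    def starts_with(self, prefix):
        """Return every record whose key begins with prefix (case-insensitive)."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every record stored at or below the prefix node.
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.records)
            stack.extend(current.children.values())
        return found


# Hypothetical usage with made-up names
trie = Trie()
for name in ["Alice Tan", "Alicia Wong", "Bob Lee"]:
    trie.insert(name, name)

matches = sorted(trie.starts_with("ali"))
```

Walking down to the prefix node costs O(k); collecting the matches then costs time proportional to the size of the subtree below it, so a very short prefix can still return a large part of the data.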

Also, check this link for help choosing between tries and hash tables: TriesVsMaps

Below is a sample representation of a trie (image source: HackerEarth).
