What data structure should I use for a map with variable set as keys?

My dataset looks like this:

Task-1, Priority1, (SkillA, SkillB)
Task-2, Priority2, (SkillA)
Task-3, Priority3, (SkillB, SkillC)

The calling application (client) will send in a list of skills, say (SkillD, SkillA).

Lookup:

  1. Search through the dataset for SkillD first, and find nothing.
  2. Search for SkillA. We will find two entries - Task-1 with Priority1, Task-2 with Priority2.
  3. Identify the task with highest priority (in this case, Task-1)
  4. Remove Task-1 from that dataset & return Task-1 to client

Design considerations:

  • There will be a lot of adds/updates/deletes to the dataset when the website goes live.
  • There are only a few skills (about 10), though the list is not static, and each skill can have thousands of tasks. So lookup/retrieval has to be extremely fast.

I have considered a simple List with binarySearch(comparator), or a Map<Skill, SortedSet<Task>>, but I am looking for more ideas.

What is the best way to design a data structure for this kind of dataset, one that allows a complex key and a sorted collection of tasks associated with that key?

I would consider MongoDB. The data object for one of your rows sounds like a better fit for a JSON document than for a row in a table, because the skill set list may grow. In a classic relational DB you solve this in one of three ways: add ever-expanding columns so that there are enough skill columns for the largest skill set (this is very ugly), keep a separate table that maps groupings of skills to an ID, or store the skills as a comma-delimited list. Each of these sucks. In MongoDB you can have array fields, and the items in the array are indexable.

So with this in mind I would do all the querying in MongoDB and let it deal with it all. I would create a POJO that would look like this:

import java.util.List;

public class TaskPriority {

    String taskId;
    String priorityId;
    List<String> skillIds;

}

In MongoDB you can index all these fields to get fast searching and querying.

If you have to cache these items locally and run the queries against Java data structures, you can create an index for each field you care about that references instances of the TaskPriority object.

For example, to map skill sets to their TaskPriority objects, the following Map can be used:

Map<String, TaskPriority> skillSetToTaskPriority;

You can repeat this for taskId and priorityId. You would have to manage these indexes yourself, which is usually the job of your DB.
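A hand-managed index of this kind could be sketched as below. This is only an illustration under stated assumptions, not the answer's own code: the names IndexedTask and SkillIndex are hypothetical, the priority is assumed to be an int where 1 is the most urgent (the POJO above stores a String priorityId), and because one skill can cover many tasks, each skill is assumed to map to a sorted set of tasks rather than to a single object.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory variant of TaskPriority. The int priority
// (1 = most urgent) is an assumption, not part of the original POJO.
final class IndexedTask {
    final String taskId;
    final int priority;
    final List<String> skillIds;

    IndexedTask(String taskId, int priority, String... skillIds) {
        this.taskId = taskId;
        this.priority = priority;
        this.skillIds = Arrays.asList(skillIds);
    }
}

// Hypothetical index: one priority-ordered set of tasks per skill.
final class SkillIndex {
    // Order by priority, then by taskId, so that two tasks with the
    // same priority are not treated as duplicates by the TreeSet.
    private static final Comparator<IndexedTask> BY_PRIORITY =
            Comparator.comparingInt((IndexedTask t) -> t.priority)
                      .thenComparing(t -> t.taskId);

    private final Map<String, TreeSet<IndexedTask>> bySkill = new HashMap<>();

    // Files the task under every skill it lists.
    void add(IndexedTask task) {
        for (String skill : task.skillIds) {
            bySkill.computeIfAbsent(skill, k -> new TreeSet<>(BY_PRIORITY)).add(task);
        }
    }

    // Removes the task from the set of every skill it is filed under.
    void remove(IndexedTask task) {
        for (String skill : task.skillIds) {
            TreeSet<IndexedTask> tasks = bySkill.get(skill);
            if (tasks != null) {
                tasks.remove(task);
                if (tasks.isEmpty()) {
                    bySkill.remove(skill);
                }
            }
        }
    }

    // Tries the requested skills in order. Returns and removes the most
    // urgent task of the first skill that has any, or null if none match.
    IndexedTask poll(List<String> requestedSkills) {
        for (String skill : requestedSkills) {
            TreeSet<IndexedTask> tasks = bySkill.get(skill);
            if (tasks != null && !tasks.isEmpty()) {
                IndexedTask best = tasks.first();
                remove(best);
                return best;
            }
        }
        return null;
    }
}
```

With the three tasks from the question loaded, polling for (SkillD, SkillA) skips SkillD and hands back Task-1, which then also disappears from the SkillB set. The sketch is not thread-safe; with many concurrent adds and deletes it would need external locking or concurrent collections.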

Finally, you can have POJOs and tables (or MongoDB collections) that map the taskId to a Task object containing any metadata about that task that you may wish to have. The same is true for Priority and SkillSet. So that's 4 MongoDB collections: Tasks, Priorities, SkillSets, and TaskPriorities.

How about changing the approach a bit? You can use Guava, and a Multimap in particular.

Every experienced Java programmer has, at one point or another, implemented a Map<K, List<V>> or Map<K, Set<V>>, and dealt with the awkwardness of that structure. For example, Map<K, Set<V>> is a typical way to represent an unlabeled directed graph. Guava's Multimap framework makes it easy to handle a mapping from keys to multiple values. A Multimap is a general way to associate keys with arbitrarily many values.

There are two ways to think of a Multimap conceptually: as a collection of mappings from single keys to single values, or as a mapping from unique keys to collections of values.
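To make the two readings concrete without pulling in Guava, here is a minimal plain-JDK sketch of the Map<K, Set<V>> pattern described above. The class name PlainMultimap and its methods are hypothetical and are not Guava's API; the point is only to show the bookkeeping that a Multimap takes off your hands.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical hand-rolled multimap built on Map<K, Set<V>>.
final class PlainMultimap {
    // Sorted map and sorted sets keep the output order deterministic.
    private final Map<String, Set<Integer>> byKey = new TreeMap<>();

    // The set for a key has to be created on first use.
    void put(String key, int value) {
        byKey.computeIfAbsent(key, k -> new TreeSet<>()).add(value);
    }

    // Reading 2: a unique key maps to a collection of values.
    // A missing key yields an empty set instead of null.
    Set<Integer> get(String key) {
        return byKey.getOrDefault(key, Collections.emptySet());
    }

    // Reading 1: a flat collection of single key -> single value pairs.
    List<String> entries() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> entry : byKey.entrySet()) {
            for (Integer value : entry.getValue()) {
                result.add(entry.getKey() + " -> " + value);
            }
        }
        return result;
    }
}
```

Putting (a, 1), (a, 2) and (b, 3) gives the set {1, 2} for key a in the second reading, and the three pairs a -> 1, a -> 2, b -> 3 in the first.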

I would suggest using a Multimap; the answer to your problem lies in a powerful Multimap feature called views.

Good luck!
