Value lookup using key or vice versa

First of all, apologies for the nasty title. I will correct it later.

I have some data like the following:

"BOULEVARD","BOUL","BOULV", "BLVD"

I need a data structure that gives O(1) lookup of all of these words from any one of them. For example, if I use a dictionary, I would need to store the keys and values like this, which looks odd to me:

abbr.Add("BLVD", new List<string> { "BOULEVARD","BOUL","BOULV", "BLVD" });
abbr.Add("BOUL", new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" });
abbr.Add("BOULV", new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" });
abbr.Add("BOULEVARD", new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" });

Which data structure should I use to store this data so that it suits this kind of query?

Thanks in advance

Create two hash maps: one maps each word to a group number, and the other maps each group number to a list of words. This way you save some memory, because each list of words is stored only once.

Map<String, Integer> - Word to Group Number
Map<Integer, List<String>> - Group Number to a list of words

You need two O(1) lookups: the first gets the group number for the word, and the second uses that number to get the list of words.
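
Since the question is about C#, here is a minimal sketch of the two-map idea using Dictionary instead of Java's Map. The names wordToGroup, groupToWords, AddGroup and Lookup are made up for illustration, and the group number is simply the count of groups added so far.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names, not from the answer above.
var wordToGroup = new Dictionary<string, int>();
var groupToWords = new Dictionary<int, List<string>>();

// Register one group of equivalent words under a fresh group number.
void AddGroup(params string[] words)
{
    int groupId = groupToWords.Count;
    groupToWords.Add(groupId, new List<string>(words));
    foreach (string word in words)
        wordToGroup.Add(word, groupId); // throws if a word is already registered
}

// Two O(1) lookups: word -> group number, then group number -> words.
// Unknown words yield an empty list in this sketch.
List<string> Lookup(string word)
{
    return wordToGroup.TryGetValue(word, out int groupId)
        ? groupToWords[groupId]
        : new List<string>();
}

AddGroup("BOULEVARD", "BOUL", "BOULV", "BLVD");
AddGroup("STREET", "ST", "STR");

Console.WriteLine(string.Join(", ", Lookup("BOUL")));
// BOULEVARD, BOUL, BOULV, BLVD
```

Compared with a single dictionary whose values share one list per group, this costs an extra hash lookup, but the group number gives you a stable identifier you can store elsewhere.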

Assuming abbr is a Dictionary<String, IEnumerable<String>>, you could use the following function:

public static void IndexAbbreviations(IEnumerable<String> abbreviations) {
    foreach (var a in abbreviations)
        abbr.Add(a, abbreviations);
}

This will populate the dictionary with the provided list of abbreviations such that when any of them is looked up in the dictionary, the whole list is returned. It is slightly better than the example code you provided, because it does not create a new object for each value.

From the documentation: "Retrieving a value by using its key is very fast, close to O(1), because the Dictionary(Of TKey, TValue) class is implemented as a hash table."

The choice of dictionary looks fine to me. As mentioned above, every key in a group should reference the same list instance. The code could go something like this:

var allAbrList = new List<List<string>>
                 {
                    new List<string> {"BOULEVARD", "BOUL", "BOULV", "BLVD"},
                    new List<string> {"STREET", "ST", "STR"},
                    // ...
                 };

var allAbrLookup = new Dictionary<string, List<string>>();
foreach (List<string> list in allAbrList)
{
    foreach (string abbr in list)
    {
        allAbrLookup.Add(abbr, list);
    }
}

The last part could be converted into LINQ to have less code, but this way it is easier to understand.
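
For reference, one possible LINQ version might look like the sketch below. It is only one way to write it: SelectMany pairs each word with the list it came from, and ToDictionary keys each pair by the word. Like Add, ToDictionary throws if the same word appears twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var allAbrList = new List<List<string>>
{
    new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" },
    new List<string> { "STREET", "ST", "STR" },
};

// Flatten to (word, list) pairs, then key each pair by its word.
// Every word in a group maps to the same List<string> instance.
Dictionary<string, List<string>> allAbrLookup = allAbrList
    .SelectMany(list => list, (list, word) => new { word, list })
    .ToDictionary(pair => pair.word, pair => pair.list);

Console.WriteLine(string.Join(", ", allAbrLookup["STR"]));
// STREET, ST, STR
```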

If you don't create a new list for each key, then a Dictionary<string, List<string>> will be fast and reasonably memory-efficient as long as the amount of data isn't enormous. You might also be able to get a little extra benefit from reusing the strings themselves, though the optimizer might take care of that for you anyway.

var abbr = new Dictionary<string, List<string>>();

var values = new List<string> { "BOULEVARD","BOUL","BOULV", "BLVD" };

// every key references the same list instance
foreach(var aValue in values) abbr.Add(aValue, values);
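
To make the sharing concrete, here is a small sketch showing that all keys point at one list object. The extra abbreviation "BLV" is made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;

var values = new List<string> { "BOULEVARD", "BOUL", "BOULV", "BLVD" };
var abbr = new Dictionary<string, List<string>>();
foreach (var aValue in values) abbr.Add(aValue, values);

// All four keys point at one List<string> object, not four copies.
bool sameInstance = ReferenceEquals(abbr["BLVD"], abbr["BOULEVARD"]);
Console.WriteLine(sameInstance); // True

// Consequence: a change made through one key is visible through every key.
abbr["BLVD"].Add("BLV"); // "BLV" is a made-up extra abbreviation
Console.WriteLine(abbr["BOUL"].Count); // 5
```

Note that adding to the shared list does not create a new key; "BLV" would still need its own abbr.Add call to become searchable. If callers should not be able to change the groups, exposing the values as IReadOnlyList<string> may be worth considering.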

I don't see a reason to define the value part of your dictionary as a List<string> object, but perhaps that is your requirement. This answer assumes that you just want to know whether the word essentially means "Boulevard".

I would pick one value as the "official" value and map all of the other values to it, like this:

        var abbr = new Dictionary<string, string>(StringComparer.CurrentCultureIgnoreCase);

        abbr.Add("BLVD", "BLVD"); // this line may be optional
        abbr.Add("BOUL", "BLVD");
        abbr.Add("BOULV", "BLVD");
        abbr.Add("BOULEVARD", "BLVD");
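
A lookup against such a dictionary could be sketched as follows. Normalize is a hypothetical helper name, and returning the input unchanged for unknown words is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;

var abbr = new Dictionary<string, string>(StringComparer.CurrentCultureIgnoreCase)
{
    { "BLVD", "BLVD" },
    { "BOUL", "BLVD" },
    { "BOULV", "BLVD" },
    { "BOULEVARD", "BLVD" },
};

// Hypothetical helper: returns the official form, or the input unchanged if unknown.
string Normalize(string word)
{
    return abbr.TryGetValue(word, out var official) ? official : word;
}

Console.WriteLine(Normalize("boulevard")); // BLVD
Console.WriteLine(Normalize("Boul") == Normalize("BLVD")); // True
Console.WriteLine(Normalize("LANE")); // LANE
```

Two words mean the same thing exactly when their normalized forms are equal, so no list of synonyms is needed for that check.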

Alternatively, you could define an enum for the value part of the dictionary, as shown below:

    enum AddressLine1Suffix
    {
        Road,
        Street,
        Avenue,
        Boulevard,
    }


        var abbr = new Dictionary<string, AddressLine1Suffix>(StringComparer.CurrentCultureIgnoreCase);

        abbr.Add("BLVD", AddressLine1Suffix.Boulevard);
        abbr.Add("BOUL", AddressLine1Suffix.Boulevard);
        abbr.Add("BOULV", AddressLine1Suffix.Boulevard);
        abbr.Add("BOULEVARD", AddressLine1Suffix.Boulevard);

As Petar Minchev already said, you can split your data into a list of groups and a list of keys that point to those groups. To simplify usage, you can write your own implementation of IDictionary and use the Add method to build those groups. I gave it a try and it seems to work. Here are the important parts of the implementation:

public class GroupedDictionary<T> : IDictionary<T,IList<T>>
{
    private Dictionary<T, int> _keys;
    private Dictionary<int, IList<T>> _valueGroups;

    public GroupedDictionary()
    {
        _keys = new Dictionary<T, int>();
        _valueGroups = new Dictionary<int, IList<T>>();
    }

    public void Add(KeyValuePair<T, IList<T>> item)
    {
        Add(item.Key, item.Value);
    }

    public void Add(T key, IList<T> value)
    {
        // look if some of the values already exist
        int existingGroupKey = -1;
        foreach (T v in value)
        {
            if (_keys.ContainsKey(v))
            {
                existingGroupKey = _keys[v];
                break;
            }
        }
        if (existingGroupKey == -1)
        {
            // new group
            int newGroupKey = _valueGroups.Count;
            _valueGroups.Add(newGroupKey, new List<T>(value));
            _valueGroups[newGroupKey].Add(key);
            foreach (T v in value)
            {
                _keys.Add(v, newGroupKey);
            }
            _keys.Add(key, newGroupKey);
        }
        else
        {
            // existing group
            _valueGroups[existingGroupKey].Add(key);
            // add items that are new
            foreach (T v in value)
            {
                if(!_valueGroups[existingGroupKey].Contains(v))
                {
                    _valueGroups[existingGroupKey].Add(v);
                }
            }
            // add new keys
            _keys.Add(key, existingGroupKey);
            foreach (T v in value)
            {
                if (!_keys.ContainsKey(v))
                {
                    _keys.Add(v, existingGroupKey);
                }
            }
        }
    }

    public IList<T> this[T key]
    {
        get { return _valueGroups[_keys[key]]; }
        set { throw new NotImplementedException(); }
    }
}

The usage could look like this:

var groupedDictionary = new GroupedDictionary<string>();
groupedDictionary.Add("BLVD", new List<string> {"BOUL", "BOULV"}); // after that three keys exist and one list of three items
groupedDictionary.Add("BOULEVARD", new List<string> {"BLVD"}); // now there is a fourth key and the key is added to the existing list instance
var items = groupedDictionary["BOULV"]; // will give you the list with four items

Sure, it is a lot of work to implement the whole interface, but once it is finished it gives you an encapsulated class that you don't have to worry about.
