
How can I speed up this LINQ query on a List<> with search criteria on three attributes of the list object?

I have the following LINQ query.

There is a list of about 55 000 items. I need to do a search on three of the attributes within the items.

Here is my code:

private List<Device> Devices = _db.Devices.ToList();

public Device TryFindDeviceInNetworks(ALL_Sims sim)
{
    var ips = new List<string>();
    if (sim.IP1 != null)
    {
        ips.Add(sim.IP1);
    }
    if (sim.IP2 != null)
    {
        ips.Add(sim.IP2);
    }

    var device =
        Devices.FirstOrDefault(
            x => ips.Contains(x.IPaddress1)
                 || ips.Contains(x.IPaddress2)
                 || ips.Contains(x.IPaddress3));

    return device;
}

Currently this operation is slow.

I have a for loop over about 100k items, and each iteration calls TryFindDeviceInNetworks(). The whole run takes 8 to 10 hours or more on an Intel i5. It is single-threaded.

My question is: how can I speed this up? I have converted a few of my lists to dictionaries where I could in the application, and this has helped drastically. In this case, however, I can't search on only one key.

Is there a data structure better suited to this than List<T>?

The database is not hosted locally or on the LAN, so querying it on every iteration would add a round trip of at least ~40 ms plus query time.

Instead of having one dictionary with all 3 IP Addresses you could have 3 dictionaries:

private List<Device> Devices = new List<Device>();

private Dictionary<string, Device> mapIP1;
private Dictionary<string, Device> mapIP2;
private Dictionary<string, Device> mapIP3;

You'd have to initialize them before doing the search:

public void InitializeDictionaries()
{
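    // Skip devices with no address in a field: ToDictionary throws on a null key.
    mapIP1 = Devices.Where(x => x.IPaddress1 != null).ToDictionary(x => x.IPaddress1);
    mapIP2 = Devices.Where(x => x.IPaddress2 != null).ToDictionary(x => x.IPaddress2);
    mapIP3 = Devices.Where(x => x.IPaddress3 != null).ToDictionary(x => x.IPaddress3);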
}

The search itself can use TryGetValue :

public Device TryFindDeviceInNetworks(ALL_Sims sim)
{
    Device device = null;

    if (sim.IP1 != null)
    {
        if (mapIP1.TryGetValue(sim.IP1, out device))
            return device;
        if (mapIP2.TryGetValue(sim.IP1, out device))
            return device;
        if (mapIP3.TryGetValue(sim.IP1, out device))
            return device;
    }

    if (sim.IP2 != null)
    {
        if (mapIP1.TryGetValue(sim.IP2, out device))
            return device;
        if (mapIP2.TryGetValue(sim.IP2, out device))
            return device;
        if (mapIP3.TryGetValue(sim.IP2, out device))
            return device;
    }

    return device;
}

You'd have to be sure that no two elements in the Devices list share the same address in the same field, though: Dictionary<TKey, TValue> can't hold duplicate keys, and ToDictionary throws an ArgumentException when it meets one.
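
If duplicates cannot be ruled out, one option is to build each map by hand. The following is a minimal sketch, not part of the original answer: the Device class is a stand-in with only the fields needed here, and IpIndex.Build is a hypothetical helper. It skips null addresses and keeps the first device seen for each address, which mirrors what FirstOrDefault would return for a single field.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the real entity; only the address fields matter here.
public class Device
{
    public string Name { get; set; }
    public string IPaddress1 { get; set; }
    public string IPaddress2 { get; set; }
    public string IPaddress3 { get; set; }
}

public static class IpIndex
{
    // Builds an address -> device map for one field, skipping null keys
    // and keeping the first device when an address occurs more than once.
    public static Dictionary<string, Device> Build(
        IEnumerable<Device> devices, Func<Device, string> keySelector)
    {
        var map = new Dictionary<string, Device>();
        foreach (var device in devices)
        {
            var key = keySelector(device);
            if (key != null && !map.ContainsKey(key))
            {
                map.Add(key, device);
            }
        }
        return map;
    }
}
```

With this helper, the initialization would read mapIP1 = IpIndex.Build(Devices, x => x.IPaddress1); and likewise for the other two fields.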

var device =
    Devices.FirstOrDefault(
        x => ips.Contains(x.IPaddress1)
             || ips.Contains(x.IPaddress2)
             || ips.Contains(x.IPaddress3));

This enumerates ips up to three times per device in the worst case (no matches). I would rewrite it as:

var device =
    Devices.FirstOrDefault(
        x => ips.Any(y => y == x.IPaddress1
                          || y == x.IPaddress2
                          || y == x.IPaddress3));

So it only enumerates it once, checking each possible "match condition" as it goes and returning as soon as it finds one.

As some of the commenters have said, finding a way to do a simple number comparison will also be faster than a string comparison if it is possible.
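
As a rough illustration of the numeric-comparison idea, the sketch below converts a dotted-quad IPv4 string into a 32-bit integer that could serve as a comparison or dictionary key. It assumes the stored addresses are well-formed IPv4 strings, which the question does not confirm. IpKey.TryToUInt32 is a hypothetical helper name; IPAddress.TryParse and GetAddressBytes are the standard System.Net calls.

```csharp
using System;
using System.Net;

public static class IpKey
{
    // Converts a dotted-quad IPv4 string to a 32-bit integer key.
    // Returns false for null, unparsable, or non-IPv4 input.
    // Assumption: callers pass full four-part addresses such as "10.0.0.1".
    public static bool TryToUInt32(string text, out uint key)
    {
        key = 0;
        if (text == null || !IPAddress.TryParse(text, out IPAddress address))
        {
            return false;
        }

        byte[] bytes = address.GetAddressBytes();
        if (bytes.Length != 4) // an IPv6 address yields 16 bytes
        {
            return false;
        }

        // Pack the four octets in network order, most significant first.
        key = ((uint)bytes[0] << 24) | ((uint)bytes[1] << 16)
            | ((uint)bytes[2] << 8) | bytes[3];
        return true;
    }
}
```

The conversion is worth doing once, when the list is loaded, rather than on every comparison. Parsing on each lookup would likely cost more than the string comparison it replaces.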

Write your own loop instead of the LINQ query and see how it performs:

Note: this code assumes you will always have 2 IP addresses in the ips collection, as stated in the comments on the question.

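foreach (var device in Devices)
{
    // Six direct comparisons: each device field against each of the two IPs.
    if (device.IPaddress1 == ips[0] || device.IPaddress2 == ips[0] || device.IPaddress3 == ips[0]
        || device.IPaddress1 == ips[1] || device.IPaddress2 == ips[1] || device.IPaddress3 == ips[1])
    {
        return device;
    }
}

return null; // no device matched either address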

This unrolls one of the loops and takes advantage of being able to return as soon as a match is found.

Further optimization can be performed by letting the database do the work instead.

Create lookups for each of your three keys. It won't make much of a difference if you are only doing the search once, but if you're doing it a hundred thousand times, being able to look up the values by IP in constant time will be a huge win:

public class Foo
{
    private List<FSKDevice> Devices = _db.Devices.ToList();
    private IList<ILookup<string, FSKDevice>> lookups;

    public Foo()
    {
        lookups = new[]{ 
            Devices.ToLookup(device => device.IPaddress1),
            Devices.ToLookup(device => device.IPaddress2),
            Devices.ToLookup(device => device.IPaddress3),
        };
    }

    public FSKDevice TryFindDeviceInNetworks(ALL_Sims sim)
    {
        var ips =  new[] { sim.IP1, sim.IP2 }
            .Where(ip => ip != null);

        return (from ip in ips
                from lookup in lookups
                let matches = lookup[ip]
                where matches.Any()
                select matches.First())
                    .FirstOrDefault();
    }
}
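
Two properties of ILookup make it convenient here: ToLookup accepts duplicate keys, and indexing a key that is not present returns an empty sequence instead of throwing. The sketch below shows both on made-up data. LookupDemo and its sample rows are purely illustrative and not part of the original answer.

```csharp
using System;
using System.Linq;

public static class LookupDemo
{
    // Builds a small address -> name lookup from hypothetical sample rows.
    public static ILookup<string, string> Build()
    {
        var rows = new[]
        {
            new { Name = "router-a", Address = "10.0.0.1" },
            new { Name = "router-b", Address = "10.0.0.1" }, // duplicate key is allowed
            new { Name = "modem-c", Address = "10.0.0.2" },
        };
        return rows.ToLookup(r => r.Address, r => r.Name);
    }

    // Returns the first name registered under the address, or null if none.
    public static string FirstNameFor(ILookup<string, string> byAddress, string address)
    {
        // Indexing a missing key yields an empty sequence, not an exception.
        return byAddress[address].FirstOrDefault();
    }
}
```

Because a missing key yields an empty sequence, a single lookup[ip].FirstOrDefault() call per lookup is enough to get either the first match or null.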
