Most efficient way to compare data

In our application there is a function that is called periodically, and on each call it needs to compare the results of the previous call with the results of the current call, as shown below:

public class Record
{
    public long Id { get; set; }
    public bool SwitchStatus { get; set; }

    //.....Other Fields.....
}

public class Consumer
{
    private List<Tuple<long, bool>> _sortedRecordIdAndStatus = new List<Tuple<long, bool>>();

    void IGetCalledEveryThreeSeconds(List<Record> records)
    {
        // Project each record to (Id, SwitchStatus) and materialize the result.
        var currentsortedRecordIdAndStatus = records.Select(x => new Tuple<long, bool>(x.Id, x.SwitchStatus)).ToList();
        if (!currentsortedRecordIdAndStatus.SequenceEqual(_sortedRecordIdAndStatus))
        {
            DoSomething();
        }

        _sortedRecordIdAndStatus = currentsortedRecordIdAndStatus;
    }
}

The ToList() call takes a lot of time when the function is called with thousands of records, and it is currently the bottleneck.

I am trying to optimize this routine. All I need is to determine whether a block of data is the same as last time or not.

I think I just need to create a block of data from the incoming records and compare it with the block created on the next call, and so on. All I need to know is whether the block is the same (including order). I don't even need to look at the data inside.

For example, the content of a block could be:

[[1000][true]]
[[2000][false]]
[[1500][true]]
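The "block" idea above could be sketched roughly as follows, assuming .NET Core 2.1 or later (for Span<T>). `BlockComparer` and `HasChanged` are hypothetical names, and packing each record as two longs (the Id, then 1 or 0 for SwitchStatus) is just one possible encoding. Two buffers are swapped and reused across calls, so once they are large enough no list is allocated per call; the comparison is a single span SequenceEqual, which recent runtimes typically vectorize for primitive types. Filling the buffer is still one pass over the records, so the work per call remains O(n).

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the question's Record class (other fields omitted).
public sealed class Record
{
    public long Id { get; set; }
    public bool SwitchStatus { get; set; }
}

// Hypothetical helper: keeps the previous call's block as a packed long[]
// and compares the current block against it in one span comparison.
public sealed class BlockComparer
{
    private long[] _previous = Array.Empty<long>();
    private long[] _current = Array.Empty<long>();
    private int _previousLength;

    // Returns true when this block differs (in content or order) from the previous call's block.
    public bool HasChanged(List<Record> records)
    {
        int length = records.Count * 2;
        if (_current.Length < length)
        {
            _current = new long[length];
        }

        // Encoding assumption: [Id, 1 or 0] per record, in list order.
        for (int i = 0; i < records.Count; i++)
        {
            _current[2 * i] = records[i].Id;
            _current[2 * i + 1] = records[i].SwitchStatus ? 1L : 0L;
        }

        // Span comparison also returns false when the lengths differ.
        bool changed = !_current.AsSpan(0, length)
            .SequenceEqual(_previous.AsSpan(0, _previousLength));

        // Swap buffers so the next call reuses memory instead of allocating.
        (_previous, _current) = (_current, _previous);
        _previousLength = length;
        return changed;
    }
}
```

In the question's Consumer, the method body would then shrink to something like `if (_blocks.HasChanged(records)) DoSomething();`, where `_blocks` is a hypothetical `BlockComparer` field.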

Is there any way to optimize my code?

This comes with the standard caveat that premature optimization is the root of all evil, and I'm just going to trust your statement that this is really a bottleneck in your application's performance.

  1. If there's any possibility that the blocks of records will have different lengths, it'll be much quicker to check their lengths than to iterate over them.
  2. If you can use a streaming LINQ operator, you can short-circuit the moment you notice the two blocks aren't the same. If it's frequently the case that they're not the same, that can make a pretty big performance improvement.
  3. If it's not too much memory overhead, and if it's safe for you to assume that other parts of the code won't be changing the List after it's passed in to your method, you should consider just hanging on to the List you're given rather than creating a new one.

Something like this:

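public class Consumer
{
    private List<Record> _previousRecords = new List<Record>();

    void IGetCalledEveryThreeSeconds(List<Record> records)
    {
        // Different lengths mean the block changed; otherwise compare lazily,
        // stopping at the first mismatch.
        if (records.Count != _previousRecords.Count
            || !records.Select(x => (x.Id, x.SwitchStatus)).SequenceEqual(
                   _previousRecords.Select(x => (x.Id, x.SwitchStatus))))
        {
            DoSomething();
        }

        // Keep the caller's list instead of copying it (see point 3).
        _previousRecords = records;
    }
}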
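To see the short-circuiting from point 2 in action, the sketch below counts how many elements the two Select projections produce before SequenceEqual returns. `ShortCircuitDemo` and `Compare` are hypothetical names, and the side-effecting selector exists only for illustration; the exact count depends on the LINQ implementation, but a mismatch at the first element should stop the comparison long before the end of a large list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Record class (other fields omitted).
public sealed class Record
{
    public long Id { get; set; }
    public bool SwitchStatus { get; set; }
}

public static class ShortCircuitDemo
{
    // Returns the comparison result plus the number of projected elements
    // SequenceEqual pulled from both sequences combined.
    public static (bool Equal, int Projected) Compare(List<Record> current, List<Record> previous)
    {
        int projected = 0;
        bool equal = current
            .Select(x => { projected++; return (x.Id, x.SwitchStatus); })
            .SequenceEqual(previous.Select(x => { projected++; return (x.Id, x.SwitchStatus); }));
        return (equal, projected);
    }
}
```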

However, considering your comments that the inputs are usually the same, I don't know whether these optimizations will even be beneficial. When the blocks are identical you have to iterate over the entire list to confirm it anyway, so these optimizations won't improve things by an order of magnitude. And it's hard to know whether avoiding the creation of a new List each time will offset the overhead of projecting new tuples from _previousRecords on every call.
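One way to avoid projecting tuples at all is to give SequenceEqual a comparer that looks only at Id and SwitchStatus. This is a sketch under that assumption; `IdAndStatusComparer` and `RecordBlocks.HasChanged` are hypothetical names, and the explicit Count check is redundant on newer .NET versions (whose SequenceEqual already compares counts for two lists) but harmless on older ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Record class (other fields omitted).
public sealed class Record
{
    public long Id { get; set; }
    public bool SwitchStatus { get; set; }
}

// Hypothetical comparer: two records count as equal when Id and SwitchStatus
// match; all other fields are ignored.
public sealed class IdAndStatusComparer : IEqualityComparer<Record>
{
    public static readonly IdAndStatusComparer Instance = new IdAndStatusComparer();

    public bool Equals(Record x, Record y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.SwitchStatus == y.SwitchStatus;
    }

    public int GetHashCode(Record obj) => (obj.Id, obj.SwitchStatus).GetHashCode();
}

public static class RecordBlocks
{
    // True when the blocks differ in length, order, Id or SwitchStatus.
    public static bool HasChanged(List<Record> current, List<Record> previous) =>
        current.Count != previous.Count
        || !current.SequenceEqual(previous, IdAndStatusComparer.Instance);
}
```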

If you really need to squeeze every ounce of performance out of this, and you're positive this is the bottleneck, and you can't come up with a broader architectural solution that avoids this bottleneck in the first place, your last best option is probably to avoid LINQ and go with a for loop. But the improvements probably won't be significant enough to make a business-level difference.

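public class Consumer
{
    private List<Record> _previousRecords = new List<Record>();

    void IGetCalledEveryThreeSeconds(List<Record> records)
    {
        // Remember this call's records for the next comparison, whatever the outcome.
        var previous = _previousRecords;
        _previousRecords = records;

        var length = records.Count;
        if (length != previous.Count)
        {
            DoSomething(); // different length: the block has changed
            return;
        }

        for (int i = 0; i < length; i++)
        {
            var record1 = records[i];
            var record2 = previous[i];
            if (record1.Id != record2.Id || record1.SwitchStatus != record2.SwitchStatus)
            {
                DoSomething(); // first mismatch: the block has changed
                return;
            }
        }

        // Identical blocks: nothing to do.
    }
}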
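To find out whether the loop is worth it in your case, a rough Stopwatch timing such as the following may help. `CompareBenchmark`, `ChangedLinq`, `ChangedLoop`, `Time` and `MakeRecords` are hypothetical names, and returning true on a change is an assumed convention; the numbers depend on runtime, hardware and data, so for trustworthy measurements a dedicated harness such as BenchmarkDotNet is the usual choice.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Minimal stand-in for the question's Record class (other fields omitted).
public sealed class Record
{
    public long Id { get; set; }
    public bool SwitchStatus { get; set; }
}

public static class CompareBenchmark
{
    // LINQ version: length check, then lazy tuple comparison.
    public static bool ChangedLinq(List<Record> current, List<Record> previous) =>
        current.Count != previous.Count
        || !current.Select(x => (x.Id, x.SwitchStatus))
                   .SequenceEqual(previous.Select(x => (x.Id, x.SwitchStatus)));

    // Plain for-loop version: no iterators, no tuples.
    public static bool ChangedLoop(List<Record> current, List<Record> previous)
    {
        if (current.Count != previous.Count) return true;
        for (int i = 0; i < current.Count; i++)
        {
            if (current[i].Id != previous[i].Id || current[i].SwitchStatus != previous[i].SwitchStatus)
                return true;
        }
        return false;
    }

    // Times `iterations` calls of a comparison; rough numbers only.
    public static TimeSpan Time(Func<List<Record>, List<Record>, bool> compare,
                                List<Record> a, List<Record> b, int iterations)
    {
        compare(a, b); // one warm-up call so JIT compilation is not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) compare(a, b);
        sw.Stop();
        return sw.Elapsed;
    }

    public static List<Record> MakeRecords(int count) =>
        Enumerable.Range(0, count)
                  .Select(i => new Record { Id = i, SwitchStatus = i % 2 == 0 })
                  .ToList();
}
```

Usage: build two identical lists with `MakeRecords(10_000)` (identical input is the worst case, since nothing short-circuits) and print `Time(ChangedLinq, a, b, 1000)` next to `Time(ChangedLoop, a, b, 1000)`.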
