Most efficient way to compare data

In our application, a function is called periodically, and the results of the previous call need to be compared with the results of the current call, as shown below:

public class Record
{
    public long Id { get; set; }
    public bool SwitchStatus { get; set; }

    // ... other fields ...
}

public class Consumer
{
    private List<Tuple<long, bool>> _sortedRecordIdAndStatus = new List<Tuple<long, bool>>();

    void IGetCalledEveryThreeSeconds(List<Record> records)
    {
        // Project each record to an (Id, SwitchStatus) pair and materialize the result.
        var currentsortedRecordIdAndStatus = records.Select(x => new Tuple<long, bool>(x.Id, x.SwitchStatus)).ToList();
        if (!currentsortedRecordIdAndStatus.SequenceEqual(_sortedRecordIdAndStatus))
        {
            DoSomething();
        }

        _sortedRecordIdAndStatus = currentsortedRecordIdAndStatus;
    }
}

The ToList() call takes a lot of time when the function is called with thousands of records. That is currently the bottleneck.

I am trying to optimize this routine. All I need is to check whether a block of data is the same or not.

I think I just need to create a block of data from the incoming records and compare it with the block created on the next call, and so on. All I need to know is whether the block is the same (including order). I don't even need to look into the data inside.

E.g., for a block with the following content:

[[1000][true]]
[[2000][false]]
[[1500][true]]

Is there any way to optimize my code?

This comes with the standard caveat that premature optimization is the root of all evil, and I'm just going to trust your statement that this is really a bottleneck in your application's performance.

  1. If there's any possibility that the blocks of records will have different lengths, it'll be much quicker to check their lengths than to iterate over them.
  2. If you can use a streaming LINQ operator, you can short-circuit the moment you notice the two blocks aren't the same. If they frequently differ, that can make a pretty big performance improvement.
  3. If it's not too much memory overhead, and if it's safe for you to assume that other parts of the code won't change the List (or the Records in it) after it's passed in to your method, you should consider just hanging on to the List you're given rather than creating a new one.

Something like this:

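public class Consumer
{
    private List<Record> _previousRecords = new List<Record>();

    void IGetCalledEveryThreeSeconds(List<Record> records)
    {
        // Cheap length check first; only then walk both lists lazily.
        // Value tuples are compared by value, and SequenceEqual stops at the first mismatch.
        if (records.Count != _previousRecords.Count
            || !records.Select(x => (x.Id, x.SwitchStatus)).SequenceEqual(
                _previousRecords.Select(x => (x.Id, x.SwitchStatus))))
        {
            DoSomething();
        }

        // Keep the incoming list itself instead of copying it.
        _previousRecords = records;
    }
}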
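
To make the short-circuiting from tip 2 concrete, here is a small sketch that counts how many elements a lazy `Select` actually projects before `SequenceEqual` reaches a verdict. The class `ShortCircuitDemo` and the method `CountProjections` are made-up names for illustration only, and the lists hold plain `long` values rather than records to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class ShortCircuitDemo
{
    // Returns how many elements of `current` were projected before
    // SequenceEqual finished comparing against `previous`.
    public static int CountProjections(List<long> previous, List<long> current)
    {
        var projected = 0;

        // Select is lazy: the lambda only runs when SequenceEqual pulls an element.
        var lazy = current.Select(x => { projected++; return x; });

        // The result is ignored here; only the side effect on `projected` matters.
        lazy.SequenceEqual(previous);

        return projected;
    }
}
```

With two lists that differ in their first element, the projection should run only once; with two identical lists it runs once per element, which is why the gain depends on how often the inputs actually differ.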

However, considering your comments that the inputs are usually the same, I don't know if these optimizations will even be beneficial. Since you pretty much have to iterate over the entire list to verify that the two blocks are the same, these optimizations won't improve things by an order of magnitude. And it's hard to know whether avoiding the creation of a new List each time will offset the overhead of selecting new tuples from _previousRecords each time.

If you really need to squeeze every ounce of performance out of this, and you're positive this is the bottleneck, and you can't come up with a broader architectural solution that avoids this bottleneck in the first place, your last best option is probably to avoid LINQ and go with a for loop. But the improvements probably won't be significant enough to make a business-level difference.

public class Consumer
{
    private List<Record> _previousRecords = new List<Record>();

    void IGetCalledEveryThreeSeconds(List<Record> records)
    {
        var length = records.Count;

        // Different lengths mean the blocks differ; no need to look at the items.
        var changed = length != _previousRecords.Count;

        // Otherwise compare item by item and stop at the first difference.
        for (int i = 0; !changed && i < length; i++)
        {
            var record1 = records[i];
            var record2 = _previousRecords[i];
            changed = record1.Id != record2.Id || record1.SwitchStatus != record2.SwitchStatus;
        }

        if (changed)
        {
            DoSomething();
        }

        // Remember the current block for the next call.
        _previousRecords = records;
    }
}
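
If it is not safe to hold on to the incoming List (the caveat in tip 3), for example because the caller reuses the same Record objects and mutates them between calls, one possible variation is to copy only the two compared fields into reusable arrays while comparing. This is a sketch under assumptions rather than a drop-in replacement: `SnapshotConsumer` and `ChangeCount` are hypothetical names, `ChangeCount` stands in for `DoSomething()`, and `Record` is reduced to the two fields shown in the question.

```csharp
using System;
using System.Collections.Generic;

public class Record
{
    public long Id { get; set; }
    public bool SwitchStatus { get; set; }
}

// Hypothetical variant of Consumer that keeps its own copy of the compared fields.
public class SnapshotConsumer
{
    private long[] _ids = Array.Empty<long>();
    private bool[] _statuses = Array.Empty<bool>();
    private int _count;

    // Stand-in for DoSomething(): counts how often a change was detected.
    public int ChangeCount { get; private set; }

    public void IGetCalledEveryThreeSeconds(List<Record> records)
    {
        // A different length already means the block changed.
        var changed = records.Count != _count;

        // Grow the buffers only when needed, so steady-state calls allocate nothing.
        if (_ids.Length < records.Count)
        {
            Array.Resize(ref _ids, records.Count);
            Array.Resize(ref _statuses, records.Count);
        }

        // Single pass: compare against the old snapshot and overwrite it.
        for (var i = 0; i < records.Count; i++)
        {
            var record = records[i];
            if (!changed && (_ids[i] != record.Id || _statuses[i] != record.SwitchStatus))
            {
                changed = true;
            }
            _ids[i] = record.Id;
            _statuses[i] = record.SwitchStatus;
        }
        _count = records.Count;

        if (changed)
        {
            ChangeCount++;
        }
    }
}
```

Unlike the early-exit loop, this always walks the whole list because it has to refresh the snapshot, so it trades the short-circuit for independence from whatever the caller does with the list afterwards.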
