
C# Increase Array Find Loop Performance

I've got a Datapoint[] file = new Datapoint[2592000] array. This array is filled with timestamps and random values; creating them takes about 2 s. But in another function, prepareData(), I'm preparing 240 values for another array, TempBuffer. In the prepareData() function I'm searching for matching values in the file array. If I can't find one, I take the timestamp and set the value to 0; otherwise I take the found value together with the same timestamp.

The function looks like this:

public void prepareData()
{  
    stopWatch.Reset();
    stopWatch.Start();
    Int32 unixTimestamp = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;

    for (double i = unixTimestamp; unixTimestamp - 240 < i; i--)
    {
        bool exists = true;

        if (exists != (Array.Exists(file, element => element.XValue == i)))
        {
            TempBuffer = TempBuffer.Skip(1).Concat(new DataPoint[] { new DataPoint(UnixTODateTime(i).ToOADate(), 0) }).ToArray();
        }
        else
        {
            DataPoint point = Array.Find(file, element => element.XValue == i);
            TempBuffer = TempBuffer.Skip(1).Concat(new DataPoint[] { new DataPoint(UnixTODateTime(i).ToOADate(), point.YValues) }).ToArray();
        }
    }

    stopWatch.Stop();
    TimeSpan ts = stopWatch.Elapsed;
}

Now the problem is that with this amount of data in the file (2,592,000 points) the function needs about 40 seconds! With smaller amounts, like 10,000, it is no problem and works fine and fast. But as soon as I set the file size to my preferred 2,592,000 points, the CPU is pushed to 99% and the function takes far too long.

TempBuffer sample value:
X = Unix timestamp converted to DateTime, then the DateTime converted to an OADate
{X=43285.611087963, Y=23}

File sample value:
X = Unix timestamp
{X=1530698090, Y=24}

It's important that the TempBuffer values are converted to OADate, since the data inside the TempBuffer array is displayed in an MSChart.
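The UnixTODateTime helper used in the question is never shown. A minimal sketch of the conversion chain (Unix seconds to DateTime to OADate), assuming UnixTODateTime simply adds seconds to the Unix epoch:

```csharp
using System;

class TimeConversion
{
    // Plausible implementation of the UnixTODateTime helper referenced in
    // the question (the original is not shown): interprets the value as
    // seconds since the Unix epoch and returns a UTC DateTime.
    public static DateTime UnixTODateTime(double unixSeconds)
    {
        return new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)
            .AddSeconds(unixSeconds);
    }

    static void Main()
    {
        double unix = 1530698090;           // sample X value from "file"
        DateTime dt = UnixTODateTime(unix);  // 2018-07-04 09:54:50 UTC
        double oaDate = dt.ToOADate();       // days since 1899-12-30
        Console.WriteLine(oaDate);
    }
}
```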

Is there a way to improve my code so that I get better performance?

Array.Exists() and Array.Find() are O(N) operations, and you are performing them M (240) times.

Try a LINQ Join instead:

DataPoint[] dataPoints; // your "file" variable
var seekedTimestamps = Enumerable.Range(0, 240).Select(i => unixTimestamp - i);
// cast XValue to int so both join key selectors produce the same key type
var matchingDataPoints = dataPoints.Join(seekedTimestamps, dp => (int)dp.XValue, sts => sts, (dp, sts) => dp);
var missingTimestamps = seekedTimestamps.Except(matchingDataPoints.Select(mdp => (int)mdp.XValue));
// do your logic with found and missing here
// ...

A LINQ Join uses hashing (on the selected keys) and is close to O(N).

Alternatively, assuming the timestamps in the input are unique and you plan to do multiple operations on the input, construct a Dictionary<int (Timestamp), DataPoint> (expensive once), which gives you O(1) retrieval of a wanted data point: var dataPoint = dict[wantedTimestamp];
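A minimal sketch of that Dictionary approach; the DataPoint class here is a stand-in for the charting type, and the sample values come from the question:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the MSChart DataPoint type used in the question.
class DataPoint
{
    public double XValue;
    public double YValues;
    public DataPoint(double x, double y) { XValue = x; YValues = y; }
}

class Program
{
    static double Lookup(Dictionary<int, DataPoint> dict, int wantedTimestamp)
    {
        // O(1) retrieval; missing timestamp -> value 0, as in the question.
        return dict.TryGetValue(wantedTimestamp, out var point) ? point.YValues : 0;
    }

    static void Main()
    {
        DataPoint[] file =
        {
            new DataPoint(1530698089, 21),
            new DataPoint(1530698090, 24),
        };

        // One-time O(N) construction; assumes timestamps are unique.
        var dict = new Dictionary<int, DataPoint>(file.Length);
        foreach (var dp in file)
            dict[(int)dp.XValue] = dp;

        Console.WriteLine(Lookup(dict, 1530698090)); // prints 24
        Console.WriteLine(Lookup(dict, 1530698091)); // prints 0 (not present)
    }
}
```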

You haven't given us a complete picture of your code. I would ideally like sample data and the full class definitions. But given the limited information available, I think you'll find something like this works:

public void prepareData()
{ 
    Int32 unixTimestamp = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;

    var map = file.ToLookup(x => x.XValue);

    TempBuffer =
        Enumerable
            .Range(0, 240)
            .Select(x => unixTimestamp - x)
            .SelectMany(x =>
                map[x]
                    .Concat(new[] { new DataPoint(UnixTODateTime(x).ToOADate(), 0) }).Take(1))
            .ToArray();
}

This is the most performant way for your task (this is just a template, not the final code):

public void prepareData()
{
    // it will be initialized with null values
    var tempbuffer = new DataPoint[240];

    var timestamp = (int)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
    var oldest = timestamp - 240 + 1;

    // fill tempbuffer with existing DataPoints
    for (int i = 0; i < file.Length; i++)
    {
        if (file[i].XValue <= timestamp && file[i].XValue > timestamp - 240)
        {
            tempbuffer[(int)file[i].XValue - oldest] = new DataPoint(file[i].XValue, file[i].YValues);
        }
    }

    // fill null values in tempbuffer with 'empty' DataPoints
    for (int i = 0; i < tempbuffer.Length; i++)
    {
        tempbuffer[i] = tempbuffer[i] ?? new DataPoint(oldest + i, 0);
    }
}

This runs in about 10 ms for me.

Update from the comments:

If you want to fetch multiple DataPoints per timestamp and combine them with some function (e.g. an average), then:

public void prepareData()
{
    // use array of lists of YValues
    var tempbuffer = new List<double>[240];

    // initialize it
    for (int i = 0; i < tempbuffer.Length; i++)
    {
        tempbuffer[i] = new List<double>(); // consider setting a capacity for better performance
    }

    var timestamp = (int)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
    var oldest = timestamp - 240 + 1;

    // fill tempbuffer with existing DataPoint's YValues
    for (int i = 0; i < file.Length; i++)
    {
        if (file[i].XValue <= timestamp && file[i].XValue > timestamp - 240)
        {
            tempbuffer[(int)file[i].XValue - oldest].Add(file[i].YValues);
        }
    }

    // get result
    var result = new DataPoint[tempbuffer.Length];
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = new DataPoint(oldest + i, tempbuffer[i].Count == 0 ? 0 : tempbuffer[i].Average());
    }
}

If DataPoint is unique (no two instances with identical values) you should switch the file list to a dictionary. A dictionary lookup is far faster than potentially iterating over every member of the array.

Of course, you need to implement GetHashCode and Equals, or define a unique key for each Datapoint.
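A sketch of what that override pair could look like, assuming the Datapoint class exposes XValue and YValues fields as in the question's samples; the key rule is that equal points must produce equal hash codes:

```csharp
using System;

// Value-based equality for Datapoint so it can serve as a dictionary key.
// Field names follow the sample values shown in the question.
class Datapoint : IEquatable<Datapoint>
{
    public double XValue;
    public double YValues;

    public Datapoint(double x, double y) { XValue = x; YValues = y; }

    public bool Equals(Datapoint other) =>
        other != null && XValue == other.XValue && YValues == other.YValues;

    public override bool Equals(object obj) => Equals(obj as Datapoint);

    // Must agree with Equals: equal points yield equal hash codes.
    public override int GetHashCode() =>
        XValue.GetHashCode() ^ (YValues.GetHashCode() << 1);
}

class Program
{
    static void Main()
    {
        var a = new Datapoint(1530698090, 24);
        var b = new Datapoint(1530698090, 24);
        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```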
