
C# Increase Array Find Loop Performance

I've got a DataPoint[] file = new DataPoint[2592000] array. This array is filled with timestamps and random values. Creating them takes about 2 s. But in another function, prepareData(), I'm preparing 240 values for another array, TempBuffer. In the prepareData() function I search for matching values in the file array. If I can't find any, I take the timestamp and set the value to 0; otherwise I take the found value together with the same timestamp.

The function looks like this:

public void prepareData()
{  
    stopWatch.Reset();
    stopWatch.Start();
    Int32 unixTimestamp = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;

    for (double i = unixTimestamp; unixTimestamp - 240 < i; i--)
    {
        bool exists = true;

        if (exists != (Array.Exists(file, element => element.XValue == i)))
        {
            TempBuffer = TempBuffer.Skip(1).Concat(new DataPoint[] { new DataPoint(UnixTODateTime(i).ToOADate(), 0) }).ToArray();
        }
        else
        {
            DataPoint point = Array.Find(file, element => element.XValue == i);
            TempBuffer = TempBuffer.Skip(1).Concat(new DataPoint[] { new DataPoint(UnixTODateTime(i).ToOADate(), point.YValues) }).ToArray();
        }
    }

    stopWatch.Stop();
    TimeSpan ts = stopWatch.Elapsed;
}

Now the problem is that with this amount of data in the file array (2,592,000 entries) the function needs about 40 seconds! With smaller amounts like 10,000 it's no problem and works fine and fast. But as soon as I set the file size to my preferred 2,592,000 points, the CPU is pushed to 99% and the function takes far too long.

TempBuffer Sample Value:
X = Unix timestamp converted to DateTime, then converted to an OADate
{X=43285.611087963, Y=23}

File Sample Value:
X = Unixtimestamp
{X=1530698090, Y=24}

It's important that the TempBuffer values are converted to OADate, since the data inside the TempBuffer array is displayed in an MSChart.

Is there a way to improve my code so I've got better performance?

Array.Exists() and Array.Find() are O(N) operations, and you are performing them M (240) times.

Try LINQ Join instead:

DataPoint[] dataPoints; // your "file" variable
var seekedTimestamps = Enumerable.Range(0, 240).Select(i => unixTimestamp - i);
var matchingDataPoints = dataPoints.Join(seekedTimestamps, dp => dp.XValue, sts => sts, (dp, sts) => dp);
var missingTimestamps = seekedTimestamps.Except(matchingDataPoints.Select(mdp => mdp.XValue));
// do your logic with found and missing here
// ...

LINQ Join uses hashing (on the selected keys), so the whole operation is close to O(N + M) rather than O(N × M).
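To make the Join/Except shape concrete, here's a minimal, runnable stand-in sketch using plain int timestamps instead of DataPoint (the names JoinDemo, dataTimestamps, and seeked are illustrative, not from the question's code):

```csharp
using System;
using System.Linq;

class JoinDemo
{
    static void Main()
    {
        // Stand-in data: timestamps present in the big array vs. the ones we seek.
        int[] dataTimestamps = { 100, 101, 103 };
        int[] seeked = Enumerable.Range(0, 4).Select(i => 103 - i).ToArray(); // 103..100

        // Join hashes the inner sequence's keys once, so matching is roughly
        // O(N + M) instead of O(N * M) like repeated Array.Exists/Array.Find.
        var found = dataTimestamps.Join(seeked, d => d, s => s, (d, s) => d).ToArray();
        var missing = seeked.Except(found).ToArray();

        Console.WriteLine(string.Join(",", found));   // 100,101,103
        Console.WriteLine(string.Join(",", missing)); // 102
    }
}
```

The found set then maps to real DataPoints and the missing set to zero-valued placeholders.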

Alternatively, assuming the timestamps in the input are unique and you plan to do multiple operations on the input, construct a Dictionary<int, DataPoint> keyed by timestamp (expensive once), which will give you O(1) retrieval of a wanted data point: var dataPoint = dict[wantedTimestamp];
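A runnable stand-in sketch of the dictionary approach, again with plain (int, double) pairs in place of DataPoint (the names DictLookupDemo, byTimestamp, and newest are illustrative):

```csharp
using System;
using System.Linq;

class DictLookupDemo
{
    static void Main()
    {
        // Simulated "file" data: (timestamp, value) pairs as DataPoint stand-ins.
        var file = new (int X, double Y)[] { (100, 24), (101, 17), (103, 5) };

        // One-time O(N) build; assumes timestamps are unique.
        var byTimestamp = file.ToDictionary(p => p.X, p => p.Y);

        // O(1) per lookup: take the stored value if present, otherwise 0,
        // mirroring the question's "missing timestamp -> value 0" rule.
        int newest = 103;
        var buffer = new double[4];
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = byTimestamp.TryGetValue(newest - i, out var y) ? y : 0;

        Console.WriteLine(string.Join(",", buffer)); // 5,0,17,24
    }
}
```

With the real types, each hit would become a DataPoint with the OADate-converted timestamp, but the lookup pattern is the same.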

You haven't given us a complete picture of your code. I would ideally like sample data and the full class definitions. But given the limited information available, I think you'll find something like this works:

public void prepareData()
{ 
    Int32 unixTimestamp = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;

    var map = file.ToLookup(x => x.XValue);

    TempBuffer =
        Enumerable
            .Range(0, 240)
            .Select(x => unixTimestamp - x)
            .SelectMany(x =>
                map[x]
                    .Concat(new[] { new DataPoint(UnixTODateTime(x).ToOADate(), 0) })
                    .Take(1))
            .ToArray();
}

This is the most performant way to do your task (this is just a template, not the final code):

public void prepareData()
{
    // it will be initialized with null values
    var tempbuffer = new DataPoint[240];

    var timestamp = (int)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
    var oldest = timestamp - 240 + 1;

    // fill tempbuffer with existing DataPoints
    for (int i = 0; i < file.Length; i++)
    {
        if (file[i].XValue <= timestamp && file[i].XValue > timestamp - 240)
        {
            tempbuffer[file[i].XValue - oldest] = new DataPoint(file[i].XValue, file[i].YValues);
        }
    }

    // fill null values in tempbuffer with 'empty' DataPoints
    for (int i = 0; i < tempbuffer.Length; i++)
    {
        tempbuffer[i] = tempbuffer[i] ?? new DataPoint(oldest + i, 0);
    }
}

On my machine this takes about 10 ms.

Update from comments:

If you want to fetch multiple DataPoints per timestamp and aggregate them with some function (e.g. average), then:

public void prepareData()
{
    // use array of lists of YValues
    var tempbuffer = new List<double>[240];

    // initialize it
    for (int i = 0; i < tempbuffer.Length; i++)
    {
        tempbuffer[i] = new List<double>(); // consider setting a capacity for better performance
    }

    var timestamp = (int)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
    var oldest = timestamp - 240 + 1;

    // fill tempbuffer with existing DataPoint's YValues
    for (int i = 0; i < file.Length; i++)
    {
        if (file[i].XValue <= timestamp && file[i].XValue > timestamp - 240)
        {
            tempbuffer[file[i].XValue - oldest].Add(file[i].YValues);
        }
    }

    // get result
    var result = new DataPoint[tempbuffer.Length];
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = new DataPoint(oldest + i, tempbuffer[i].Count == 0 ? 0 : tempbuffer[i].Average());
    }
}

If each DataPoint is unique (no two instances with identical values), you should switch the file array to a dictionary. A dictionary lookup is far faster than iterating over potentially all members of the array.

Of course, you then need to implement GetHashCode and Equals, or define a unique key for each DataPoint.
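As a sketch of what those overrides could look like, here is a hypothetical custom point type keyed by its timestamp (the type Point and its fields are illustrative, not the MSChart DataPoint); the overrides let Dictionary and HashSet treat two instances with the same X as the same key:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical point type: equality and hashing use only the timestamp X,
// so the timestamp acts as the unique key.
struct Point : IEquatable<Point>
{
    public int X;      // Unix timestamp, the key
    public double Y;   // payload value, ignored for equality

    public bool Equals(Point other) => X == other.X;
    public override bool Equals(object obj) => obj is Point p && Equals(p);
    public override int GetHashCode() => X;
}

class EqualityDemo
{
    static void Main()
    {
        var set = new HashSet<Point> { new Point { X = 1, Y = 2.0 } };

        // Same X, different Y: still found, because only X participates in equality.
        Console.WriteLine(set.Contains(new Point { X = 1, Y = 99.0 })); // True
    }
}
```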
