C#: Improving the performance of an array lookup loop

Question

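I have an array Datapoint[] file = new Datapoint[2592000]. The array is filled with timestamps and random values. Creating them takes about 2 seconds. In another function, prepareData(), I prepare 240 values for a second array, TempBuffer. Inside prepareData() I search the file array for matching values. If there is no match, I take the timestamp and set the value to 0; otherwise I take the value that was found together with the same timestamp.

The function looks like this: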

public void prepareData()
{  
    stopWatch.Reset();
    stopWatch.Start();
    Int32 unixTimestamp = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;

    for (double i = unixTimestamp; unixTimestamp - 240 < i; i--)
    {
        bool exists = true;

        if (exists != (Array.Exists(file, element => element.XValue == i)))
        {
            TempBuffer = TempBuffer.Skip(1).Concat(new DataPoint[] { new DataPoint(UnixTODateTime(i).ToOADate(), 0) }).ToArray();
        }
        else
        {
            DataPoint point = Array.Find(file, element => element.XValue == i);
            TempBuffer = TempBuffer.Skip(1).Concat(new DataPoint[] { new DataPoint(UnixTODateTime(i).ToOADate(), point.YValues) }).ToArray();
        }
    }

    stopWatch.Stop();
    TimeSpan ts = stopWatch.Elapsed;
}

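The problem is that with this much data in file (2,592,000 points), the function takes 40 seconds! With a smaller amount, such as 10,000 points, it is not a problem and the function runs correctly and quickly. But as soon as I set the size of file to the 2,592,000 points I want, the CPU is pushed to 99% load and the function takes far too long.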

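TempBuffer sample value:
X = Unix timestamp converted to DateTime, then DateTime converted to OADate
{X = 43285.611087963, Y = 23}

file sample value:
X = Unix timestamp
{X = 1530698090, Y = 24}

Converting the TempBuffer values to OADate is important, because the data in the TempBuffer array is displayed in an MSChart.

Is there a way to improve my code so that it performs better?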

Answer 1

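Array.Exists() and Array.Find() are O(N) operations, and you are performing them M (240) times each, so the loop as a whole is O(N × M).

Try using a LINQ Join instead: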

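DataPoint[] dataPoints = file; // your "file" variable
// XValue is compared against a double in the question, so build the wanted timestamps as doubles
var seekedTimestamps = Enumerable.Range(0, 240).Select(i => (double)(unixTimestamp - i)).ToArray();
// materialize the join once so it is not re-run for every later enumeration
var matchingDataPoints = dataPoints.Join(seekedTimestamps, dp => dp.XValue, sts => sts, (dp, sts) => dp).ToList();
var missingTimestamps = seekedTimestamps.Except(matchingDataPoints.Select(mdp => mdp.XValue));
// do your logic with found and missing here
// ...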

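LINQ Join uses hashing (on the selected keys): it builds a hash lookup over the inner sequence and then streams the outer one, so it runs in close to O(N).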

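Alternatively, assuming the timestamps in the input are unique and you intend to run several operations on the input, build a Dictionary<int (Timestamp), DataPoint>. Building it is expensive, but it then gives you O(1) retrieval of the data point you want: var dataPoint = dict[wantedTimestamp];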
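The dictionary idea can be sketched roughly as follows. This is only an illustration under stated assumptions: Sample is a hypothetical stand-in for the chart's DataPoint so that the snippet compiles on its own, and BuildIndex and LastSeconds are made-up helper names. It uses TryGetValue rather than the indexer, because dict[wantedTimestamp] throws a KeyNotFoundException when the timestamp is missing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the chart's DataPoint, so the sketch compiles on its own.
public sealed class Sample
{
    public Sample(int timestamp, double value) { Timestamp = timestamp; Value = value; }
    public int Timestamp { get; }
    public double Value { get; }
}

public static class DictionaryLookupSketch
{
    // Build the index once: O(N). Assumes timestamps are unique;
    // with the indexer, a duplicate timestamp overwrites the earlier entry.
    public static Dictionary<int, Sample> BuildIndex(Sample[] file)
    {
        var index = new Dictionary<int, Sample>(file.Length);
        foreach (var s in file)
        {
            index[s.Timestamp] = s;
        }
        return index;
    }

    // Fill a window of `count` seconds ending at `newest`, newest first,
    // using 0 for seconds that have no sample. Each lookup is O(1) on average.
    public static Sample[] LastSeconds(Dictionary<int, Sample> index, int newest, int count)
    {
        var window = new Sample[count];
        for (int i = 0; i < count; i++)
        {
            int ts = newest - i;
            Sample found;
            window[i] = index.TryGetValue(ts, out found) ? found : new Sample(ts, 0);
        }
        return window;
    }
}
```

The index only pays off if it is built once and reused across calls; rebuilding it inside every prepareData() call would cost a full pass over file each time.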

Answer 2

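You haven't provided a complete picture of your code. Ideally I would like sample data and the full class definitions. Given the limited information available, though, I think you'll find that something like this works: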

public void prepareData()
{ 
    Int32 unixTimestamp = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;

    var map = file.ToLookup(x => x.XValue);

    TempBuffer =
        Enumerable
            .Range(0, 240)
            .Select(x => unixTimestamp - x)
            .SelectMany(x =>
                map[x]
                    .Concat(new[] { new DataPoint(UnixTODateTime(x).ToOADate(), 0) }).Take(1))
            .ToArray();
}

Answer 3

This is the most efficient way to accomplish your task (it is just a template, not the final code):

public void prepareData()
{
    // it will be initialized with null values
    var tempbuffer = new DataPoint[240];

    var timestamp = (int)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
    var oldest = timestamp - 240 + 1;

    // fill tempbuffer with existing DataPoints
    for (int i = 0; i < file.Length; i++)
    {
        if (file[i].XValue <= timestamp && file[i].XValue > timestamp - 240)
        {
            // assuming XValue is a double (as in MSChart's DataPoint), cast it to index the array
            tempbuffer[(int)file[i].XValue - oldest] = new DataPoint(file[i].XValue, file[i].YValues);
        }
    }

    // fill null values in tempbuffer with 'empty' DataPoints
    for (int i = 0; i < tempbuffer.Length; i++)
    {
        tempbuffer[i] = tempbuffer[i] ?? new DataPoint(oldest + i, 0);
    }
}

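With this I get about 10 milliseconds.

Update from the comments:

If you want to collect several DataPoints per timestamp and combine them with some function (for example, the average), then: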

public void prepareData()
{
    // use array of lists of YValues
    var tempbuffer = new List<double>[240];

    // initialize it
    for (int i = 0; i < tempbuffer.Length; i++)
    {
        tempbuffer[i] = new List<double>(); //set capacity for better performance
    }

    var timestamp = (int)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
    var oldest = timestamp - 240 + 1;

    // fill tempbuffer with existing DataPoint's YValues
    for (int i = 0; i < file.Length; i++)
    {
        if (file[i].XValue <= timestamp && file[i].XValue > timestamp - 240)
        {
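            // assuming MSChart's DataPoint: XValue is a double and YValues a double[], so take the first Y value
            tempbuffer[(int)file[i].XValue - oldest].Add(file[i].YValues[0]);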
        }
    }

    // get result
    var result = new DataPoint[tempbuffer.Length];
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = new DataPoint(oldest + i, tempbuffer[i].Count == 0 ? 0 : tempbuffer[i].Average());
    }
}

Answer 4

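If the DataPoints are unique (no two instances with the same value), you should switch file from an array to a dictionary. A dictionary lookup is much faster than iterating over every member of an array.

Of course, you need to implement GetHashCode and Equals, or define a unique key for each Datapoint.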
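As a rough illustration of the GetHashCode/Equals route, here is a minimal sketch. Reading is a hypothetical type (not the chart's DataPoint) whose equality is based only on its timestamp, and ValueAt is a made-up helper. It assumes a runtime that provides HashSet<T>.TryGetValue (.NET Core 2.0 or later, or .NET Framework 4.7.2 or later); on older runtimes a Dictionary keyed by the timestamp would be the simpler choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: two readings are "the same" when their timestamps match.
public sealed class Reading : IEquatable<Reading>
{
    public Reading(int timestamp, double value) { Timestamp = timestamp; Value = value; }
    public int Timestamp { get; }
    public double Value { get; }

    public bool Equals(Reading other)
    {
        return other != null && Timestamp == other.Timestamp;
    }

    public override bool Equals(object obj) { return Equals(obj as Reading); }

    // Must agree with Equals: equal readings return equal hash codes.
    public override int GetHashCode() { return Timestamp.GetHashCode(); }
}

public static class EqualitySketch
{
    // Returns the stored value for `timestamp`, or 0 when no reading exists.
    public static double ValueAt(HashSet<Reading> readings, int timestamp)
    {
        Reading actual;
        // The probe's Value is ignored because equality only looks at Timestamp.
        return readings.TryGetValue(new Reading(timestamp, 0), out actual) ? actual.Value : 0;
    }
}
```

Defining a separate key (the timestamp) is usually less error-prone than overriding equality, since Equals and GetHashCode must always stay consistent with each other.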
