Fast percentiles in C++

Question

My program runs a Monte Carlo simulation to compute the Value-at-Risk (VaR) metric. To simplify as much as possible, I have:

1/ simulated daily cashflows
2/ to get a sample of a possible 1-year cashflow, 
   I need to draw 365 random daily cashflows and sum them

Hence, the daily cashflows form an empirically given distribution function that has to be sampled 365 times. For this, I

 1/ sort the daily cashflows into an array called *this->distro*
 2/ calculate 365 percentiles corresponding to random probabilities
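Steps 1/ and 2/ can be sketched as follows. This is a minimal sketch that assumes the daily cashflows are held in a std::vector<double>; makeDistro and percentile are hypothetical names, not functions from the original program. The percentile is taken by linear interpolation between neighbouring order statistics, which is one common convention among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1/: sort the daily cashflows once. The sorted array is the empirical
// distribution function (the role of this->distro in the question).
std::vector<double> makeDistro(std::vector<double> dailyCashflows) {
    std::sort(dailyCashflows.begin(), dailyCashflows.end());
    return dailyCashflows;
}

// Step 2/: percentile for probability p in [0, 1], interpolating linearly
// between the two neighbouring entries of the sorted array.
double percentile(const std::vector<double>& distro, double p) {
    assert(!distro.empty() && p >= 0.0 && p <= 1.0);
    const double maxIndex = static_cast<double>(distro.size() - 1);
    const double dIdx = p * maxIndex;                       // fractional index
    const std::size_t lo = static_cast<std::size_t>(dIdx);  // floor, since dIdx >= 0
    if (lo + 1 >= distro.size()) return distro.back();      // p == 1 or a single entry
    const double t = dIdx - static_cast<double>(lo);        // weight of the upper neighbour
    return distro[lo] + (distro[lo + 1] - distro[lo]) * t;
}
```

For example, makeDistro({30.0, 10.0, 20.0}) gives {10, 20, 30}, and percentile on that array at p = 0.25 returns 15.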

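I need to simulate the yearly cashflow, say, 10K times to get a population of simulated yearly cashflows. Having prepared the distribution function of daily cashflows, I do the sampling like...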

for ( unsigned int idxSim = 0; idxSim < _g.xSimulationCount; idxSim++ )
{
    generatedVal = 0.0;
    for ( unsigned int idxDay = 0; idxDay < 365; idxDay ++ )   // 'register' dropped: removed in C++17
    {
        prob = (FLT_TYPE)fastrand();         // prob [0,1]
        dIdx = prob * dMaxDistroIndex;       // scale prob to distro function size
                                             // to get an index into distro array
        _floor = ((FLT_TYPE)(long)dIdx);     // fast version of floor
        _ceil  = _floor + 1.0f;              // 'fast' ceil:)
        iIdx1  = (unsigned int)( _floor );
        iIdx2  = iIdx1 + 1;

        // interpolation per se
        generatedVal += this->distro[iIdx1]*(_ceil - dIdx  );
        generatedVal += this->distro[iIdx2]*(dIdx  - _floor);
    }
    this->yearlyCashflows[idxSim] = generatedVal ;
}

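The code inside both for loops does linear interpolation. If, say, USD 1000 corresponds to prob = 0.01 and USD 10000 corresponds to prob = 0.1, and I have no empirical value for p = 0.05, I want to get USD 5000 by interpolation.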
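The numeric example can be checked with a small helper. interpolate is a hypothetical name, and unlike the loop above it works on (probability, value) pairs rather than on array indices:

```cpp
#include <cassert>

// Linear interpolation between two known points (p0, v0) and (p1, v1)
// of the empirical distribution, evaluated at probability p.
double interpolate(double p0, double v0, double p1, double v1, double p) {
    assert(p1 != p0);
    const double t = (p - p0) / (p1 - p0);   // 0 at p0, 1 at p1
    return v0 + (v1 - v0) * t;
}
```

With the numbers from the text: 1000 + (10000 - 1000) * (0.04 / 0.09) = 1000 + 4000 = 5000.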

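The problem: this code runs correctly, but the profiler says the program spends about 60% of its runtime on the interpolation itself. So my question is: how can I make this task faster? Sample runtimes reported by VTune for specific lines are as follows: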

prob = (FLT_TYPE)fastrand();         //  0.727s
dIdx = prob * dMaxDistroIndex;       //  1.435s
_floor = ((FLT_TYPE)(long)dIdx);     //  0.718s
_ceil  = _floor + 1.0f;              //    -

iIdx1  = (unsigned int)( _floor );   // 4.949s
iIdx2  = iIdx1 + 1;                  //    -

// interpolation per se
generatedVal += this->distro[iIdx1]*(_ceil - dIdx  );  //    -
generatedVal += this->distro[iIdx2]*(dIdx  - _floor);  // 12.704s

A dash means the profiler reported no runtime for that line.

Any hints will be greatly appreciated. Daniel

EDIT: Both c.fogelklou and MSalters pointed out great improvements. The best code following what c.fogelklou said is

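converter = distroDimension / ((FLT_TYPE)RAND_MAX + 1);   // cast first: RAND_MAX + 1 can overflow int
for ( unsigned int idxSim = 0; idxSim < _g.xSimulationCount; idxSim++ )
{
    generatedVal = 0.0;
    for ( unsigned int idxDay = 0; idxDay < 365; idxDay ++ )
    {
        dIdx   = (FLT_TYPE)fastrand() * converter;   // random fractional index
        iIdx1  = (unsigned long)dIdx;                // truncation == floor, since dIdx >= 0
        _floor = (FLT_TYPE)iIdx1;
        // diffs[i] is assumed to hold distro[i+1] - distro[i], computed once before the loops
        generatedVal += this->distro[iIdx1] + this->diffs[iIdx1] * (dIdx - _floor);
    }
    this->yearlyCashflows[idxSim] = generatedVal;   // store the simulated yearly cashflow
}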
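The diffs array used above is not defined in the question. A minimal sketch of how it could be built, assuming distro is a sorted std::vector<double> with at least two entries (makeDiffs and sampleAt are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build diffs[i] = distro[i+1] - distro[i] once, outside the simulation loops,
// so each draw costs two loads and one multiply-add.
std::vector<double> makeDiffs(const std::vector<double>& distro) {
    assert(distro.size() >= 2);
    std::vector<double> diffs(distro.size() - 1);
    for (std::size_t i = 0; i + 1 < distro.size(); ++i)
        diffs[i] = distro[i + 1] - distro[i];
    return diffs;
}

// Interpolated value at fractional index dIdx, with 0 <= dIdx < diffs.size().
double sampleAt(const std::vector<double>& distro,
                const std::vector<double>& diffs, double dIdx) {
    const std::size_t i = static_cast<std::size_t>(dIdx);   // floor for dIdx >= 0
    assert(i < diffs.size());
    return distro[i] + diffs[i] * (dIdx - static_cast<double>(i));
}
```

With this layout the lower index must stay below diffs.size(), so the scale factor should map the random number onto [0, distro.size() - 1).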

My take on MSalters' suggestion is

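normalizer = 1.0 / ((FLT_TYPE)RAND_MAX + 1);   // cast first: RAND_MAX + 1 can overflow int
for ( unsigned int idxSim = 0; idxSim < _g.xSimulationCount; idxSim++ )
{
    generatedVal = 0.0;
    for ( unsigned int idxDay = 0; idxDay < 365; idxDay ++ )
    {
        dIdx  = (FLT_TYPE)fastrand() * normalizer;   // interpolation fraction in [0, 1)
        iIdx1 = fastrand() % _g.xDayCount;           // integer index drawn directly
        generatedVal += this->distro[iIdx1];
        generatedVal += this->diffs[iIdx1] * dIdx;
    }
    this->yearlyCashflows[idxSim] = generatedVal;   // store the simulated yearly cashflow
}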

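The second version is approx. 30% faster. Now, of the 95 s total runtime, the last line of the inner loop (generatedVal += this->diffs[iIdx1]*dIdx;) consumes 68 s, while the line before it consumes only 3.2 s, so the double * double multiplication must be the devil. I thought of SSE: saving the last three operands into arrays, then doing a vector multiplication this->diffs[i] * dIdx[i] and adding the result to this->distro[i], but that code ran 50% slower. So I guess I have hit the wall.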

Many thanks to all. D.

Answer 1

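Here is a small optimization suggestion that removes the need for the ceil, two casts and one of the multiplications. If you are running on a fixed-point processor, that would explain why the multiplications and the casts between float and int take so long. In that case, try fixed-point optimizations, or enable floating-point support in your compiler if your CPU has it!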

for ( unsigned int idxSim = 0; idxSim < _g.xSimulationCount; idxSim++ )
{
    generatedVal = 0.0;
    for ( unsigned int idxDay = 0; idxDay < 365; idxDay ++ )   // 'register' dropped: removed in C++17
    {
        prob = (FLT_TYPE)fastrand();         // prob [0,1]
        dIdx = prob * dMaxDistroIndex;       // scale prob to distro function size
                                             // to get an index into distro array
        iIdx1  = (long)dIdx;
        _floor = (FLT_TYPE)iIdx1;     // fast version of floor
        iIdx2  = iIdx1 + 1;

        // interpolation per se
        {
           const FLT_TYPE diff = this->distro[iIdx2] - this->distro[iIdx1];
           const FLT_TYPE interp = this->distro[iIdx1] + diff * (dIdx - _floor);
           generatedVal += interp;
        }
    }
    this->yearlyCashflows[idxSim] = generatedVal ;
}
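This rewrite returns exactly the same value as the question's two-weight form. Writing x for dIdx, f for _floor, c = f + 1 for _ceil (which is how the question defines it), a = distro[iIdx1], b = distro[iIdx2] and t = x - f, the identity behind it is:

```latex
a\,(c - x) + b\,(x - f) = a\,(1 - t) + b\,t = a + (b - a)\,t
```

So one multiplication per draw is enough, and (b - a) depends only on the index, which is why it can be precomputed per index as the diffs array in the question's edit does.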

Answer 2

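I would suggest fixing fastrand. Floating-point code is not the fastest in the world, but what is especially slow is switching between floating-point and integer code. Since you need an integer index, use an integer random function.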
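A minimal sketch of drawing the index as an integer, so the hot path needs no float-to-int conversion. std::mt19937 stands in for the question's fastrand(), and randomIndex is a hypothetical name. A multiply-shift maps a 32-bit random word onto [0, n) without a division; like the % operator it carries a small bias when n does not divide 2^32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Map one 32-bit random word onto [0, n) with a multiply and a shift.
std::size_t randomIndex(std::mt19937& rng, std::uint32_t n) {
    assert(n > 0);
    const std::uint64_t r = rng();                    // std::mt19937 yields values in [0, 2^32)
    return static_cast<std::size_t>((r * n) >> 32);   // always < n
}
```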

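It might even be advantageous to pre-generate all 365 random values in one loop. Since each value needs only log2(dMaxDistroIndex) bits of randomness, you may be able to reduce the number of RNG calls.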

Subsequently, you would pick a random number between 0 and 1 for the interpolation fraction.
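A sketch combining these ideas, under stated assumptions: std::mt19937_64 replaces fastrand(), diffs[i] = distro[i+1] - distro[i] is precomputed, and simulateYear is a hypothetical name. All 365 random words for a year are generated first; each 64-bit word is then split so that its high 32 bits pick the integer index and its low 32 bits give the interpolation fraction, which means one RNG call per day instead of two.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

constexpr std::size_t kDays = 365;

double simulateYear(const std::vector<double>& distro,
                    const std::vector<double>& diffs,
                    std::mt19937_64& rng) {
    assert(!diffs.empty() && distro.size() == diffs.size() + 1);
    assert(diffs.size() <= 0xFFFFFFFFull);   // keeps the multiply-shift below 2^64
    const std::uint64_t n = diffs.size();    // valid lower indices: [0, n)
    const double twoPowMinus32 = 1.0 / 4294967296.0;

    // Pass 1: all random words for the year, in one tight loop.
    std::array<std::uint64_t, kDays> draws;
    for (auto& r : draws) r = rng();

    // Pass 2: split each word into an integer index and a fraction, then interpolate.
    double total = 0.0;
    for (const std::uint64_t r : draws) {
        const std::size_t i = static_cast<std::size_t>(((r >> 32) * n) >> 32);       // in [0, n)
        const double frac = static_cast<double>(r & 0xFFFFFFFFull) * twoPowMinus32;  // in [0, 1)
        total += distro[i] + diffs[i] * frac;
    }
    return total;
}
```

Calling simulateYear once per simulation (10K times in the question) fills the yearly cashflow population; whether the two-pass layout beats a single loop has to be measured on the target machine.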

2 answers

Answer 1
Score 4, accepted, 2013-02-15 07:54:55

Answer 2
Score 1, 2013-02-15 10:30:16
