How to find local maxima of the first column of a dataset in C++

Here is the code I use to read the .txt file:

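#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    ifstream f("file.txt");
    string str1;

    if (f.is_open())
    {
        // Print the file line by line; the loop ends when getline fails at the end of the file
        while (getline(f, str1))
        {
            cout << str1 << endl;
        }
        f.close();
    }
}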

The problem is that str1[i] accesses the i-th character of the whole dataset. I'd like to find all local maxima of the second column of the dataset. Here is an example of the dataset:

15497.97740 -0.174807
15497.99247 0.410084
15498.00754 0.680590
15498.02260 -0.887408
15498.03767 -1.383546
15498.05273 -0.741141

One way to do this is to load the second column into a vector and then find the maximum element in that vector. You can read your file either line by line or one number at a time, the latter using std::fstream's operator>>(double). The second approach seems simpler in this case.

Notice that you don't need to close the file manually, since it is closed automatically in std::fstream's destructor.

#include <algorithm>
#include <iostream>
#include <fstream>
#include <vector>

int main()
{
    std::fstream ifs("data.txt");
    if (!ifs.is_open())
    {
        return 1;
    }

    std::vector<double> secondColumn;

    // read the file skipping the first column
    double d1;
    double d2;
    while (ifs >> d1 && ifs >> d2)
    {
        secondColumn.push_back(d2);
    }

    // use the algorithm library to find the max element
    // std::max_element returns end iterator if the vector is empty
    // so an additional check is needed 
    auto maximumIt = std::max_element(secondColumn.begin(), secondColumn.end());
    if (maximumIt != secondColumn.end())
    {
        double maximum = *maximumIt;
        std::cout << maximum << '\n';
    }
}

I'm sorry, but your question is not entirely clear to me.

Anyway, I will try to help. I will find ALL local maxima.

We will break the big problem down into small problems, using classes and methods. These are then easier to solve.

Let's start with the basic element: a point on a curve. We will create a mini class containing an “x” and a “y”, assuming that these are your columns 1 and 2. We will add very simple input and output functions.

// One point. Has a X and a Y coordinate. Can be easily read and written
struct Point {
    // Data
    double x{};
    double y{};

    // Ultra simple input and output function
    friend std::istream& operator >> (std::istream& is, Point& p) { return is >> p.x >> p.y; }
    friend std::ostream& operator << (std::ostream& os, const Point& p) { return os << std::setprecision(10) << p.x << " \t " << p.y; }
};

Next: the curve. It simply consists of many points. We will use a std::vector to store the Point elements, because a std::vector can grow dynamically. Here, too, we will add very simple input and output functions. For the input, we read Points in a loop and add them to our internal std::vector. The output simply writes all values of our std::vector to the output stream “os”.

Next, reading the data from a file. Because we already defined stream-based input and output operators for a Point and a Curve, we can simply use the standard extractor >> and inserter << operators.

The first approach then looks like this:

int main() {
    // Open the sourcefile with the curve data
    std::ifstream sourceFileStream{"file.txt"};

    // Check, if we could open the file
    if (sourceFileStream) {

        // Here, we will store our curve data
        Curve curve{};

        // Now read all points and store them as a curve
        sourceFileStream >> curve;

        // Show debug output
        std::cout << curve;
    }
    else std::cerr << "\n*** Error: Could not open source file\n";
}

Hm, that looks really cool and simple. But how does it work? First, we open the file with the constructor of std::ifstream. That is easy. And the nice thing is that the destructor of std::ifstream will close the file for us automatically. This happens at the closing brace } of the scope in which the stream was defined, here at the end of main.

To check whether the stream is still OK or has failed, we can simply write if (sourceFileStream). This is possible because std::ifstream has an overloaded bool operator (inherited from std::basic_ios). And since the if statement expects a Boolean value, this operator is called and tells us whether there is a problem: it returns !fail(), so true means no problem. Nice.

Now let's come to the local peak search. The problem is often a discrete signal with superimposed noise. Consider, for example, a base sinusoid curve with some heavy noise on top.


We will add two thresholds: an upper and a lower one, or just an upper one with a negative hysteresis. That sounds complicated, but it is not. First, we determine the absolute maximum and absolute minimum values of the curve. Based on that range, we calculate the thresholds as percentages.
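Written out as formulas, with the fractions used in the code below (0.7 for the upper threshold and a hysteresis of 0.2, hence 0.5 for the lower one), the two thresholds are:

$$
\begin{aligned}
y_{\text{high}} &= y_{\min} + 0.7\,(y_{\max} - y_{\min}) \\
y_{\text{low}}  &= y_{\min} + (0.7 - 0.2)\,(y_{\max} - y_{\min}) = y_{\min} + 0.5\,(y_{\max} - y_{\min})
\end{aligned}
$$

As a worked example with the six sample rows from the question: y_min = -1.383546 and y_max = 0.680590, so the range is 2.064136, y_high = -1.383546 + 1.4448952 = 0.0613492 and y_low = -1.383546 + 1.032068 = -0.351478. The search therefore switches on at 0.410084, switches off at -0.887408, and reports a single peak at (15498.00754, 0.680590).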

We evaluate the curve value by value. When a value passes the upper threshold, we start looking for a maximum, and we keep doing so until a value drops below the lower threshold. At that moment, we store the maximum found so far (together with its x value). Then we wait until the upper threshold is crossed again. The hysteresis prevents the search mode from toggling back and forth when the signal is noisy.

Put into code, all this could look like this:

std::vector<Point> Curve::findPeaks() {

    // Definition of threshold value and hysteresis (fractions of the value range) to find max peak values
    constexpr double ThresholdPercentageHigh{ 0.7 };
    constexpr double Hysteresis{ 0.2 };
    constexpr double ThresholdPercentageLow{ ThresholdPercentageHigh - Hysteresis };

    // An empty curve has no peaks (minmax_element would return end iterators that must not be dereferenced)
    if (points.empty()) return {};

    // First find the overall min / max to calculate the threshold values
    const auto [min, max] = std::minmax_element(points.cbegin(), points.cend(), [](const Point& p1, const Point& p2) { return p1.y < p2.y; });
    const double thresholdMaxHigh = ((max->y - min->y) * ThresholdPercentageHigh + min->y);
    const double thresholdMaxLow = ((max->y - min->y) * ThresholdPercentageLow + min->y);

    // We need to know if the search is active,
    // and we need to know when there is a transition from active to inactive
    bool searchActive{};
    bool oldSearchActive{};

    // Initialize with the lowest possible value, so that any other value will be bigger.
    // (std::numeric_limits<double>::min() is the smallest *positive* double, not the lowest one)
    double maxPeakY{ std::numeric_limits<double>::lowest() };
    // X value for the max peak value
    double maxPeakX{};

    std::vector<Point> peaks{};

    // Go through all values
    for (std::size_t index{}; index < points.size(); ++index) {

        // If the value is above the upper threshold, switch on search mode
        if (not searchActive) {
            if (points[index].y > thresholdMaxHigh)
                searchActive = true;
        }
        else {
            // Else, if the value is below the lower threshold, switch off search mode
            if (points[index].y < thresholdMaxLow)
                searchActive = false;
        }
        // If the search is active, then find the max peak
        if (searchActive and points[index].y > maxPeakY) {
            maxPeakX = points[index].x;
            maxPeakY = points[index].y;
        }
        // Check for a transition from active to inactive. At that very moment, store the previously found values
        if (not searchActive and oldSearchActive) {
            peaks.push_back({ maxPeakX, maxPeakY });
            maxPeakY = std::numeric_limits<double>::lowest();
        }
        // Remember the state for the next round; searchActive keeps its value,
        // which is what gives the hysteresis its effect
        oldSearchActive = searchActive;
    }
    // A search that is still active at the end of the data also contains a peak
    if (searchActive) peaks.push_back({ maxPeakX, maxPeakY });

    return peaks;
}

Putting everything together leads to the final solution:

#include <iostream>
#include <fstream>
#include <vector>
#include <iomanip>
#include <algorithm>
#include <limits>

// One point. Has a X and a Y coordinate. Can be easily read and written
struct Point {
    // Data
    double x{};
    double y{};

    // Ultra simple input and output function
    friend std::istream& operator >> (std::istream& is, Point& p) { return is >> p.x >> p.y; }
    friend std::ostream& operator << (std::ostream& os, const Point& p) { return os << std::setprecision(10) << p.x << " \t " << p.y; }
};

// A curve consists of many points
struct Curve {
    // Data
    std::vector<Point> points{};

    // find peaks
    std::vector<Point> findPeaks();

    // Ultra simple input and output function
    friend std::istream& operator >> (std::istream& is, Curve& c) { Point p{};  c.points.clear();  while (is >> p) c.points.push_back(p);  return is; }
    friend std::ostream& operator << (std::ostream& os, const Curve& c) { for (const Point& p : c.points) os << p << '\n'; return os; }
};

std::vector<Point> Curve::findPeaks() {

    // Definition of threshold value and hysteresis (fractions of the value range) to find max peak values
    constexpr double ThresholdPercentageHigh{ 0.7 };
    constexpr double Hysteresis{ 0.2 };
    constexpr double ThresholdPercentageLow{ ThresholdPercentageHigh - Hysteresis };

    // An empty curve has no peaks (minmax_element would return end iterators that must not be dereferenced)
    if (points.empty()) return {};

    // First find the overall min / max to calculate the threshold values
    const auto [min, max] = std::minmax_element(points.cbegin(), points.cend(), [](const Point& p1, const Point& p2) { return p1.y < p2.y; });
    const double thresholdMaxHigh = ((max->y - min->y) * ThresholdPercentageHigh + min->y);
    const double thresholdMaxLow = ((max->y - min->y) * ThresholdPercentageLow + min->y);

    // We need to know if the search is active,
    // and we need to know when there is a transition from active to inactive
    bool searchActive{};
    bool oldSearchActive{};

    // Initialize with the lowest possible value, so that any other value will be bigger.
    // (std::numeric_limits<double>::min() is the smallest *positive* double, not the lowest one)
    double maxPeakY{ std::numeric_limits<double>::lowest() };
    // X value for the max peak value
    double maxPeakX{};

    std::vector<Point> peaks{};

    // Go through all values
    for (std::size_t index{}; index < points.size(); ++index) {

        // If the value is above the upper threshold, switch on search mode
        if (not searchActive) {
            if (points[index].y > thresholdMaxHigh)
                searchActive = true;
        }
        else {
            // Else, if the value is below the lower threshold, switch off search mode
            if (points[index].y < thresholdMaxLow)
                searchActive = false;
        }
        // If the search is active, then find the max peak
        if (searchActive and points[index].y > maxPeakY) {
            maxPeakX = points[index].x;
            maxPeakY = points[index].y;
        }
        // Check for a transition from active to inactive. At that very moment, store the previously found values
        if (not searchActive and oldSearchActive) {
            peaks.push_back({ maxPeakX, maxPeakY });
            maxPeakY = std::numeric_limits<double>::lowest();
        }
        // Remember the state for the next round; searchActive keeps its value,
        // which is what gives the hysteresis its effect
        oldSearchActive = searchActive;
    }
    // A search that is still active at the end of the data also contains a peak
    if (searchActive) peaks.push_back({ maxPeakX, maxPeakY });

    return peaks;
}


int main() {
    // Open the sourcefile with the curve data
    std::ifstream sourceFileStream{"file.txt"};

    // Check, if we could open the file
    if (sourceFileStream) {

        // Here, we will store our curve data
        Curve curve{};

        // Now read all points and store them as a curve
        sourceFileStream >> curve;

        // Show peaks output
        for (const Point& p : curve.findPeaks()) std::cout << p << '\n';
    }
    else std::cerr << "\n*** Error: Could not open source file\n";
}
