C++ "Segmentation fault" or "free(): invalid pointer" depending on input (reproducible)

This program gives a "Segmentation fault" or a "free(): invalid pointer" depending on the input file used. The bug is reproducible.

This is very strange, especially because free is not called.

Here is the complete program.

// Open a file, read line by line,
// and for each (loooong) line, fraction it into
// small, justified lines.
// Justification done by adding spaces to existing spaces.

#include <iostream>
#include <iterator>
#include <sstream>
#include <fstream>
#include <list>
#include <vector>
#include <math.h>
#include <random>
#include <functional>

const int pageWidth = 50; // max width for a (justified) line

typedef std::vector<std::string> WordList;
typedef std::vector<int> SpaceList;

// ========
// HELPERS
// ========

// Helper to return "cccccc...ccc"
std::string repeat (const int n, char c) {
    std::string ret;
    for (int i{0}; i < n; ++i) {
        ret += c;
    }
    return ret;
}

// "Random" int between min and max (pseudo-random: reproducible)
unsigned int random_pred(std::size_t salt, unsigned int min, unsigned int max) {
    unsigned int output = min + (salt % static_cast<int>(max - min + 1));
    return output;
}

// alpha is greater at center
float alpha_weight(float z) { return std::max(100*(sin(z))*(sin(z)), float(1)); }

// Weight of a space, ie probability to add here blank space
int weight(int x, unsigned int l)
{
    float z = 3.141 * x / l;
    return alpha_weight(z);
}

// line -> vector of words
WordList splitTextIntoWords( const std::string &text )
{
    WordList words;
    std::istringstream in(text);
    std::copy(std::istream_iterator<std::string>(in),
              std::istream_iterator<std::string>(),
              std::back_inserter(words));
    return words;
}

// ======
// CORE
// ======

// Give each space a weight, a 'probability' to be expanded
SpaceList charge_spaces ( int l, const WordList & words, SpaceList spp) 
{
    SpaceList sp_weights;
    std::string p{ words[0] }, h;
    int wg;
    for (size_t i = 0; i < words.size()-1; ++i) {
            wg = weight(spp[i], l);
            sp_weights.push_back(wg);
    }
    return sp_weights;
}

// Given weighted list of spaces positions, 'randomly' pick one
int random_sp( const SpaceList& spw, std::size_t salt ) {
    std::string m;
    unsigned int i{48}, total{0}; // ASCII 48 = '0'
    for (const int & n : spw) {
        char ch = static_cast<char>(i); // '1'; '2'; '3' ...
        std::string segment = repeat(n, ch); // "11"; "2222"; "3333333333333" ....
        m += segment;
        total += n;
        ++i;
    }  // now, m like "11112222222222333333333333333333334444444455555", of length <total>
    int mqrh = random_pred(salt, 0, total); // Get 0 <= random <= total
    char iss = m[mqrh]; // Read the char at this position (for example, more likely '3' here)
    int ret = static_cast<int>(iss) - 48; // Example: '3' -> 3
    return ret; // Example: return 3
}

// Add spaces to a (small) line, to justify it.
// We need to expand it by <excess> spaces.
std::string justifyLine( std::string line, WordList ww, SpaceList space_positions, int excess )
{
    SpaceList spwg = charge_spaces(line.size(), ww, space_positions);
    SpaceList spadd{spwg.begin(), spwg.end()}; // number of spaces to add after word <i>
    for (size_t k = 0; k < ww.size()-1; ++k) {
        spadd[k] = 1; // By default, 1 space after each word
    }
    int winner; // Which space will win additional space ?
    std::size_t str_hash = std::hash<std::string>{}(line) / 1000; // 'random' seed, reproducible
    for (int i{0}; i < excess; ++i) { // Where to add these <excess> needed spaces ?
        std::size_t salt = str_hash + 37*i;
        winner = random_sp(spwg, salt);
        spadd[winner] = spadd[winner] + 1; // space after <winner> word is incremented !
        spwg[winner] = std::max( spwg[winner] / 10, 1); // Weight of winner is decreased
    }
    // Build up the justified line
    std::string justified;
    for (size_t j = 0; j < ww.size()-1; ++j) {
        justified.append(ww[j]); // Add next word
        justified.append(spadd[j], ' '); // Add few spaces
    }
    justified.append(ww.back()); // Add last word
    std::cout << justified << std::endl;
    return justified;
}

// Fraction a long line in several justified small lines
void justifyText( const std::string& text )
{
    WordList words = splitTextIntoWords(text);
    std::string line;
    WordList ww;
    SpaceList space_positions;
    int position{0};
    int nwords_in_line{0};
    for (const std::string& word : words) {
        size_t s = word.size();
        if (line.size() + s + 1 <= pageWidth) { // next word fit into the line.
            if (!line.empty()) {
                line.append(" ");
                space_positions.push_back(position++);
            }
            line.append(word);
            nwords_in_line++;
            ww.push_back(word); // append this word to the list
            position += s;
        } else { // build a justified small line from the words added up
            justifyLine(line, ww, space_positions, pageWidth - position);
            line.clear(); // Cleaning for next chunk
            ww.clear();
            space_positions.clear();
            line = word;
            position = s;
            nwords_in_line = 1;
            ww.push_back(word); // don't forget the last word (that overflowed)
        }
    }
    std::cout << line << std::endl; // Remaining of the long line
}

// =====
// main
// =====
int main () {
    std::string line;
    std::ifstream myfile ("candle.txt");
    if (myfile.is_open())
    {
        while ( getline(myfile,line) )
        {
            justifyText(line);
        }
        myfile.close();
    }
    else std::cerr << "Unable to open file";
    return 0;
}

File "candle.txt" is an ASCII text file; here is a copy.

The whole file gives free(): invalid pointer, always at the same position; see (1) below.

If the chunk between the two CUT HERE marks in the PREFACE is deleted, the program gives a Segmentation fault instead.

Running under Valgrind gives this (very strange, because the repeat function does not seem problematic):

Thread 1: status = VgTs_Runnable (lwpid 4487)
==4487==    at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:342)
==4487==    by 0x4990859: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==4487==    by 0x4990F34: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator+=(char) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==4487==    by 0x10A406: repeat[abi:cxx11](int, char) (in /home/fig/Documents/cpp/cil)
==4487==    by 0x10A8DD: random_sp(std::vector<int, std::allocator<int> > const&, unsigned long) (in /home/fig/Documents/cpp/cil)
==4487==    by 0x10AB34: justifyLine(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<int, std::allocator<int> >, int) (in /home/fig/Documents/cpp/cil)
==4487==    by 0x10AF71: justifyText(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/fig/Documents/cpp/cil)
==4487==    by 0x10B195: main (in /home/xxx/Documents/cpp/cil)

Any ideas are welcome.

(1) End of output:

We have several tests for oxygen  besides the mere
burning of bodies. You have seen a candle burnt in
oxygen, or in the air;   you have  seen phosphorus
burnt in the air, or in oxygen; and  you have seen
iron-filings  burnt in  oxygen. But we  have other
tests besides  these, and  I am about  to refer to
one or two  of them for  he purpose of  carrying

The crash is a consequence of an earlier bug. There's nothing (directly) wrong with any of the code in your backtrace. Although the bug was ultimately traced to one of the functions listed there, the damage was done by an earlier invocation of that same code; the crash you see is a later call tripping over memory that was already corrupted.

One of my stock comments starts with: "Just because this is where the program crashes or reports an error doesn't mean this is where the problem is. C++ does not work this way." This is followed by an explanation that a minimal reproducible example must be provided, which you mostly did. This allowed the problem to be reproduced trivially. The first instance of undefined behavior occurred elsewhere, on line 109 of the program:

        spadd[winner] = spadd[winner] + 1; 

valgrind has a very useful option: --vgdb-error=1. This stops execution immediately when valgrind detects memory corruption, which in this case happens on this line. valgrind then gives instructions for attaching to the current process with gdb. Doing so immediately led to the observation that the value of winner was -48. Modifying spadd[-48] will only result in tears.

At this point it wasn't too difficult to backtrack to line 91, where the -48 came from, which led to the actual bug, an off-by-one error:
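A complementary way to surface this kind of error at its source, without a debugger, is bounds-checked access. The sketch below is not part of the original program: `bump_checked` is a hypothetical helper that mirrors the `spadd[winner] + 1` update but goes through `std::vector::at`, which throws `std::out_of_range` for a bad index instead of silently writing outside the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not in the original program): increments the slot
// chosen by `winner` through at(), which checks the index.
// Returns true on success, false if the index was out of range.
bool bump_checked(std::vector<int>& spadd, int winner) {
    try {
        // A negative int converts to a huge std::size_t, so at() rejects it.
        spadd.at(static_cast<std::size_t>(winner)) += 1;
        return true;
    } catch (const std::out_of_range&) {
        return false; // the bad index is reported here, not at a later crash
    }
}

// Small self-check that can be called from main().
void bump_checked_demo() {
    std::vector<int> spadd{1, 1, 1};
    bool ok = bump_checked(spadd, 1);    // valid index: slot 1 is incremented
    assert(ok && spadd[1] == 2);
    bool bad = bump_checked(spadd, -48); // the value seen in gdb is rejected
    assert(!bad);
    (void)ok; (void)bad;                 // silence warnings when NDEBUG is set
}
```

In the real program one would probably let the exception propagate rather than swallow it; returning a bool here just keeps the sketch easy to check.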

int mqrh = random_pred(salt, 0, total);

total here was always the same as m.size(), and this duplicated logic resulted in the bug, since this parameter should be the last value in the range for random_pred, not one past it. The expected result here is to pick a random character from m, so the valid range is 0 to m.size()-1. If it weren't for the duplicated logic, and being mindful of how random_pred()'s parameters are defined, the last parameter would naturally have been m.size()-1. But the duplicated logic resulted in an indirect reference to the underlying value, total, and this detail was forgotten.
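To make the failure concrete, here is a minimal sketch of how the -48 arises. `random_pred` is copied from the question; `decode_at` is a hypothetical stand-in for the last lines of `random_sp`, and the string `"0011122"` is made-up sample data. Reading a `std::string` at index `size()` is allowed and yields `'\0'`, so nothing fails at the read itself; the bad value only does damage later, when it is used as a vector index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copied from the question: returns a value in the CLOSED range [min, max].
unsigned int random_pred(std::size_t salt, unsigned int min, unsigned int max) {
    unsigned int output = min + (salt % static_cast<int>(max - min + 1));
    return output;
}

// Hypothetical stand-in for the end of random_sp():
// read the character at `pos` and map '0', '1', '2', ... to 0, 1, 2, ...
int decode_at(const std::string& m, std::size_t pos) {
    assert(pos <= m.size()); // pos == m.size() is readable but is not a digit
    return static_cast<int>(m[pos]) - 48;
}

// Small self-check that can be called from main().
void off_by_one_demo() {
    const std::string m = "0011122";  // weights 2, 3, 2, so total == m.size() == 7
    const unsigned int total = 7;
    assert(random_pred(7, 0, total) == total);     // the upper bound itself can come back
    assert(decode_at(m, 6) == 2);                  // last valid position
    assert(decode_at(m, m.size()) == -48);         // one past the end: '\0' - 48
    assert(random_pred(7, 0, total - 1) == 0);     // with m.size()-1 the result stays in range
    (void)total;
}
```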

Another contributing factor to common kinds of bugs is going against the natural flow of how C++ defines ranges and sequences: not by the minimum and the maximum value, but by the minimum and one past the maximum value. The value returned by std::string::size(), std::vector::size(), etc., is one past the last valid index of the underlying container, not the last valid index itself. Similarly, end(), the ending iterator, is not the iterator for the last value in the sequence but the iterator to the next, non-existent value in the sequence, "one past it".
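The half-open convention can be illustrated with a short sketch. The function names below are hypothetical and chosen only for this example; the point is that `size()` and `end()` act as exclusive bounds, so loops compare with `<` or `!=`, and the last valid index has to be computed explicitly as `size() - 1`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index form: valid indices are [0, size()), so the loop test is `<`, not `<=`.
int sum_by_index(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Iterator form: [begin(), end()) is the same half-open range.
int sum_by_iterator(const std::vector<int>& v) {
    int total = 0;
    for (auto it = v.begin(); it != v.end(); ++it) total += *it;
    return total;
}

// Last valid index; only meaningful for a non-empty string.
std::size_t last_index(const std::string& s) {
    assert(!s.empty());
    return s.size() - 1;
}
```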

If random_pred had been designed in harmony with the rest of the C++ library, its formula would simply be min + salt % (max-min) instead of min + salt % (max-min+1). Then it wouldn't matter whether its third parameter was total or m.size(); it would have worked naturally either way.
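A minimal sketch of that half-open design follows. The names `random_pred_half_open` and `pick_char` are hypothetical, and the `assert` preconditions are additions that the original code does not have; the formula itself is the `min + salt % (max-min)` form described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Half-open variant: the result lies in [min, max) and is never equal to max.
// Hypothetical name; the original function is random_pred().
unsigned int random_pred_half_open(std::size_t salt, unsigned int min, unsigned int max) {
    assert(min < max); // an empty range would mean a modulo by zero
    return min + static_cast<unsigned int>(salt % (max - min));
}

// Picking a position in m: m.size() can now be passed directly as the bound.
char pick_char(const std::string& m, std::size_t salt) {
    assert(!m.empty());
    unsigned int pos =
        random_pred_half_open(salt, 0, static_cast<unsigned int>(m.size()));
    return m[pos]; // pos < m.size(), so this is always a real character of m
}
```

With this signature, a caller that passes `total` (which equals `m.size()`) gets an in-range index without having to remember a `-1`.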
