Cpp: Segmentation fault (core dumped)

Question

I am trying to write a lexer. When I try to copy the buffer that collects digits (the isdigit branch) into an array of char, I get this "core dumped" error, although I have done the same thing with identifiers without getting an error.

#include<fstream>
#include<iostream>
#include<cctype>
#include <cstring>
#include<typeinfo>

using namespace std;



int isKeyword(char buffer[]){
    char keywords[22][10] = {"break","case","char","const","continue","default", "switch",
                            "do","double","else","float","for","if","int","long","return","short",
                            "sizeof","struct","void","while","main"};
    int i, flag = 0;
    
    for(i = 0; i < 22; ++i){
        if(strcmp(keywords[i], buffer) == 0)
        {
            flag = 1;
            break;
        }
    }
    
    return flag;
}
int isSymbol_Punct(char word)
{
    int flag = 0;
    char symbols_punct[] = {'<','>','!','+','-','*','/','%','=',';','(',')','{', '}','.'};
    for(int x= 0; x< 15; ++x)
    {
        if(word==symbols_punct[x])
           {
               flag = 1;
               break;
           }
            
    }
    return flag;
}

int main()
{
    char buffer[15],buffer1[15];
    char identifier[30][10];
    char number[30][10];
    memset(&identifier[0], '\0', sizeof(identifier));
    memset(&number[0], '\0', sizeof(number));
    char word;
    ifstream fin("program.txt");
    if(!fin.is_open())
    {
        cout<<"Error while opening the file"<<endl;
    }
    int i,k,j,l=0;
    while (!fin.eof())
    {
        word  = fin.get();

        if(isSymbol_Punct(word)==1)
        {
            cout<<"<"<<word<<", Symbol/Punctuation>"<<endl;
        }
       
        if(isalpha(word))
        {        
            buffer[j++] = word;
            // cout<<"buffer: "<<buffer<<endl;
        }
        else if((word == ' ' || word == '\n' || isSymbol_Punct(word)==1) && (j != 0))
        {
            buffer[j] = '\0';
            j = 0;
                            
            if(isKeyword(buffer) == 1)
                cout<<"<"<<buffer<<", keyword>"<<endl;
            else
                {
                cout<<"<"<<buffer<<", identifier>"<<endl;
                strcpy(identifier[i],buffer);
                i++;
                }
                    
        } 
           
        else if(isdigit(word))
        {
            buffer1[l++] = word;
            cout<<"buffer: "<<buffer1<<endl;
        }
        else if((word == ' ' || word == '\n' || isSymbol_Punct(word)==1) && (l != 0))
        {
            buffer1[l] = '\0';
            l = 0;
            cout<<"<"<<buffer1<<", number>"<<endl;
            // cout << "Type is: "<<typeid(buffer1).name() << endl;
            strcpy(number[k],buffer1);
            k++;
                        
        } 

    }
    cout<<"Identifier Table"<<endl;
    int z=0;
    while(strcmp(identifier[z],"\0")!=0) 
    {       
        cout <<z<<"\t\t"<< identifier[z]<<endl; 
        z++;
        
    }  
    // cout<<"Number Table"<<endl;
    // int y=0;
    // while(strcmp(number[y],"\0")!=0) 
    // {       
    //     cout <<y<<"\t\t"<< number[y]<<endl; 
    //     y++;
        
    // }   

    
}

I get this error when I copy buffer1 into number[k] using strcpy. I do not understand why it is not being copied. When I printed the type of buffer1 to check whether strcpy was the source of the error, I got A_15. I searched for it but did not find any relevant information.

Answer 1

The reason is here (line 56):

int i,k,j,l=0;

You might think that this initializes i, j, k, and l to 0, but in fact it only initializes l to 0. i, j, and k are declared here but not initialized to anything. As a result, they hold indeterminate (garbage) values, so if you use them as array indices you are likely to end up overshooting the bounds of the array in question.

At that point, anything could happen; in other words, this is undefined behavior. One likely outcome, which is probably what is happening to you, is that your program tries to access memory that the operating system hasn't assigned to it, at which point it crashes (a segmentation fault).

To give a concrete demonstration of what I mean, consider the following program:

#include <iostream>
#include <string>

void print_var(std::string name, int v)
{
    std::cout << name << ": " << v << "\n";
}

int main(void)
{
    int i, j, k, l = 0;

    print_var("i", i);
    print_var("j", j);
    print_var("k", k);
    print_var("l", l);

    return 0;
}

When I ran this, I got the following:

i: 32765
j: -113535829
k: 21934
l: 0

As you can see, i, j, and k all came out with values that would exceed the bounds of any of the arrays you declared if used as indices. Unless you are very lucky, this will happen to you, too.

You can fix this by initializing each variable separately:

int i = 0;
int j = 0;
int k = 0;
int l = 0;

Initializing each on its own line makes the initializations easier to see, helping to prevent mistakes.

A few side notes:

I was able to spot this issue immediately because I have my development environment configured to flag lines that provoke compiler warnings. Using a variable before it has been initialized should provoke such a warning from any reasonable compiler, so you can fix problems like this as you run into them. Your development environment may support the same feature (and if it doesn't, you might consider switching to something that does). If nothing else, you can turn on warnings during compilation (for example by passing -Wall -Wextra to your compiler; check its documentation for the specifics).
Since you declared your indices as int, they are signed integers, which means they can hold negative values (as j did in my demonstration). If you try to index into an array using a negative index, you will end up dereferencing a pointer to a location "behind" the start of the array in memory, so you will be in trouble even with an index of -1 (remember that subscripting a C-style array is just pointer arithmetic from the address of its first element). Also, int probably has only 32 bits in your environment, so if you're writing 64-bit code then it's possible to define arrays too large for an int to fully cover, even if you were to index into the array from the middle. For these sorts of reasons, it's generally a good idea to type raw array indices as std::size_t, which is always capable of representing the size of the largest possible array in your target environment, and is also unsigned.
You describe this as C++ code, but I don't see much C++ here aside from the I/O streams. C++ has a lot of amenities that can help you guard against bugs compared to C-style code (which has to be written with great care). For example, you could replace your C-style arrays here with instances of std::array, which has a member function at() that does subscripting with bounds checking; that would have thrown a helpful exception in this case instead of having your program segfault. Also, it doesn't seem like you have a particular need for fixed-size arrays in this case, so you may be better off using std::vector; this grows automatically to accommodate new elements, helping you avoid writing outside the vector's bounds. Both support range-based for loops, which save you from needing to deal with indices by hand at all. You might enjoy Bjarne's A Tour of C++, which gives a nice overview of idiomatic C++ and will make all the woolly reference material easier to parse. (And if you want to pick up some nice C habits, both K&R and Kernighan and Pike's The Practice of Programming can save you much pain and tears.)

Answer 2

Some general hints that might help you avoid this kind of crash entirely, by design:

1. As this is C++, you should really rely on established C++ data types and idioms here as far as possible. I know that some things in parser/lexer writing can become quite low-level, but at least for what you want to achieve here, you should take advantage of them. Avoid plain arrays as far as possible. Use std::vector of uint8_t and/or std::string, for instance.
2. Similar to point 1, and a consequence of it: always use bounds-checked iteration. You don't need to try to be better than your compiler's optimizer, at least not here. In general, one should always avoid duplicating container size information. With the C++ containers mentioned above, this information is always provided by the data source already. If that is not possible in very rare cases, use constants for it, declared directly at or within the data source's definition/initialization.
3. Give your variables meaningful names, and declare them as close as possible to where they are used.
4. isXXX functions, at least yours, should return boolean values. You never return anything other than 0 or 1.
5. A personal recommendation that is a bit controversial as a general rule: use early returns and abort criteria. Even after the check for file-reading problems fails, you proceed further.
6. Try to keep your functions smart and free of boilerplate! Use subroutines for distinct sub-tasks!
7. Try to avoid using namespace at global scope. Even without exotic build schemes like unity builds, this becomes very error-prone in larger projects at the latest.
8. The arrays keywords and symbols_punct should at least be static const. The optimizer will easily recognize that anyway, but it helps you understand the code quickly. Try to use classes here to group the things that belong together in a readable, adaptable, easily modifiable and reusable way. Always keep in mind that you might still want to understand your own code some months later, and so might other developers.
