简体   繁体   中英

C program compiled with cygwin in Windows works, segmentation fault under Linux. Is cygwin GCC 'bad'?

For my Programming 102 class we are asked to deliver C code that compiles and runs under Linux. I don't have enough spare space on my hard drive to install Linux alongside Windows, and so I use cygwin to compile my programs.

The most recent program I had to give in compiles and runs fine under cygwin. It compiles fine under Linux, but half-way through execution produces a segmentation fault. I explained this to the grad student who gives us class and he said that cygwin's version of GCC allows for sloppier code to be compiled and executed.

The few references I have found via google haven't been conclusive. One thread I found said that the cause for the seg fault under Linux is a memory leak. Why would this not affect the cygwin version?

I would use the University's computers, but I can't use Subversion on them which would significantly hinder my efforts. (I'm new to coding and often need to be able to be able to revert to X revisions ago).

Is cygwin's version of GCC really more 'lax' with the code it compiles? If so, are there any obvious issues to look out for when coding? Are there any alternatives for being able to write code that will run under Linux?

Edit

Thanks for the replies. I wasn't explicit enough in my original post: that there is a bug in my code was pretty much a given for me (I am quite new to programming, and really green when it comes to C, after all). My TA implied cygwin's GCC is a less reliable compiler -allowing for much sloppier code to run- than the one found under GNU/Linux. I found this strange and so had a search on the internet, but couldn't really find any references to that fact.

More than blaming the compiler vs. my code, I was wondering what the reason could be for the program to run under Windows and crash under Linux. The replies re: different memory managers and heap/stack layout under Windows/Linux were illustrating in that regard.

Would the conclusion that cygwin's GCC is just as 'good' as GNU/Linux', and it's the underlying operating systems/sheer luck that my buggy program runs under one and not the other be pretty much correct?

Regarding posting the source code, it's a homework assignment so I'd prefer to find the issue myself if at all possible :)

Edit 2

I've accepted jalf's answer as it talks about what makes the program run under Windows and not under Linux, which was what I really wanted to know. Thanks to everyone else who contributed, they were all very interesting and informative replies.

When I've found the issue and fixed it I'll upload a zip file with all the source code of this non-working version, in case anyone is curious to see what the hell I did :)

Edit 3

For those interested in seeing the code, I found the problem, and it was indeed due to pointers. I was trying to return a pointer from a function. The pointer I was trying to return was being declared inside the function and so was being destroyed once the function executed. Problematic code is commented out on lines 22-24.

Feel free to ridicule my code.

/**
*  Returns array of valid searches based on current coordinate
*/
void determine_searches(int row, int col, int last_row, int last_col, int *active_search){
    // define coordinate categories and related valid search directions
    int Library0[] = {2, 3, 4, -1};
    int Library1[] = {4, 5, 6, -1};
    int Library2[] = {2, 3, 4, 5, 6, -1};
    int Library3[] = {0, 1, 2, 3, 4, 5, 6, 7, -1};
    int Library4[] = {0, 1, 2, -1};
    int Library5[] = {0, 6, 7, -1};
    int Library6[] = {0, 1, 2, 6, 7, -1};
    int Library7[] = {0, 1, 2, 3, 4, -1};
    int Library8[] = {0, 4, 5, 6, 7, -1};

    int * Library[] = { 
        Library0, Library1, Library2,
        Library3, Library4, Library5,
        Library6, Library7, Library8,
    };

    // declare (and assign memory to) the array of valid search directions that will be returned
    //int *active_search;
    //active_search = (int *) malloc(SEARCH_DIRECTIONS * sizeof(int));


    // determine which is the correct array of search directions based on the current coordinate
    // top left corner
        int i = 0;
    if(row == 0 && col == 0){
        while(Library[0][i] != -1){
            active_search[i] = Library[0][i];
            i++;
        }
    }
    // top right corner
    else if(row == 0 && col == last_col){
        while(Library[1][i] != -1){
            active_search[i] = Library[1][i];
            i++;
        }
    }
    // non-edge columns of first row
    else if(row == 0 && (col != 0 || col != last_col)){
        while(Library[2][i] != -1){
            active_search[i] = Library[2][i];
            i++;
        }
    }
    // non-edge coordinates (no edge columns nor rows)
    else if(row != 0 && row != last_row && col != 0 && col != last_col){
        while(Library[3][i] != -1){
            active_search[i] = Library[3][i];
            i++;
        }
    }
    // bottom left corner
    else if(row == last_row && col == 0){
        while(Library[4][i] != -1){
            active_search[i] = Library[4][i];
            i++;
        }
    }
    // bottom right corner
    else if(row == last_row && col == last_col){
        while(Library[5][i] != -1){
            active_search[i] = Library[5][i];
            i++;
        }
    }
    // non-edge columns of last row
    else if(row == last_row && (col != 0 || col != last_col)){
        while(Library[6][i] != -1){
            active_search[i] = Library[6][i];
            i++;
        }
    }
    // non-edge rows of first column
    else if((row != 0 || row != last_row) && col == 0){
        while(Library[7][i] != -1){
            active_search[i] = Library[7][i];
            i++;
        }
    }
    // non-edge rows of last column
    else if((row != 0 || row != last_row) && col == last_col){
        while(Library[8][i] != -1){
            active_search[i] = Library[8][i];
            i++;
        }
    }
    active_search[i] = -1;
}

I don't mean to sound rude, but it's probably your code that's bad, not the compiler. ;) Problems like this are actually more common than you'd think, because different OS's and compilers will have different ways of organizing your application's data in the stack and heap. The former can be particularly problematic, especially if you end up overwriting memory on the stack, or referencing freed memory which the system has decided to use for something else. So basically, you might get away with it sometimes, but other times your app will choke and die. Either way, if it segfaults, it's because you tried to reference memory which you were not allowed, so it's more of a "happy coincidence" that it didn't crash under another system/compiler.

But really, a segfault is a segfault, so you should instead debug your code looking for memory corruption instead of tweaking the compiler's configuration to figure out what's going wrong.

Edit : Ok, I see what you mean now... I thought you were coming at this problem with an "X sucks, but Y works just fine!" attitude, but it seems to be your TA who's got that. ;)

Anyways, here's some more hints for debugging problems like this:

  • Look for pointer arithmetic, referencing/dereferencing for possible "doh!" errors. Any place where you are adding/subtracting one (aka, fencepost errors) are particularly suspect.
  • Comment out calls to malloc/free around the problem area, and any associated areas where those pointers are used. If the code stops crashing, then you're headed in the right direction.
  • Assuming you've at least identified the general area where your code is crashing, insert early return statements in there and find the point where your code doesn't crash. This can help to find an area somewhere between that point and where your code actually crashes. Remember, a segfault like this may not necessarily happen directly at the line of code where your bug is.
  • Use the memory debugging tools available on your system.
    • On Unix, check out this guide for debugging memory on unix , and the valgrind profiler (@Sol, thx for reminding me about this one)
    • On Visual Studio/Windows, your good 'ol buddy CrtCheckMemory() comes in rather handy. Also, read up on the CRT memory debugging patterns , as they're one of the nicer features of working in VS. Often times, just leaving open a memory tab in VS is enough to diagnose bugs like this once you memorize the various patterns.
    • In Mac OSX, you can set a breakpoint on malloc_error_break (either from gdb or Xcode), which causes it the debugger to break whenever malloc detects memory corruption. I'm not sure whether that's available in other unix flavors, but a quick google search seems to indicate it's mac-only. Also, a rather "experimental" looking version of valgrind seems to exist for OSX .

Like others have said, you might want to post some of your code here, even if that's not the real point of your question. It might still be a good learning experience to have everyone here poke through your code and see if they can find what caused the segfault.

But yeah, the problem is that there are so many platform-dependent, as well as basically random, factors influencing a C program. Virtual memory means that sometimes , accessing unallocated memory will seem to work, because you hit an unused part of a page that's been allocated at some earlier point. Other times, it'll segfault because you hit a page that hasn't been allocated to your process at all. And that is really impossible to predict. It depends on where your memory was allocated, was it at the edge of a page, or in the middle? That's up to the OS and the memory manager, and which pages have been allocated so far, and...... You get the idea. Different compilers, different versions of the same compilers, different OS'es, different software, drivers or hardware installed on the system, anything can change whether or not you get a segfault when you access unallocated memory.

As for the TA's claim that cygwin is more "lax", that's rubbish, for one simple reason. Neither compiler caught the bug! If the "native" GCC compiler had truly been less lax, it would have given you an error at compile-time. Segfaults are not generated by the compiler. There's not much the compiler can do to ensure you get a segfault instead of a program that seemingly works.

我还没有听说过Cygwin下有关GCC怪异的任何具体信息,但是在您的情况下,使用-Wall命令行选项对gcc显示所有警告可能是个好主意,以查看是否发现了可能引起问题的任何内容代码中的段错误。

You definitely have a bug somewhere in your code. It's possible that the Windows memory manager is being more lax than the Linux memory manager. On Windows, you might be doing bad things with memory (like overwriting array bounds, memory leaks, double-free'ing, etc.), but it's letting you get away with it. A famous story related to this can be found at http://www.joelonsoftware.com/articles/APIWar.html (search for "SimCity" on that (somewhat lengthy) article).

Cygwin's version of gcc may have other default flags and tweaked settings (wchar_t being 2 bytes for example), but i doubt it is specifically more "lax" with code and even so - your code should not crash. If it does, then most probably there is a bug in your code that needs be fixed. For example your code may depend on a particular size of wchar_t or may execute code that's not guaranteed to work at all, like writing into string literals.

If you write clean code then it runs also on linux. I'm currently running firefox and the KDE desktop which together consist of millions of C++ lines, and i don't see those apps crashing :)

I recommend you to paste your code into your question, so we can look what is going wrong.

In the meantime, you can run your program in gdb , which is a debugger for linux. You can also compile with all mudflap checks enabled and with all warnings enabled. mudflaps checks your code at runtime for various violations:

[js@HOST2 cpp]$ cat mudf.cpp
int main(void)
{
  int a[10];
  a[10] = 3;  // oops, off by one.
  return 0;
}
[js@HOST2 cpp]$ g++ -fmudflap -fstack-protector-all -lmudflap -Wall mudf.cpp
[js@HOST2 cpp]$ MUDFLAP_OPTIONS=-help ./a.out
  ... showing many options ...
[js@HOST2 cpp]$ ./a.out 
*******                 
mudflap violation 1 (check/write): time=1234225118.232529 ptr=0xbf98af84 size=44
pc=0xb7f6026d location=`mudf.cpp:4:12 (main)'                                   
      /usr/lib/libmudflap.so.0(__mf_check+0x3d) [0xb7f6026d]                    
      ./a.out(main+0xb9) [0x804892d]                                            
      /usr/lib/libmudflap.so.0(__wrap_main+0x4f) [0xb7f5fa5f]                   
Nearby object 1: checked region begins 0B into and ends 4B after                
mudflap object 0x9731f20: name=`mudf.cpp:3:11 (main) int a [10]'                
bounds=[0xbf98af84,0xbf98afab] size=40 area=stack check=0r/3w liveness=3        
alloc time=1234225118.232519 pc=0xb7f5f9fd                                      
number of nearby objects: 1                                                     
*** stack smashing detected ***: ./a.out terminated                             
======= Backtrace: =========
....

There are many mudflap checks you can do, and the above runs a.out using the default options. Another tools which helps for those kind of bugs is valgrind , which can also help you find leaks or off by one bugs like above. Setting the environment variable "MALLOC_CHECK_" to 1 will print messages for violations too. See the manpage of malloc for other possible values for that variable.

For checking where your program crashes you can use gdb :

[js@HOST2 cpp]$ cat test.cpp
int main() {
    int *p = 0;
    *p = 0;
}
[js@HOST2 cpp]$ g++ -g3 -Wall test.cpp
[js@HOST2 cpp]$ gdb ./a.out
...
(gdb) r
Starting program: /home/js/cpp/a.out

Program received signal SIGSEGV, Segmentation fault.
0x080483df in main () at test.cpp:3
3           *p = 0;
(gdb) bt
#0  0x080483df in main () at test.cpp:3
(gdb)

Compile your code with -g3 to include many debugging information, so gdb can help you find the precise lines where your program is crashing. All the above techniques are equally applicable for C and C++.

It's almost certainly a pointer error or buffer overrun, maybe an uninitialised variable.

An uninitialised pointer will usually point at nothing, but sometimes it will point at something; reading from it or writing to it will typically crash the program, but then again it MIGHT not.

Writing or reading from freed memory is the same story; you might get away with it, then again maybe not.

These situations depend on exactly how the stack, heap are laid out and what the runtime is doing. It is quite possible to make a bad program that works on one compiler / runtime combination and not another, simply because on one it overwrites something that doesn't matter (so much), or that an uninitialised variable "happens" to contain a valid value for the context it's used in.

The version of GCC is probably not the issue. It's more likely to be a difference in the runtime library and a bug in your code that doesn't manifest itself when running against the Windows version of the runtime. You might want to post the code that segfaults and some more background information if you want a more specific answer.

In general, it's best to develop under the environment you're going to use for running your code.

您是否在进行任何特定于平台的假设,例如数据类型的大小, struct数据结构对齐方式字节顺序

A segmentation fault means that you tried to access memory you couldn't, which usually means you tried dereferencing a null pointer or you double-deleted memory or got a wild pointer. There's two reasons why you might have appeared to be fine on cygwin and not on Linux: either it was an issue with the memory managers, or you got more lucky on one of them than the other. It is almost certainly an error with your code.

To fix this, look at your pointer use. Consider substituting smart pointers for raw pointers. Consider doing a search for delete and zeroing the pointer immediately afterwards (it's safe to try to delete a null pointer). If you can get a crack at Linux, try getting a stack trace through gdb and see if there's anything obviously wrong at the line it happens. Examine how you use all pointers that aren't initialized. If you have access to a memory debugging tool, use it.

Some hints:

  1. Post your code. I'll bet you will get some good input that will make you a better programmer.

  2. Turn on warnings with the -wall option and correct any problems that are reported. Again, it can help make you a better programmer.

  3. Step through the code with a debugger. Besides helping you understand where the problem is, it will help make you a better programmer.

  4. Continue to use Subversion or other source code control system.

  5. Never blame the compiler (or OS or hardware) until you are sure you've pinpointed the problem. Even then, be suspicious of your own code.


GCC on Linux is source-code identical to GCC on Cygwin. Differences between the platforms exist occur because of the Cygwin POSIX emulation layer and the underlying Windows API. It's possible the extra layers are more forgiving than the underlying hardware, but that's not be counted on.

Since it's homework, I'd say posting code is an even better idea. What better way to learn than getting input from professional programmers? I'd recommend crediting any suggestions you implement in nearby comments, however.

A segmentation fault is the result of accessing memory at a non-existent (or previously freed) address. What I find very interesting is that the code did NOT segfault under cygwin. That could mean that your program used a wild pointer to some other processes' address space and was actually able to read it (gasp), or (more likely) the code that actually caused the segfault was not reached until the program was run under Linux.

I recommend the following:

  1. Paste your code as it is a very interesting problem
  2. Send a copy of this to the cygwin developers
  3. Get a cheap Linux VPS if you'll be required to produce more programs that run under Linux, it will make your life much easier.

Once your working under Linux (ie shelled into your VPS), try working with the following programs:

  • GDB
  • Valgrind
  • strace

Also, you can try libraries like electric fence to catch these kinds of things as they happen while your program is running.

Finally, make sure -Wall is passed to gcc, you want the warnings it would convey.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM