History of memcpy and memset vs. assignment and initialization

So, my understanding is that the following code:

somestruct_t a = {0};
somestruct_t b;
b = a;

is always preferable, when possible, to:

somestruct_t a;
somestruct_t b;
memset(&a, 0, sizeof(a));
memcpy(&b, &a, sizeof(a));

And the top constructs are almost always possible, which leads me to my question: since the top code both performs well and is, to me, clearly more intuitive to someone learning the language, why are the memset and memcpy patterns so amazingly prevalent in C and even in non-OO C++ code? Literally every project I've worked on for decades prefers the bottom pattern.

I'm assuming there is some historical reason for it, such as very old compilers not supporting it or somesuch, but I'd very much like to know the specific reason.

I know general history questions are off-topic, but this is about a very specific bad practice that I would like to understand better.

EDIT: I am NOT trying to assert that memcpy and memset are bad in general. I'm talking about a very specific use pattern: assignment or initialization of a single structure.

It sounds like your experience is significantly different from mine, and from that of several of the other commenters here.

I don't know anyone who prefers

memcpy(&a, &b, sizeof(a));

over

a = b;

In my programming world (and in just about any world I can imagine), simple assignment is vastly preferable to memcpy. memcpy is for moving chunks of arbitrary data around (analogous to strcpy, but for arbitrary bytes instead of null-terminated strings). It's hard to imagine why anyone would advocate using memcpy instead of struct assignment. Naturally there are individual programmers everywhere who have gotten into various bad habits, so I guess I can't be too surprised if there are some who prefer the opposite, but I have to say, I would generally disagree with what they're doing.
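A small sketch of the practical difference, using a made-up struct point and made-up helper names: both helpers produce the same copy, but only the assignment is type-checked. memcpy accepts any two pointers and trusts whatever size you hand it.

```c
#include <string.h>

/* Hypothetical struct used only for this illustration. */
struct point {
    int x;
    int y;
};

/* Struct assignment: the compiler checks that both sides have the same
   type and copies exactly one struct point. */
void copy_by_assignment(struct point *dst, const struct point *src)
{
    *dst = *src;
}

/* memcpy: same result here, but the compiler cannot check that dst and
   src point to the same type or that the size argument is right. */
void copy_by_memcpy(struct point *dst, const struct point *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

For example, a slip such as memcpy(dst, src, sizeof src) (the size of the pointer, not of the struct) is accepted by the compiler, at best with a warning, whereas assigning a struct of the wrong type is a hard error.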

Someone speculated in the comments that there was perhaps some historical precedent at work, but at least for the memcpy-versus-assignment question, I can state with some certainty that this is not the case.

Once upon a time, before there was C90 memcpy, there was BSD bcopy, but before there was bcopy there wasn't a standard function for doing an efficient copy of a bunch of bytes from point a to point b. But there was struct assignment, which really has been in the language almost from the beginning. And struct assignment typically uses a nice, tight, compiler-generated byte-copying loop. So there was a time when it was fashionable to do something like this:

#define bcpy(a, b, n) do { struct bcpy_blk { char x[(n)]; }; /* n: a constant */ \
    *(struct bcpy_blk *)(a) = *(struct bcpy_blk *)(b); } while (0)

I may have gotten the syntax wrong, but this hijacks the compiler's ability to do efficient struct assignment, and repurposes it to copy n bytes from arbitrary pointer b to arbitrary pointer a, i.e. just like bcopy or memcpy.

In other words, it's not like memcpy came first, followed by struct assignment -- it was actually exactly the opposite!

Now, memset versus struct initialization is a different story.

Most of the "clean" ways of zeroing a struct are initializations, but of course it's not uncommon to want to set a struct to all zero at some point later than when it was defined. It's also not uncommon to have a dynamically allocated struct obtained with malloc/realloc rather than calloc. In those cases, memset is attractive. I think modern C has struct constants you can use at any time (C99 compound literals, such as (struct s){ 0 }), but I'm guessing I'm not the only one who still hasn't learned them and so still tends to use memset instead.
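The "struct constants" mentioned here are presumably C99 compound literals. A minimal sketch, with a made-up struct config and made-up function names, comparing memset on a malloc'd struct with resetting it later through a compound literal:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct for illustration. */
struct config {
    int verbose;
    double scale;
    const char *name;
};

/* Allocate with malloc, then clear every byte with memset. */
struct config *config_new_memset(void)
{
    struct config *c = malloc(sizeof *c);
    if (c != NULL)
        memset(c, 0, sizeof *c);
    return c;
}

/* C99 compound literal: assigns a zero-initialized struct value at any
   time after definition. Unlike memset, this makes the pointer member a
   null pointer and the double 0.0 even on (rare) platforms where those
   are not represented as all-bits-zero. */
void config_reset(struct config *c)
{
    *c = (struct config){0};
}
```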

So I wouldn't consider using memset to be poor style, not in the same way as memcpy is poor style for struct assignment.

That said, I have seen, and written, code that did something like

struct s zerostruct = { 0 };

and then later

a = zerostruct;

as a "better style" alternative to

memset(&a, 0, sizeof(a));   

Bottom line: I wouldn't agree that memcpy is recommended over struct assignment, and I am critical of anyone who prefers it. But memset is quite useful (and not discouraged) for zeroing structures, because the alternatives aren't nearly as compelling.

There is exactly one use case where

struct somestruct  foo = { 0 };

is not sufficient, and

struct somestruct  foo;
memset(&foo, 0, sizeof foo);

needs to be used instead: when the padding in the structure may be important.

You see, the only difference between the two is that the latter is guaranteed to clear structure padding to zero too, whereas the former is only guaranteed to clear the structure members to zero.
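To make the distinction concrete, a sketch with a hypothetical struct that, on common ABIs, has padding bytes after its char member. After memset, every byte of the object, padding included, reads as zero. With "= { 0 }" the members are zero, but (per the point above) the padding bytes are not something to rely on, so no such check is made for that case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: on typical ABIs the compiler inserts padding
   between 'tag' and 'value' so that 'value' is properly aligned. */
struct padded {
    char tag;
    int value;
};

/* Returns 1 if every byte of the object, padding included, is zero. */
int all_bytes_zero(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    for (size_t i = 0; i < size; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}

/* Clears the members and the padding in one call. */
void padded_clear(struct padded *p)
{
    memset(p, 0, sizeof *p);
}
```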

The reason one might care about padding is upwards/future compatibility. If padding is guaranteed to be zero in current programs, a future version of a library can "reuse" the padding for new data fields and still work with older binaries.

Since C99, new C libraries do, and should, reserve some members for just that purpose explicitly. That's usually why you see "reserved" fields in structures defined by many libraries, and even in the Linux kernel-userspace interface. So, the padding issue is really only relevant to structures developed before C99 support became widespread; in other words, in old libraries only.
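A sketch of the explicit "reserved fields" pattern. The struct lib_options and the functions lib_configure and configure_with_timeout are entirely hypothetical: the library rejects nonzero reserved members, so callers who clear the struct stay compatible if a later version gives those members a meaning. Because they are real members, "= { 0 }" clears them just as well as memset does.

```c
#include <string.h>

/* Hypothetical version-1 library struct. 'reserved' has no meaning yet;
   callers must leave it zero. */
struct lib_options {
    unsigned int flags;
    unsigned int timeout_ms;
    unsigned int reserved[4];   /* must be zero in version 1 */
};

/* Hypothetical library entry point: refuses nonzero reserved members,
   which keeps them free to be given a meaning in a later version. */
int lib_configure(const struct lib_options *opts)
{
    size_t n = sizeof opts->reserved / sizeof opts->reserved[0];
    for (size_t i = 0; i < n; i++)
        if (opts->reserved[i] != 0)
            return -1;
    return 0;
}

/* Caller side: clearing the whole struct leaves the reserved members
   zero, so this call keeps working against a newer library. */
int configure_with_timeout(unsigned int timeout_ms)
{
    struct lib_options opts;
    memset(&opts, 0, sizeof opts);
    opts.timeout_ms = timeout_ms;
    return lib_configure(&opts);
}
```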

The one structure I know should always be cleared using memset() , is struct sigaction , defined in POSIX.1. In most POSIXy systems it is a perfectly normal structure (and so code that just clears the members of the structure will work absolutely fine on those systems), but because of the various different implementations at different times (especially how the signal mask is implemented), I believe there are still systems with C libraries that have a version of the structure where clearing the padding is still important.

(This is because the sa_handler and sa_sigaction members are usually in a union, and/or because the definition of sigset_t may have changed.)

It is possible there are others in some other older libraries, so I would recommend using the memset() idiom when working with libraries with pre-1999 roots, whose example code also uses it.
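A minimal sketch of the memset idiom for struct sigaction, using only standard POSIX calls (sigaction, sigemptyset). The handler, the flag variable and install_handler are made-up names for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Installs on_signal for signo. The whole struct sigaction, including
   padding and any implementation-specific fields, is cleared with memset
   before the members we care about are filled in. Returns 0 or -1. */
int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);   /* portable way to empty the signal mask */
    return sigaction(signo, &sa, NULL);
}
```

Note that sigemptyset is still called even after memset, since an all-zero sigset_t is not guaranteed to be an empty set.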

Well, it's not true that struct assignment and "zero-initialization" are always preferable to memset() and memcpy(). A compiler may just as well do a better job optimizing memset()/memcpy() (having "special knowledge" of these two standard library functions), for either size or speed. Granted, that would be a little strange, but it's possible.

Also, since "zero initialization" doesn't work on heap-allocated structs, there's something to be said for consistency of always using memset() .

The same goes for situations where you might want to copy a few adjacent structs (part of an array): again, if you always use memcpy(), the code is more consistent.
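For the array case, a sketch with a made-up element type struct sample and a hypothetical helper copy_run: one memcpy call moves a whole run of adjacent elements, where struct assignment would need a loop with one assignment per element.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical element type. */
struct sample {
    int id;
    double value;
};

/* Copies 'count' adjacent elements starting at src[from] into dst[to].
   The source and destination ranges must not overlap; use memmove if
   they might. */
void copy_run(struct sample *dst, size_t to,
              const struct sample *src, size_t from, size_t count)
{
    memcpy(&dst[to], &src[from], count * sizeof *src);
}
```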

On a historical note, I've worked with toolchains where local struct initialization was broken. On projects that went through such toolchains, the "always use memset()" rule will prevail even after those toolchains are retired. The thing is, even if your toolchain's memset() is broken, you can write your own, but you can't write your own local struct initialization...
