Controlling read and write access width to memory mapped registers in C

Question

I'm using an x86-based core to manipulate a 32-bit memory-mapped register. My hardware behaves correctly only if the CPU generates 32-bit wide reads and writes to this register. The register is aligned on a 32-bit address and is not addressable at byte granularity.

What can I do to guarantee that my C (or C99) compiler will only generate full 32-bit wide reads and writes in all cases?

For example, if I do a read-modify-write operation like this:

volatile uint32_t *p_reg = (volatile uint32_t *)0xCAFE0000;
*p_reg |= 0x01;

I don't want the compiler to get smart about the fact that only the bottom byte changes and generate 8-bit wide reads and writes. Since the machine code is often denser for 8-bit operations on x86, I'm afraid of unwanted optimizations. Disabling optimizations in general is not an option.

----- EDIT -------
An interesting and very relevant paper: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

Answer 1

Your concerns are covered by the volatile qualifier.

6.7.3/6 "Type qualifiers" says:

An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined.

5.1.2.3 "Program execution" says (among other things):

In the abstract machine, all expressions are evaluated as specified by the semantics.

This is followed by a sentence that is commonly referred to as the 'as-if' rule, which allows an implementation to not follow the abstract machine semantics if the end result is the same:

An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

But 6.7.3/6 essentially says that volatile-qualified types used in an expression cannot have the 'as-if' rule applied: the actual abstract machine semantics must be followed. Therefore, if a pointer to a volatile 32-bit type is dereferenced, then the full 32-bit value must be read or written (depending on the operation).
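As a rough illustration of relying on volatile alone, the accesses could be funneled through two small helpers. This is only a sketch: the names mmio_read32 and mmio_write32 are made up for this example, and because the standard leaves "what constitutes an access" implementation-defined, it is still worth checking the generated assembly for your compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, not part of any standard or library. */

/* Read the register through a volatile-qualified 32-bit lvalue. */
static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    assert(((uintptr_t)reg & 3u) == 0u);   /* register must be 32-bit aligned */
    return *reg;
}

/* Write the register through a volatile-qualified 32-bit lvalue. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    assert(((uintptr_t)reg & 3u) == 0u);   /* register must be 32-bit aligned */
    *reg = value;
}
```

With these, the read-modify-write from the question would become mmio_write32(p_reg, mmio_read32(p_reg) | 0x01).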

Answer 2

The ONLY way to GUARANTEE that the compiler will do the right thing is to write your load and store routines in assembler and call them from C. 100% of the compilers I have used over the years can and will get it wrong (GCC included).

Sometimes the optimizer gets you. For example, you want to store some constant that appears to the compiler as a small number, say 0x10, into a 32-bit register, which is specifically what you asked about and what I have watched otherwise good compilers try to optimize. Some compilers will decide that it is cheaper to do an 8-bit write instead of a 32-bit write and change the instruction. Variable instruction length targets make this worse, because the compiler is trying to save program space and not just memory cycles on what it assumes the bus to be (xor ax,ax instead of mov eax,0, for example).

And with something that is constantly evolving like gcc, code that works today has no guarantee of working tomorrow (you can't even compile some versions of gcc with the current version of gcc). Likewise, code that works on the compiler at your desk may not work universally for others.

Take the guessing and the experimenting out of it, and create load and store functions.

The side benefit is that you create a nice abstraction layer. If and when you want to simulate your code in some fashion, or have the code run in application space instead of on the metal (or vice versa), the assembler functions can be replaced with a simulated target, or with code that crosses a network to a target with the device on it, etc.
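One possible shape for such load and store functions is sketched below. It swaps the separate assembler file for GCC/Clang extended inline assembly, which is an assumption of this example and not what the answer literally prescribes. The names hw_load32 and hw_store32 are hypothetical, and the non-x86 branch is only a stand-in so the same interface compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* hw_load32 and hw_store32 are hypothetical names chosen for this sketch. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* x86 with GCC or Clang: each access is one explicit 32-bit mov. */
static inline uint32_t hw_load32(const volatile uint32_t *addr)
{
    uint32_t value;
    assert(((uintptr_t)addr & 3u) == 0u);   /* must be 32-bit aligned */
    __asm__ volatile ("movl %1, %0" : "=r" (value) : "m" (*addr) : "memory");
    return value;
}

static inline void hw_store32(volatile uint32_t *addr, uint32_t value)
{
    assert(((uintptr_t)addr & 3u) == 0u);   /* must be 32-bit aligned */
    __asm__ volatile ("movl %1, %0" : "=m" (*addr) : "r" (value) : "memory");
}

#else

/* Other targets: plain volatile accesses stand in for the assembly. */
static inline uint32_t hw_load32(const volatile uint32_t *addr)
{
    assert(((uintptr_t)addr & 3u) == 0u);
    return *addr;
}

static inline void hw_store32(volatile uint32_t *addr, uint32_t value)
{
    assert(((uintptr_t)addr & 3u) == 0u);
    *addr = value;
}

#endif
```

Because callers only see hw_load32 and hw_store32, the bodies can later be swapped for a simulated device without touching the calling code.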

Answer 3

Well, generally speaking, I wouldn't expect it to optimize out the high-order bytes if you have the register typed as a 32-bit volatile. Because of the volatile keyword, the compiler cannot assume that the values in the high-order bytes are 0x00. Thus it must write the full 32 bits even if you are only using an 8-bit literal value. I've never experienced an issue with this on x86 or TI processors, or other embedded processors. Generally the volatile keyword is enough. The only time things get a little weird is if the processor does not natively support the word size you're trying to write, but that shouldn't be an issue on x86 for a 32-bit number.

While it would be possible for the compiler to generate an instruction stream that used four 8-bit writes, that would not be an optimization in either processor time or instruction space over a single 32-bit write.

Answer 4

If you don't use byte (unsigned char) types when accessing the hardware, there is a better chance that the compiler will not generate 8-bit data transfer instructions.

volatile uint32_t *p_reg = (volatile uint32_t *)0xCAFE0000;
const uint32_t value = 0x01;  // This trick tells the compiler the constant is 32 bits.
*p_reg |= value;

You would have to read the port as a 32-bit value, modify the value, then write it back:

uint32_t reg_value = *p_reg;
reg_value |= 0x01;
*p_reg = reg_value;
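The three-step sequence above could be wrapped in small helpers so that every bit update goes through an explicit 32-bit temporary. This is a sketch only; reg_set_bits32 and reg_clear_bits32 are hypothetical names, and nothing here makes the sequence atomic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: set or clear bits via a 32-bit temporary. */
static inline void reg_set_bits32(volatile uint32_t *reg, uint32_t mask)
{
    uint32_t value;
    assert(((uintptr_t)reg & 3u) == 0u);   /* must be 32-bit aligned */
    value = *reg;       /* one volatile 32-bit read */
    value |= mask;      /* modify in an ordinary local variable */
    *reg = value;       /* one volatile 32-bit write */
}

static inline void reg_clear_bits32(volatile uint32_t *reg, uint32_t mask)
{
    uint32_t value;
    assert(((uintptr_t)reg & 3u) == 0u);   /* must be 32-bit aligned */
    value = *reg;
    value &= ~mask;
    *reg = value;
}
```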

Answer 5

Since a read-modify-write operation against hardware is always a huge risk when done in several instructions, most processors offer a way to manipulate a register or memory location with one single instruction that can't be interrupted.

Depending on what type of register you are manipulating, it could change during your modify phase, and then you would write back a wrong value.

As dwelch suggests, I would recommend writing your own read-modify-write function in assembly if this is critical.
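A read-modify-write function of that kind might look roughly like the following on x86 with GCC or Clang inline assembly. Treat it as a sketch under assumptions: reg_or32_single is a made-up name, the non-x86 branch is an ordinary multi-instruction fallback, and a single orl without a lock prefix only avoids being split by an interrupt on the same core. Whether the device tolerates the resulting bus cycles is something to confirm against the hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: OR a mask into a 32-bit register. */
static inline void reg_or32_single(volatile uint32_t *reg, uint32_t mask)
{
    assert(((uintptr_t)reg & 3u) == 0u);   /* must be 32-bit aligned */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* One 32-bit OR with a memory destination; "cc" because flags change. */
    __asm__ volatile ("orl %1, %0"
                      : "+m" (*reg)
                      : "ir" (mask)
                      : "memory", "cc");
#else
    /* Fallback for other targets: NOT a single instruction. */
    *reg |= mask;
#endif
}
```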

I have never heard of a compiler that optimizes a type (doing a type conversion for the purpose of optimization). If it is declared as an int32, it is always an int32 and will always be aligned correctly in memory. Check your compiler documentation to see how the various optimizations work.

I think I know where your concern comes from: structures. Structures are usually padded to the optimal alignment. This is why you need to wrap a #pragma pack() around them to get them byte aligned.
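To make the padding point concrete, here is a small sketch with made-up struct names. It assumes a mainstream ABI where uint32_t has 4-byte alignment and a compiler that accepts #pragma pack(push, 1), as GCC, Clang and MSVC do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, for illustration only. */
struct padded_demo {
    uint8_t  flag;   /* typically followed by 3 padding bytes */
    uint32_t reg;    /* typically at offset 4 */
};

#pragma pack(push, 1)
struct packed_demo {
    uint8_t  flag;   /* no padding after this member */
    uint32_t reg;    /* now at offset 1, no longer 32-bit aligned */
};
#pragma pack(pop)

/* A register block made only of uint32_t members needs no packing. */
struct regs_demo {
    volatile uint32_t ctrl;
    volatile uint32_t status;
};

/* Runtime check of the offsets that do not depend on padding choices. */
static void check_demo_layouts(void)
{
    assert(offsetof(struct packed_demo, reg) == 1);
    assert(offsetof(struct regs_demo, status) == 4);
}
```

Note that packing moves the 32-bit member to an unaligned offset, so for a register that must be accessed as an aligned 32-bit word, a layout like regs_demo is usually the safer choice.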

You can just single-step through the assembly, and then you will see how the compiler translated your code. I'm pretty sure it has not changed your type.

5 answers

Answer 1: score 6, accepted, 2010-06-14 22:45:55
Answer 2: score 4, 2010-06-14 22:35:33
Answer 3: score 0, 2010-06-14 19:11:27
Answer 4: score 0, 2010-06-14 20:17:13
Answer 5: score 0, 2010-06-14 23:11:49
