
How do you design a configuration file system that copes with abrupt shutdown in an embedded environment?

I'm designing software that manages a configuration file.

Generally, it maintains two copies of the configuration file, one in RAM and one in flash memory. As soon as an end user updates a setting through the UI, the software saves it to the copy in RAM and then copies it to the file in flash memory.

This scheme gives the best stability, in that the stored configuration reflects reality within a second of a change. However, it shortens the life of the flash memory by writing to it on every update.

To address the longevity issue, I've thought about having a dedicated program do this housekeeping, adding it to crontab and letting it run, say, every 30 minutes. (Note: flash memory wears out only during erase cycles; the program only does the housekeeping if the two files differ.)

But if the file in RAM is still waiting for the program to do its housekeeping and the system shuts down unexpectedly, the changes will be lost.

So I'm wondering: is there a way to get both longevity and no data loss at the same time? Or am I missing something?

There are many different reasons why flash can get corrupted: data retention over time, erase/write failures (primarily caused by erase/write cycle wear), clock inaccuracies, read disturb in the case of NAND flash, and even less likely error sources such as cosmic rays or EMI. But there are also, as in your case, algorithmic-layer problems, such as a flash erase/write getting interrupted by power loss or by a reset caused by EMI.

Similarly, there are many ways to deal with these various problems.

  • CRC16 or CRC32, depending on flash size, is the classic way to deal with pretty much all possible flash errors, particularly data retention, since that most often manifests itself as single-bit errors, which CRC is great at discovering. Ideally the design places the checksum at the end of each erase-size segment. Or, in case the erase size is very small (emulated EEPROM, data flash etc.), maybe a single CRC32 at the end of all data. Modern MCUs often have a CRC hardware peripheral, which might be helpful.

    Optionally you can let the CRC algorithm repair single-bit errors, though this practice is often banned in high integrity systems.

  • ECC is used on NAND flash or in high integrity systems. It was traditionally done in software (which is quite cumbersome), but lately it is also available through built-in hardware support, particularly on the "safety/chassis" kind of automotive microcontrollers. If you wish to use ECC, then I highly recommend picking a part with such built-in support; it can then replace manual CRC (which is somewhat painful to deal with real-time wise).

    These parts with hardware ECC may also support a feature with an area where you can write down variables and have the hardware handle writing them to flash in the background, somewhat similar to DMA.

  • Using the flash segment as FIFO. When storing reasonably small amounts of data in memory with large erase sizes, you can save flash erase/write cycles by only erasing the whole segment once, after which it will likely be set to "all ones", 0xFFFF... When writing, you look for the next available chunk of memory that is still "all ones" and write there, even though a previous version of the same data was written just before it. And when reading, you fetch the last written chunk before the "all ones" area. Only when the whole erase size is used up do you perform an erase and start over from the beginning - the data needs to be held in RAM during this.

    I strongly recommend picking a part with decent data flash though, meaning small erase sizes - so that you don't need to resort to hacks like this.

  • Mirror segments, where all data is stored as duplicates in two separate segments, are mandatory practice for high integrity systems, though they can also be used to prevent corruption during power loss/unexpected resets and of course flash corruption in general. The idea is to have at least one segment of intact data at all times, and optionally to repair a corrupt one by overwriting it with the correct one at start-up. This also means that one segment must be verified to be correct and complete before writing to the next.

  • Keep the product cool. This is a hardware solution obviously, but data retention in particular is heavily affected by ambient temperature. The manufacturer normally guarantees some 15-20 years or so up to 85°C, but that might mean 100 years if you keep it at <25°C. As in, whenever possible, avoid mounting your MCU PCB near exhausts, oil coolers, hydraulics, heating elements and so on.
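To make the FIFO bullet above a bit more concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: a RAM array stands in for one flash erase segment, and the sizes and all names (`SEGMENT_SIZE`, `RECORD_SIZE`, `config_write`, `config_read`, `segment_erase`) are made up for this example rather than taken from any vendor API. It also assumes a record never consists entirely of 0xFF bytes, since that pattern is what marks a free slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SEGMENT_SIZE 64u /* hypothetical erase-segment size in bytes */
#define RECORD_SIZE  8u  /* hypothetical size of one configuration record */
#define NUM_SLOTS    (SEGMENT_SIZE / RECORD_SIZE)

/* RAM array standing in for one flash erase segment (assumption). */
static uint8_t segment[SEGMENT_SIZE];
static unsigned erase_count; /* only here to show how rarely we erase */

/* Simulated segment erase: flash typically reads back as all ones. */
void segment_erase(void)
{
    memset(segment, 0xFF, sizeof segment);
    erase_count++;
}

static bool slot_is_blank(unsigned slot)
{
    for (unsigned i = 0; i < RECORD_SIZE; i++) {
        if (segment[slot * RECORD_SIZE + i] != 0xFF) {
            return false;
        }
    }
    return true;
}

/* Index of the first slot that is still all ones, or NUM_SLOTS if full. */
static unsigned first_blank_slot(void)
{
    unsigned slot = 0;
    while (slot < NUM_SLOTS && !slot_is_blank(slot)) {
        slot++;
    }
    return slot;
}

/* Append a record; erase and start over only when the segment is full. */
void config_write(const uint8_t record[RECORD_SIZE])
{
    unsigned slot = first_blank_slot();
    if (slot == NUM_SLOTS) {
        /* The new record lives only in RAM until the write below completes. */
        segment_erase();
        slot = 0;
    }
    memcpy(&segment[slot * RECORD_SIZE], record, RECORD_SIZE);
    /* An all-0xFF record would be indistinguishable from a free slot. */
    assert(!slot_is_blank(slot));
}

/* Fetch the last written record; returns false if nothing is stored. */
bool config_read(uint8_t record[RECORD_SIZE])
{
    unsigned slot = first_blank_slot();
    if (slot == 0) {
        return false;
    }
    memcpy(record, &segment[(slot - 1) * RECORD_SIZE], RECORD_SIZE);
    return true;
}
```

With these example sizes, eight updates cost a single erase instead of eight. On real hardware the `memset`/`memcpy` calls would be replaced by the part's flash driver, and the erase-and-restart step is the window in which a power loss can still cost you the data.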

Mirror segments in combination with CRC and/or ECC are likely the solution you are looking for here. Again, I strongly recommend picking an MCU with dedicated data flash, meaning small erase segments and often far more erase/write cycles, ideally >100k.
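As a rough sketch of what mirror segments plus CRC could look like, assuming a RAM array in place of two real flash segments: each copy carries a sequence number and a CRC-32 stored last, a save writes and verifies the first copy before touching the second, and start-up picks whichever copy is intact. The struct layout and all names (`config_copy_t`, `config_save`, `config_load`, `config_repair`, `config_pick_valid`, `crc32_calc`) are hypothetical, the CRC is the common bitwise CRC-32 rather than any particular MCU peripheral, and sequence-number wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_SIZE 16u /* hypothetical size of the configuration data */

typedef struct {
    uint32_t sequence;              /* incremented on every save */
    uint8_t  payload[PAYLOAD_SIZE]; /* the configuration itself */
    uint32_t crc;                   /* CRC-32 of the fields above, stored last */
} config_copy_t;

/* RAM array standing in for two separate flash segments (assumption). */
static config_copy_t mirror[2];

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320. */
uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

static bool copy_is_valid(const config_copy_t *c)
{
    return c->crc == crc32_calc((const uint8_t *)c, offsetof(config_copy_t, crc));
}

/* Index of the newest intact copy, or -1 if both copies are corrupt. */
int config_pick_valid(void)
{
    bool v0 = copy_is_valid(&mirror[0]);
    bool v1 = copy_is_valid(&mirror[1]);
    if (v0 && v1) {
        return (mirror[1].sequence > mirror[0].sequence) ? 1 : 0;
    }
    return v0 ? 0 : (v1 ? 1 : -1);
}

/* Write copy 0, verify it, and only then write copy 1. */
void config_save(const uint8_t payload[PAYLOAD_SIZE])
{
    int newest = config_pick_valid();
    config_copy_t next;
    next.sequence = (newest < 0) ? 1u : mirror[newest].sequence + 1u;
    memcpy(next.payload, payload, PAYLOAD_SIZE);
    next.crc = crc32_calc((const uint8_t *)&next, offsetof(config_copy_t, crc));
    for (int i = 0; i < 2; i++) {
        mirror[i] = next; /* stands in for erase + program of one segment */
        /* Real code would handle a failed verify instead of asserting. */
        assert(copy_is_valid(&mirror[i]));
    }
}

/* Read the newest intact configuration; false if none is available. */
bool config_load(uint8_t payload[PAYLOAD_SIZE])
{
    int newest = config_pick_valid();
    if (newest < 0) {
        return false;
    }
    memcpy(payload, mirror[newest].payload, PAYLOAD_SIZE);
    return true;
}

/* Optional start-up repair: overwrite the bad or stale copy with the good one. */
void config_repair(void)
{
    int newest = config_pick_valid();
    if (newest >= 0 && memcmp(&mirror[0], &mirror[1], sizeof mirror[0]) != 0) {
        mirror[1 - newest] = mirror[newest];
    }
}
```

If power is lost while the first copy is being written, the second still holds the previous configuration; if it is lost while the second is being written, the first already holds the new one. Either way `config_load` finds an intact copy, which is the property the mirror scheme is after.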


 