
Independence of software elements for IEC 61508 on CPU without memory protection unit

Original post: 2020-03-27 11:47:18 6 2 c/ standards/ safety-critical

Is it possible to justify independence of software elements by IEC 61508, part 3, Annex F, such that the safety-related components can be rated SIL 2 and the non-safety components (e.g. UI, comms) can be left unrated, and still have an overall result that is rated as SIL 2?

In particular I am interested in views on this when the safety and non-safety elements are all running on a single processor, and the processor does not implement any form of hardware memory protection. There are all sorts of things one could do to ensure non-interference between software elements, such as ensuring data integrity, strictly controlling and verifying data passing, making task scheduling deterministic (non-safety tasks guaranteed to terminate), and so on.
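To make "strictly controlled and verified data passing" concrete, here is a minimal sketch of one possible form it could take: a single mailbox through which the non-safety side hands a value to the safety side, stored together with its bitwise complement and range-checked on read. The names `mailbox_t`, `mailbox_write`, `mailbox_read` and `SETPOINT_MAX` are assumptions made up for illustration; nothing here is prescribed by IEC 61508.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound for the value the safety side will accept. */
#define SETPOINT_MAX 1000u
static_assert(SETPOINT_MAX <= UINT16_MAX, "limit must fit the mailbox type");

/* Value plus redundant inverted copy (assumed layout, for illustration). */
typedef struct {
    uint16_t value;
    uint16_t value_inv; /* bitwise complement of value */
} mailbox_t;

/* Called by the non-safety side to hand over a value. */
void mailbox_write(mailbox_t *mb, uint16_t v)
{
    mb->value = v;
    mb->value_inv = (uint16_t)~v;
}

/* Called by the safety side. Returns false, leaving *out untouched,
   if the two copies disagree or the value is out of range. */
bool mailbox_read(const mailbox_t *mb, uint16_t *out)
{
    uint16_t v = mb->value;
    uint16_t inv = mb->value_inv;

    if ((uint16_t)(v ^ inv) != 0xFFFFu) {
        return false; /* copies inconsistent: corrupted or partial write */
    }
    if (v > SETPOINT_MAX) {
        return false; /* consistent but implausible value */
    }
    *out = v;
    return true;
}
```

A check like this only detects inconsistent or implausible input; it does not by itself stop non-safety code from writing elsewhere in memory.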

Would such techniques, rigorously applied, be sufficient?

2 answers

This is a question that doesn't have a definite answer. Any answer is either opinion-based or depends on specific conditions.

If you have a company/organization that will do the assessment or certification, you should ask them (edit for clarification:) if your approach is OK. As far as I understand the standards for the development of safety-critical devices, you have to document that you considered all possible risks and how you detect or prevent the possible faults.

In a project to be certified to conform to a similar standard, we put all safety-related data and code into specific memory sections and "lock" the safety-related data section by calculating a CRC after leaving the safety-related functions and checking the CRC before entering again.

Additionally, by checking the return address, we verify that the function that "locks" the data is called only from the safety-related code section. Any unexpected modification of the safety-related data will be detected, and the device will enter a safe state.
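The lock/check cycle described above can be sketched roughly as follows. This is not our actual code, only a minimal illustration: the structure `safety_data_t`, the functions `safety_init`, `safety_lock`, `safety_check` and `safety_set_setpoint`, and the choice of CRC-32 are all assumptions. Placement of the data into a dedicated linker section and the return-address check are omitted because they are toolchain-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical safety-related data. On a real target it would be placed
   in a dedicated memory section via the linker script. */
typedef struct {
    uint32_t setpoint;
    uint32_t limit;
    uint8_t  state;
    uint8_t  reserved[3]; /* explicit padding so the CRC covers defined bytes */
} safety_data_t;

static_assert(sizeof(safety_data_t) == 12, "unexpected padding in safety_data_t");

safety_data_t g_safety_data;
static uint32_t g_safety_crc;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR all ones). */
uint32_t crc32_calc(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* "Lock": called when leaving safety-related code. */
void safety_lock(void)
{
    g_safety_crc = crc32_calc(&g_safety_data, sizeof g_safety_data);
}

/* Called before entering safety-related code again.
   Returns false if the data changed since the last lock. */
bool safety_check(void)
{
    return crc32_calc(&g_safety_data, sizeof g_safety_data) == g_safety_crc;
}

void safety_init(void)
{
    g_safety_data.setpoint = 0u;
    g_safety_data.limit = 100u; /* hypothetical limit */
    g_safety_data.state = 0u;
    safety_lock();
}

/* Example safety-related function. A false return means corruption was
   detected and the caller must bring the device into its safe state. */
bool safety_set_setpoint(uint32_t new_setpoint)
{
    if (!safety_check()) {
        return false;
    }
    if (new_setpoint <= g_safety_data.limit) {
        g_safety_data.setpoint = new_setpoint;
    }
    safety_lock();
    return true;
}
```

Note that the data is only checked when safety-related code is entered, so the time between a stray write and its detection depends on how often that happens.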

In our case this approach was sufficient to convince the people responsible for checking our software development.

Edit (to answer a comment)

Of course we ourselves are convinced that this solution is sufficient for the described purpose in the affected device.

This mechanism is only one part of the safety concept of the device.

The CRC mechanism described here is used to protect safety-related data in RAM against unwanted modification by non-safety functions, to ensure independence of the safety-related functions from the non-safety functions. (It is not related to protecting the binary program in flash memory against modification. Of course we also do this, using ECC flash and CRCs.)

Another edit: We also check periodically that the safety-related peripheral registers are not modified unexpectedly.
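One way such a periodic register check could look is sketched below, again as an illustration rather than our actual code. It assumes a table of expected values with a mask selecting the configuration bits; `fake_regs` stands in for memory-mapped peripheral registers so that the sketch runs on a host, and all names and values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SAFETY_REGS 3u

/* Stand-in for memory-mapped peripheral registers (hypothetical). On a real
   MCU these would be fixed addresses taken from the device header. */
volatile uint32_t fake_regs[NUM_SAFETY_REGS];

typedef struct {
    volatile uint32_t *reg;
    uint32_t mask;     /* configuration bits to verify; status bits excluded */
    uint32_t expected; /* required value of the masked bits */
} reg_check_t;

/* Expected configuration of the safety-related registers (made-up values). */
static const reg_check_t reg_table[] = {
    { &fake_regs[0], 0x000000FFu, 0x00000012u },
    { &fake_regs[1], 0x0000000Fu, 0x00000005u },
    { &fake_regs[2], 0x00000003u, 0x00000001u },
};

static_assert(sizeof reg_table / sizeof reg_table[0] == NUM_SAFETY_REGS,
              "every safety-related register needs a table entry");

/* Write the expected configuration once at start-up. */
void regs_configure(void)
{
    for (size_t i = 0; i < NUM_SAFETY_REGS; i++) {
        const reg_check_t *c = &reg_table[i];
        *c->reg = (*c->reg & ~c->mask) | c->expected;
    }
}

/* Call periodically. A false return means a configuration bit changed
   unexpectedly and the device should enter its safe state. */
bool regs_verify(void)
{
    for (size_t i = 0; i < NUM_SAFETY_REGS; i++) {
        const reg_check_t *c = &reg_table[i];
        if ((*c->reg & c->mask) != c->expected) {
            return false;
        }
    }
    return true;
}
```

The mask keeps status bits that the hardware changes on its own from triggering false alarms.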

We have lots of other safety measures in the hardware and software, but these are not related to the question of how to justify independence of software parts without an MPU.

The device that uses the technique described here conforms to a different standard, with a safety level approximately between SIL 1 and SIL 2.

Of course every user must check if this solution is sufficient for a specific device.

If there is any safety-related firmware present in an MCU, then all of its software is safety-related. Period.

Common sense dictates that any bug anywhere in your code could cause runaway code, stack overflows, out-of-bounds access, spurious interrupts and so on. Not to mention bugs related to the interface between safety-related and non-safety-related parts.

To make an argument about independence in a system where you would treat some parts of the software as less critical, you would need something like multiple cores executing code in different memory areas, without the slightest possibility of affecting each other in any way. Which in turn would be a strange and needlessly complex design.

The normal approach is rather to set the same quality standard for every part of the code. Meaning that if you need to run some non-critical code in some non-verified stack or 3rd party lib, you should probably consider moving that to a separate physical chip. Keep the safety-related parts as small and simple as possible.