
HL7 v2X and v3 data modeling

Original post: 2014-03-04 14:27:31 | Tags: sql, sql-server, hl7, hl7-cda, hl7-v2

The company I work for has started a new HL7 initiative in which we exchange both v2X and v3 (specifically CDA) messages with trading partners. I can now accept, validate, and acknowledge the messages we receive, and I have started on a data model for backend storage of those messages. After a lot of consideration and research, I am at a loss for the best way to approach this in MS SQL Server 2008 R2.

Currently my idea is to load the data into a data warehouse directly from my integration engine (BizTalk), forgoing a backing, normalized operational database. I have set up the database for v2X messages according to the v2.7 spec, since all versions of HL7 v2 are backward compatible (I can store any previous version in the same database). My initial design has a table for each segment, which ties back to a header table through a GUID that I generate and store at run time. The biggest issue with this approach is the number of columns in each table, which is something I have no experience with. For instance, the PV1 segment has 569 columns in order to accommodate all possible data. In addition, I need to make all columns varchar and make them big enough to house any possible customization scenario from our vendors; I am planning on using varchar(1024) for this. A lot of these columns (probably the majority) would be NULL, so I would use SPARSE columns. This screams bad design to me, but fully normalizing these tables would require a ton of work in both BizTalk and SQL Server, and I'm not sure what I would gain from doing so. I'm trying to be pragmatic since I have a deadline.

If the tables were fully normalized, I would essentially have to either create stored procedures with a ton of parameters or split these messages to the nth degree, loading each piece into its smaller subtable and making sure they all correlate back to the original GUID. I would also want to maintain ACID processing, which could get tricky and cause a lot of overhead in BizTalk. I suppose a third option would be to use nHapi to create objects out of the messages, which I could then tie into Entity Framework, but nHapi seems like a dead project and I have no experience with Entity Framework as of right now.

I'm basically at a loss and need help from industry professionals who have experience with HL7 data modeling. Is it worth the extra effort to fully normalize the tables? Will performance on the SQL side be abysmal if I use these denormalized segment tables with hundreds of columns (most of which will be NULL in each row)? I'm not a DBA, so I'm trying to understand the pitfalls of each approach. I've also looked at RIMBAA, but the HL7 RIM seems like a foreign language to me as an HL7 newbie, and translating v2 messages to the RIM would probably take far longer than I have to complete this project. I'm hoping I'm overthinking this and there is a simpler solution staring me in the face. Hopefully this question isn't too open-ended.

5 answers

HL7 is not a "tight" standard: inputs and expected outputs vary depending on the system you are talking to. In this case, adding a broker such as Mirth, Rhapsody, or BizTalk is a very good idea.

Whatever solution you employ, make sure you can cope with "non-standard" input and output, as you will soon find that things vary. As for HL7 versions 2X and 3, be aware that very few hospitals run version 3; most still run 2X.

I have been down the road of working with a database that tried to follow the HL7 structure. It can work, but it takes time and effort. Given your tight deadline, consider breaking out only the pieces of data you will need to search on into their own fields (for example PID-3, the patient ID, would be useful to have); the rest can go in your varchar. Also, if you are not indexing on a column, you could use varchar(max).
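To make the hybrid idea concrete, here is a minimal sketch in Python, with SQLite standing in for SQL Server. The table name, column names, the `field` helper, and the sample message are all made up for illustration. Only the control ID, message type, and patient ID are broken out; the whole message is kept as raw text next to them.

```python
import sqlite3

# Hypothetical sample message; real feeds will differ.
SAMPLE = "\r".join([
    "MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|20140304142731||ADT^A01|MSG00001|P|2.4",
    "PID|1||12345^^^HOSP^MR||DOE^JOHN",
    "PV1|1|I|WARD1^101^A",
])

def field(message, segment_id, index):
    """Return the raw text of field `index` in the first matching segment, or None."""
    for segment in message.split("\r"):
        parts = segment.split("|")
        if parts[0] == segment_id:
            # In MSH the field separator itself counts as MSH-1, so shift by one.
            offset = index - 1 if segment_id == "MSH" else index
            return (parts[offset] or None) if offset < len(parts) else None
    return None

def store(conn, message):
    """Break out the searchable fields and keep the full message alongside them."""
    pid3 = field(message, "PID", 3) or ""
    conn.execute(
        "INSERT INTO message (control_id, message_type, patient_id, raw) "
        "VALUES (?, ?, ?, ?)",
        (field(message, "MSH", 10), field(message, "MSH", 9),
         pid3.split("^")[0] or None, message),
    )

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE message (
    id INTEGER PRIMARY KEY,      -- surrogate key, in the spirit of an identity column
    control_id TEXT,
    message_type TEXT,
    patient_id TEXT,             -- first component of PID-3
    raw TEXT NOT NULL)""")
conn.execute("CREATE INDEX ix_message_patient ON message (patient_id)")

store(conn, SAMPLE)
row = conn.execute(
    "SELECT control_id, message_type, raw FROM message WHERE patient_id = ?",
    ("12345",),
).fetchone()
```

Searching by patient ID hits the small indexed column, and anything else can still be recovered later from `raw`.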

As for the GUIDs in your database, this can work fine, but be careful not to cluster any index on the GUID, as this will fragment your data. Do your research here and, if in doubt, go for identity columns instead.

I'd recommend Entity Framework too: it is an excellent ORM and well worth learning.

So, my overall advice: go for a hybrid for now, breaking out only what you need. Expect it to evolve over time, with pieces of HL7 broken out into their own areas as needed. Do write a generic HL7 parser (it is not too difficult; I've done it a couple of times) and keep it flexible. But most of all, expect the HL7 to vary in structure. Don't treat the specification as 100% truth; you will get variations.
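For a sense of what such a generic parser might look like, here is a rough Python sketch. It is one possible shape and not a complete implementation. It reads the delimiters from the MSH segment, accepts any segment ID (including vendor Z-segments), and keeps repetitions and components, but it deliberately ignores escape sequences and subcomponents. The function name and the sample message are invented for illustration.

```python
import re

def parse_hl7(message):
    """Parse an HL7 v2 message into [(segment_id, {field_no: [[component, ...], ...]})].

    Each field maps to a list of repetitions; each repetition is a list of components.
    Empty fields are omitted. Escape sequences and subcomponents are not handled.
    """
    segments = [s for s in re.split(r"\r\n|\r|\n", message.strip()) if s]
    if not segments or not segments[0].startswith("MSH") or len(segments[0]) < 8:
        raise ValueError("message must start with an MSH segment")
    # Delimiters are declared by the sender in MSH-1 and MSH-2, so read them from there.
    field_sep = segments[0][3]
    comp_sep, rep_sep = segments[0][4], segments[0][5]

    parsed = []
    for raw in segments:
        parts = raw.split(field_sep)
        seg_id = parts[0]
        fields = {}
        if seg_id == "MSH":
            fields[1] = [[field_sep]]   # MSH-1 is the field separator itself
            fields[2] = [[parts[1]]]    # MSH-2 holds the encoding characters, kept verbatim
            start, numbered = 3, parts[2:]
        else:
            start, numbered = 1, parts[1:]
        for number, value in enumerate(numbered, start=start):
            if value == "":
                continue
            fields[number] = [rep.split(comp_sep) for rep in value.split(rep_sep)]
        parsed.append((seg_id, fields))
    return parsed

# Hypothetical message with a repeating PID-3 and a vendor-specific ZXX segment.
SAMPLE = (
    "MSH|^~\\&|LAB|FAC|EHR|FAC|20140304||ORU^R01|42|P|2.3\n"
    "PID|1||111^^^A~222^^^B||DOE^JANE\n"
    "ZXX|custom|vendor^data"
)
parsed = parse_hl7(SAMPLE)
by_id = dict(parsed)  # fine for this sample; repeated segments would collapse here
```

Because nothing is validated against a fixed segment list, unexpected segments and extra fields simply show up in the result instead of breaking the load.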

In most cases it's a waste of time to try to create a normalized relational data model to persist HL7 V2 or V3 data. I would recommend just storing entire messages or documents as single XML column values, then querying them using SQLXML and/or XQuery. All modern relational databases support this now.

I can only comment on the CDA (and some very limited HL7v2) side of things, based on personal experience.

We receive and send CDA documents wrapped in HL7v3 wrappers from external vendors (as well as from internal systems; see below). The wrappers contain the metadata for things like sending and receiving systems, dates, and other high-level data. This very limited message metadata is stripped out and stored in the message data repository. Inside the wrapper is the actual CDA, which is extracted and stored as an XML datatype in the SQL database.

Using this model, we can search at the metadata level but also narrow results down based on the CDA content using XPath queries. It makes the database much simpler... I can't even imagine creating columns based on the CDA schema.
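As a rough illustration of that two-level search, here is a Python sketch with SQLite and the standard library's ElementTree standing in for SQL Server and its XML support. The table layout, the `find_by_patient` helper, and the heavily trimmed sample document are assumptions made for the example. In SQL Server itself the narrowing step would normally be done with the xml type's query methods and not in application code.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}  # namespace used by CDA documents

# Heavily trimmed, hypothetical CDA document; a real one carries far more content.
CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <title>Discharge Summary</title>
  <recordTarget><patientRole>
    <id root="2.16.840.1.113883.19.5" extension="12345"/>
  </patientRole></recordTarget>
</ClinicalDocument>"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cda_document (
    id INTEGER PRIMARY KEY,
    sender TEXT NOT NULL,        -- metadata taken from the wrapper
    received_at TEXT NOT NULL,   -- metadata taken from the wrapper
    body TEXT NOT NULL)          -- the CDA itself, stored whole""")
conn.execute(
    "INSERT INTO cda_document (sender, received_at, body) VALUES (?, ?, ?)",
    ("VENDOR_A", "2014-03-04T14:27:31", CDA),
)

def find_by_patient(conn, sender, patient_extension):
    """Filter on the metadata columns first, then narrow down with an XPath query."""
    # Sketch only: the value is interpolated into the path without escaping.
    path = (".//cda:recordTarget/cda:patientRole/"
            f"cda:id[@extension='{patient_extension}']")
    hits = []
    rows = conn.execute(
        "SELECT id, body FROM cda_document WHERE sender = ?", (sender,))
    for doc_id, body in rows:
        root = ET.fromstring(body)
        if root.find(path, NS) is not None:
            hits.append((doc_id, root.findtext("cda:title", namespaces=NS)))
    return hits
```

Calling `find_by_patient(conn, "VENDOR_A", "12345")` returns the stored discharge summary, while a different sender or patient extension returns an empty list.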

As for making clients follow the CDA schema: as part of the project, we've created an implementation guide that clients must follow if they want their messages accepted.

Using the implementation guide plus Schematron, BizTalk, and XSD validation, we only accept messages that follow the CDA schema. We then check some data fields using Schematron validation and reject the message if any of those checks fail. The rejection is relayed to the sender in an HL7v3 message containing the specific error message and/or the fields that are invalid. This is the point at which a message is stored in the database.

This is all done in BizTalk/SQL Server. And since the CDA schema is very much pre-defined by the HL7 group, you can make the consumers of this system follow the schema. This is unlike what I've seen with HL7v2, where it seems people just bend the schema as needed.

For the HL7v2 side of things, I'm 99% certain that "we" (as in "my company") are storing the messages in much the same way. Except that, since the HL7v2 schema is so open, we're not validating, just accepting and storing all messages. An HL7v2 parser has been written to parse the HL7v2 using the schema variations we know about.

In my project's case, we send HL7v2 from our HCIS --> Mirth --> BizTalk, which follows the implementation guide and CDA schema and uses an XSLT transform to map the HL7v2 to CDA, then submits it to the other BizTalk CDA submission service as though it came from an external vendor.

That's a ton of reading, so please ask questions, as I'd like to talk about it.

Modeling on HL7 can be a pain.

I would do the following:

Use the standards described in HL7 for the staging tables. That way, even if you have varchar(1024) columns and they are NULL, it does not hurt you.
Create your actual table, populated from the staging table, according to the standards that you have enforced or will enforce.

This means that if you have 500+ columns from the message but only 10 or 50 are meaningful to you, you need to model only those 50. Yes, this has a downside: if tomorrow you want to extract more meaning and the count grows from 50 to 75, the historical messages will not have that information. That is fine, but you will need to factor it into the design.
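The staging-then-actual flow could be sketched roughly as follows, again in Python with SQLite standing in for SQL Server. The table and column names are invented, and only four of the many PV1 fields are shown. Staging accepts anything as nullable text, and the actual table receives only the rows and fields that meet the rules you enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging: one nullable text column per HL7 field, mirroring the segment layout.
# Only a handful of PV1 fields are shown; the column names are hypothetical.
conn.execute("""CREATE TABLE stg_pv1 (
    msg_id TEXT NOT NULL,
    pv1_2_patient_class TEXT,
    pv1_3_assigned_location TEXT,
    pv1_44_admit_datetime TEXT,
    pv1_50_alternate_visit_id TEXT)""")

# Actual table: only the fields you have decided to enforce, with real constraints.
conn.execute("""CREATE TABLE visit (
    msg_id TEXT PRIMARY KEY,
    patient_class TEXT NOT NULL,
    admit_date TEXT NOT NULL)""")

conn.executemany("INSERT INTO stg_pv1 VALUES (?, ?, ?, ?, ?)", [
    ("m1", "I", "WARD1^101^A", "20140304142731", None),
    ("m2", None, None, None, None),  # nothing usable: stays in staging only
])

# Promote the rows that satisfy the enforced rules, reshaping values on the way
# (here the HL7 timestamp YYYYMMDD... is reduced to an ISO date).
conn.execute("""INSERT INTO visit (msg_id, patient_class, admit_date)
    SELECT msg_id,
           pv1_2_patient_class,
           substr(pv1_44_admit_datetime, 1, 4) || '-' ||
           substr(pv1_44_admit_datetime, 5, 2) || '-' ||
           substr(pv1_44_admit_datetime, 7, 2)
    FROM stg_pv1
    WHERE pv1_2_patient_class IS NOT NULL
      AND length(pv1_44_admit_datetime) >= 8""")

visits = conn.execute("SELECT * FROM visit").fetchall()
```

In this sketch the mostly empty row never reaches `visit`, so the NULL-heavy shape of the staging table does not leak into the table that queries actually run against.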

I would under no circumstances attempt to model anything using the HL7 v3 RIM. The reason is that its schema is very generic, deferring much of the metadata to the message itself. Are you familiar with an EAV table? The RIM is like that.

On the other hand, HL7 v2 should be a fairly simple basis for a DB schema. You can create tables around segment types and columns around field names.

I think trying to pull in everything is what kills the project, and you should not do it. Typically, HL7 v2 messages carry a small subset of the whole, so it would be an utter waste to build out the whole thing, and it would be very confusing.

Further, the version of v2 you model would impact your schemas dramatically: in later versions, more and more fields become repeating fields, and your join relationships would change.

I recommend that you put a stake in the sand and start with v2.4, which is pretty easy yet still more complicated than most interfaces actually in use. Focus on a few segments and a few fields: MSH and PID first.

Add an EAV table to capture whatever comes in that you don't yet have in your tables. You can then look at what lands in this table over time and use it to decide what to build next. Your EAV could look like this: MSG_ID, SEGMENT, SET_ID, FIELD_NAME, FIELD_VALUE. Just store the unparsed HL7 contents of the field as the value.
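Here is a small sketch of that overflow pattern in Python, with SQLite standing in for SQL Server. The `MODELED` mapping, the `load_pid` helper, the `PID-n` naming of fields, and the sample segments are all assumptions for illustration. Fields that already have a column go to the `pid` table; everything else lands in the EAV table unparsed.

```python
import sqlite3

# PID fields that already have a real column (hypothetical choice);
# every other populated field overflows into the EAV table.
MODELED = {3: "patient_id", 5: "patient_name"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pid (msg_id TEXT, set_id TEXT, patient_id TEXT, patient_name TEXT)")
conn.execute("""CREATE TABLE hl7_eav (
    msg_id TEXT, segment TEXT, set_id TEXT, field_name TEXT, field_value TEXT)""")

def load_pid(conn, msg_id, raw_segment):
    """Load one raw PID segment: modeled fields to `pid`, the rest to `hl7_eav`."""
    parts = raw_segment.split("|")
    set_id = parts[1] if len(parts) > 1 else None  # PID-1 is the Set ID
    row = {"patient_id": None, "patient_name": None}
    for number, value in enumerate(parts[2:], start=2):
        if value == "":
            continue
        column = MODELED.get(number)
        if column:
            row[column] = value
        else:
            # Store the field contents exactly as received, without parsing them.
            conn.execute(
                "INSERT INTO hl7_eav VALUES (?, ?, ?, ?, ?)",
                (msg_id, "PID", set_id, f"PID-{number}", value))
    conn.execute(
        "INSERT INTO pid VALUES (?, ?, ?, ?)",
        (msg_id, set_id, row["patient_id"], row["patient_name"]))

load_pid(conn, "m1", "PID|1||111^^^HOSP||DOE^JANE||19800101|F")
load_pid(conn, "m2", "PID|1||222^^^HOSP||ROE^RICK|||M")

# Which unmodeled fields show up most often? Those are candidates to build next.
usage = conn.execute("""SELECT field_name, COUNT(*) FROM hl7_eav
    GROUP BY field_name ORDER BY COUNT(*) DESC, field_name""").fetchall()
```

With these two sample segments, `usage` ranks `PID-8` ahead of `PID-7`, which is the kind of signal you would use to decide which column to add to the real table next.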