
Chunking XML and loading it into relational tables

I work for a credit union (roughly 60K accounts). The statement process dates from the '70s and tightly couples the data to the layout. In short, you run a job and it produces a text file that contains the statement for each account. I've modified the mainframe config, and now instead of getting text out, I get XML like so:

<statements>
    <statement account='1'>
       ...statement info like checking/savings/certificate/visa/loan/heloc shares
    </statement>
    <statement account='N'>
       ...statement info like checking/savings/certificate/visa/loan/heloc shares
    </statement>
</statements>

I wrote Java code to pull data from relational table(s) and build PDFs on the fly with iText. Some of the data that shows up on the statement is calculated from data in the XML. For instance, the XML contains all the transactions on the share. On the statement, we want to show the number of credits and the number of debits. Once the data is loaded into a DB, I can use a view to calculate these values on the fly and provide the data to my Java app.

This XML file is ~900MB and is only going to grow as we add more members.

I want to process the XML one "statement" at a time (see http://mrico.eu/entry/parsing_chunks_of_xml_documents and the related question "Can JAXB parse large XML files in chunks").

Once I have an individual statement, I want to load its shares (checking, savings, visa, etc.) into corresponding DB tables.

It seems like the simplest way to accomplish this is to bind the statement to a POJO and then, for each complex element (share, transaction, or loan) in the POJO, do an insert.

What combination of parser / binder / persistence tools would you guys recommend?

Personally, I'd favor raw JDBC inserts, so the question of parser and binder is more important.

Note: I could probably create a schema for the XML, but it might be fragile due to the way the mainframe builds the XML file. Anyone using Fiserv's Spectrum software feels my pain.

Take a look at StAX, the streaming API for XML.
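As a rough illustration of what that could look like, here is a minimal sketch that uses the StAX event API from the JDK (`javax.xml.stream`) to walk the file and hand each `<statement>` subtree to a callback as its own small XML string, so only one statement is held in memory at a time. The element name `statement` and the `account` attribute come from the sample in the question; the class name `StatementChunker`, the method `forEachStatement`, and the callback shape are made up for this example. What happens inside the callback (JAXB binding, raw JDBC inserts, and so on) is left open.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.BiConsumer;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams a <statements> document one <statement> at a time.
public class StatementChunker {

    private static final QName ACCOUNT = new QName("account");

    // Calls handler(accountId, statementXml) once per <statement> element.
    public static void forEachStatement(Reader in, BiConsumer<String, String> handler)
            throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Assumption: the mainframe output needs no DTD processing.
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

        XMLEventReader reader = inputFactory.createXMLEventReader(in);
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && "statement".equals(event.asStartElement().getName().getLocalPart())) {
                    StartElement start = event.asStartElement();
                    Attribute account = start.getAttributeByName(ACCOUNT);
                    String xml = copySubtree(reader, start, outputFactory);
                    handler.accept(account == null ? null : account.getValue(), xml);
                }
            }
        } finally {
            reader.close();
        }
    }

    // Copies events from the given start element through its matching end element.
    private static String copySubtree(XMLEventReader reader, StartElement start,
                                      XMLOutputFactory outputFactory) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLEventWriter writer = outputFactory.createXMLEventWriter(buffer);
        writer.add(start);
        int depth = 1; // we are inside the <statement> element
        while (depth > 0) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        // args[0] is the path to the XML file produced by the statement job.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            forEachStatement(in, (account, xml) ->
                    // Placeholder: bind the chunk and run the inserts here instead.
                    System.out.println("account " + account + ": " + xml.length() + " chars"));
        }
    }
}
```

Each chunk is a well-formed fragment rooted at `<statement>`, so it could be passed to whatever binder you settle on, or parsed further by hand before the JDBC inserts. Memory use is then bounded by the largest single statement rather than by the whole file.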
