
Spring Batch: migrating 1 to n relationship where n is potentially huge

I am experienced with Spring, but new to Spring Batch. I now have the task of migrating a data structure from a simple schema in one database to a more complex one in another. The data structure corresponds to an object hierarchy, which I will name as follows:

OldParent 1 --> n OldChild // old system

NewParent 1 --> n NewChild // new system

The old database has only two tables; the new system is a lot more complex and has 8 tables, but that is irrelevant for now.

Basically, I would like to use a simple JDBC-based solution, with RowMappers reading from OldParent and converting to NewParent.

Here is a basic configuration snippet:

<batch:job id="migration">
    <batch:step id="convertLegacyData">
        <batch:tasklet>
            <batch:chunk
                reader="parentReader"
                writer="parentWriter"
                commit-interval="200" />
        </batch:tasklet>
    </batch:step>
</batch:job>

In this scenario, the parentReader would also acquire and convert the OldChild objects, probably by delegating to childReader / childWriter objects.

The problem is this: there are several hundred thousand parents, and each parent can have anywhere from zero to several million children, so a commit interval counted in parents would not help at all, since a single chunk of 200 parents could still contain millions of child rows. But I would very much like to have a configurable commit interval.
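To put numbers on the problem: with a parent-based chunk, the commit interval counts parents, so each transaction writes the sum of the children of the parents in that chunk. The following plain-Java sketch (no Spring Batch involved; the class name and the child counts are made up for illustration) computes that sum per chunk:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: when the chunk item is a parent, the number of child rows
// written in one transaction is the sum of the child counts of the parents in the chunk.
class ParentChunkSizes {

    /** Returns, for each chunk of {@code commitInterval} parents, the number of child rows it would write. */
    static List<Long> childRowsPerChunk(long[] childCountsPerParent, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<Long> result = new ArrayList<>();
        for (int start = 0; start < childCountsPerParent.length; start += commitInterval) {
            int end = Math.min(start + commitInterval, childCountsPerParent.length);
            long rows = 0;
            for (int i = start; i < end; i++) {
                rows += childCountsPerParent[i];
            }
            result.add(rows);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up child counts for four parents, chunked two parents at a time.
        long[] counts = {0, 3, 2_000_000, 5_000_000};
        System.out.println(childRowsPerChunk(counts, 2)); // prints [3, 7000000]
    }
}
```

With `commit-interval="200"`, a chunk that happens to contain a few of the multi-million-child parents would hold all of their children in a single transaction.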

So another solution would be to make the workflow child-based:

<batch:job id="migration">
    <batch:step id="convertLegacyData">
        <batch:tasklet>
            <batch:chunk
                reader="childReader"
                writer="childWriter"
                commit-interval="200" />
        </batch:tasklet>
    </batch:step>
</batch:job>

In this scenario, the childReader would also have to read OldParent objects and write NewParent objects, delegating to parentReader and parentWriter objects. The major drawback is that I would lose all OldParents that have no associated OldChild objects.
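One possible variation of the child-based step, which is not part of the original question: drive the reader from a LEFT OUTER JOIN of parents and children ordered by parent id, so that a parent without children still yields exactly one row (with a null child id). A new parent is written whenever the parent id changes, and the commit interval counts joined rows. The sketch below simulates the rows in memory; the SQL, the table names and the `Row` type are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rows as they might come back from
//   SELECT p.id, c.id FROM old_parent p LEFT OUTER JOIN old_child c ON c.parent_id = p.id
//   ORDER BY p.id, c.id
// (table and column names are assumptions). A childless parent appears once with childId == null.
class JoinDrivenMigration {

    record Row(long parentId, Long childId) {}

    /** Returns a log of the writes and commits the step would perform. */
    static List<String> migrate(List<Row> rowsOrderedByParent, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<String> log = new ArrayList<>();
        Long lastParentId = null;
        int rowsSinceCommit = 0;
        for (Row row : rowsOrderedByParent) {
            // Control break: the first row of each parent creates the NewParent.
            if (lastParentId == null || lastParentId.longValue() != row.parentId()) {
                log.add("parent " + row.parentId());
                lastParentId = row.parentId();
            }
            if (row.childId() != null) {
                log.add("child " + row.childId());
            }
            // The interval counts joined rows, so it also bounds the children per transaction.
            if (++rowsSinceCommit == commitInterval) {
                log.add("commit");
                rowsSinceCommit = 0;
            }
        }
        if (rowsSinceCommit > 0) {
            log.add("commit"); // commit the final partial chunk
        }
        return log;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 10L), new Row(1, 11L), new Row(2, null), new Row(3, 30L));
        // prints [parent 1, child 10, child 11, commit, parent 2, parent 3, child 30, commit]
        System.out.println(migrate(rows, 2));
    }
}
```

Because one parent's children can span several chunks, a restart would have to resume from the last committed row rather than from the start of the last parent; the sketch ignores restarts entirely.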

The third possible scenario would be to have two separate workflows, one for OldParent -> NewParent and one for OldChild -> NewChild. I would have to maintain a mapping table that stores the relationship between OldParent and NewParent ids, but I could use standard configurations, including commit-interval.
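A minimal sketch of the third scenario, assuming the new system assigns its own parent ids (which is what makes the mapping necessary) and keeps child ids unchanged. An in-memory map stands in for the mapping table, and all type names are illustrative only:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-workflow approach. Step 1 migrates parents and
// records old id -> new id; step 2 migrates children and resolves the new parent id
// through that mapping. An in-memory map stands in for the mapping table.
class TwoStepMigration {

    record OldParent(long id, String name) {}
    record OldChild(long id, long parentId) {}
    record NewParent(long id, String name) {}
    record NewChild(long id, long newParentId) {}

    private final Map<Long, Long> parentIdMapping = new HashMap<>(); // stands in for the mapping table
    private long nextParentId = 1000;                                // assumption: the new system assigns new ids

    /** Step 1: runs over all OldParents, so parents without children are migrated too. */
    NewParent migrateParent(OldParent old) {
        NewParent migrated = new NewParent(nextParentId++, old.name());
        parentIdMapping.put(old.id(), migrated.id());
        return migrated;
    }

    /** Step 2: runs over all OldChildren in its own step, with its own commit interval. */
    NewChild migrateChild(OldChild old) {
        Long newParentId = parentIdMapping.get(old.parentId());
        if (newParentId == null) {
            throw new IllegalStateException("parent " + old.parentId() + " has not been migrated yet");
        }
        return new NewChild(old.id(), newParentId); // assumption: child ids are kept as-is
    }

    public static void main(String[] args) {
        TwoStepMigration migration = new TwoStepMigration();
        for (OldParent parent : List.of(new OldParent(1, "with children"), new OldParent(2, "childless"))) {
            System.out.println(migration.migrateParent(parent));
        }
        System.out.println(migration.migrateChild(new OldChild(10, 1)));
    }
}
```

Since the two steps only share the mapping, each can be a standard chunk-oriented step with its own commit-interval, as long as the parent step runs to completion before the child step starts.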

Are there other possibilities? Which of these would you recommend as best practice?

Doesn't it have an N-records commit-interval configuration? Doesn't it use something like JDBC batch updates, so that you can configure N-sized batch updates and a commit after each batch update?
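For reference, N-sized batch updates with a commit per batch look roughly like this in plain JDBC. `addBatch`, `executeBatch`, `setAutoCommit`, `commit` and `rollback` are standard `java.sql` methods; the `new_child` table, its columns and the `ChildRow` type are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch of plain JDBC batching with a commit every N rows.
// The table and column names (new_child, id, parent_id) are hypothetical.
class BatchedChildInsert {

    record ChildRow(long id, long parentId) {}

    /** Inserts all rows, one JDBC batch and one commit per commitInterval rows; returns the number of commits. */
    static int insert(Connection connection, List<ChildRow> rows, int commitInterval) throws SQLException {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false); // transaction boundaries are controlled here
        int commits = 0;
        try (PreparedStatement ps = connection.prepareStatement(
                "INSERT INTO new_child (id, parent_id) VALUES (?, ?)")) {
            int pending = 0;
            for (ChildRow row : rows) {
                ps.setLong(1, row.id());
                ps.setLong(2, row.parentId());
                ps.addBatch();
                if (++pending == commitInterval) {
                    ps.executeBatch();
                    connection.commit();
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush and commit the last partial batch
                ps.executeBatch();
                connection.commit();
                commits++;
            }
        } catch (SQLException e) {
            connection.rollback(); // undoes only the uncommitted batch; earlier commits stay
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
        return commits;
    }
}
```

Here the commit interval counts rows rather than parents, which is the property the question is after.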

If it doesn't, I have a hack :)

Make your own java.sql.Connection implementation: one that passes all calls through to the original connection and, in addition, executes a commit after every N-th update... :)

If you're using a connection pool, you can wrap it too, so that it returns connections wrapped with the hack.
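A minimal sketch of this hack, assuming auto-commit is switched off on the underlying connection. Rather than hand-writing every `java.sql.Connection` method, it uses `java.lang.reflect.Proxy` to pass all calls through and also wraps the statements the connection creates, committing after every N-th `execute`/`executeUpdate`/`executeLargeUpdate`/`executeBatch` call. The class name `CommittingConnection` is hypothetical:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.Set;

// Hypothetical sketch: a pass-through Connection that commits after every N-th update.
// Assumes auto-commit is off on the target. Each executeBatch call counts as one update.
final class CommittingConnection {

    private static final Set<String> STATEMENT_FACTORIES =
            Set.of("createStatement", "prepareStatement", "prepareCall");
    private static final Set<String> UPDATE_METHODS =
            Set.of("execute", "executeUpdate", "executeLargeUpdate", "executeBatch");

    static Connection wrap(Connection target, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        ClassLoader loader = CommittingConnection.class.getClassLoader();
        int[] updates = {0}; // shared by all statements of this connection (not thread-safe)
        return (Connection) Proxy.newProxyInstance(loader, new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    Object result = invoke(target, method, args);
                    if (!STATEMENT_FACTORIES.contains(method.getName())) {
                        return result; // plain pass-through
                    }
                    // Wrap the returned Statement / PreparedStatement / CallableStatement.
                    Class<?> statementType = method.getReturnType();
                    return Proxy.newProxyInstance(loader, new Class<?>[] {statementType},
                            (sProxy, sMethod, sArgs) -> {
                                Object r = invoke(result, sMethod, sArgs);
                                if (UPDATE_METHODS.contains(sMethod.getName())
                                        && ++updates[0] % commitEvery == 0) {
                                    target.commit();
                                }
                                return r;
                            });
                });
    }

    /** Calls the real method and rethrows the original exception (e.g. SQLException) unwrapped. */
    private static Object invoke(Object target, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}
```

A wrapping `DataSource` would simply return `CommittingConnection.wrap(delegate.getConnection(), n)` from its own `getConnection()`. Updates after the last multiple of N are not committed by the wrapper, so the caller still needs a final `commit()`.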

I know it's a slightly weird proposition, but it may be all you need for a one-time migration.
