
Testing SQL queries against multiple database systems

I'm involved in a migration project from Oracle to PostgreSQL, and I'm looking for a way to automate testing of a large number of queries converted from Oracle syntax to PostgreSQL syntax. The assumption is that the data has been migrated successfully, so there is no need to check that. I could hack together a solution from scratch in Perl or Python, but there might be easier ways. I looked at database testing frameworks like Test::DBUnit and pgTAP, but they assume that the user supplies the expected results, whereas in my case those results come from the database we are migrating from. So the question is: is there an existing database-specific tool or testing framework that executes queries against the old (Oracle) and new (PostgreSQL) databases, collects the results and compares them, highlighting any differences and any errors that occur along the way?

How about creating a JUnit project that runs each query against both schemas (one on Oracle, the other on PostgreSQL)?

Alternatively, you could create two simple Maven projects (one per vendor), each using an SQL plugin to run your queries (paste them into each pom.xml in the same order). You can then automate these tests with a continuous integration server that supports Maven (Hudson, for example) and schedule regular runs.

Good luck!

I ended up writing a custom tool in Python that runs the queries against both databases and collects the results using psycopg2 and cx_Oracle. Comparing them is a matter of calculating a hash for each row and checking whether each Oracle row exists among the hashes of the PostgreSQL rows. A couple of pitfalls:

  • Floating-point numbers can lose precision when converted from Oracle/PostgreSQL types to Python. Use the type-specific hooks in the drivers (see their documentation) to make sure numeric values are converted to Decimal, not float.

  • It's tempting to read just one row at a time from both databases, compare the values and move on. However, that won't work unless the SQL result is explicitly ordered (with ORDER BY). Unfortunately, reading the full results at once means you need a lot of memory for queries that produce many rows.

  • One needs to distinguish between queries that produce equal results and queries that produce 0 rows on both databases. The latter should be examined, and if the queries contain parameters, their values should be revised.
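The hash-based comparison described in this answer, including the check for the both-empty case, could look roughly like the sketch below. It assumes both result sets have already been fetched into lists of tuples (for example with `cursor.fetchall()`) and that numeric columns already arrive as `Decimal`. Instead of a plain set of row hashes it uses `collections.Counter`, a small variation on the approach above that also catches rows duplicated a different number of times on each side. The function name `compare_results` and its status labels are hypothetical.

```python
from collections import Counter


def compare_results(oracle_rows, pg_rows):
    """Compare two query results as multisets of rows, ignoring row order.

    Hypothetical helper. Rows must contain only hashable values (DB-API
    cursors return tuples). Returns (status, missing_in_pg, extra_in_pg),
    where the last two are Counters mapping a row to how many more times
    one side returned it than the other.
    """
    # Counter hashes each row tuple and counts duplicates
    oracle_counts = Counter(tuple(row) for row in oracle_rows)
    pg_counts = Counter(tuple(row) for row in pg_rows)

    # Counter subtraction keeps only positive counts
    missing_in_pg = oracle_counts - pg_counts  # Oracle rows PostgreSQL lacks
    extra_in_pg = pg_counts - oracle_counts    # PostgreSQL rows Oracle lacks

    if missing_in_pg or extra_in_pg:
        status = "DIFFERENT"
    elif not oracle_counts:
        # "Equal" only because both sides returned 0 rows: review the
        # query and its parameter values by hand.
        status = "BOTH_EMPTY"
    else:
        status = "EQUAL"
    return status, missing_in_pg, extra_in_pg


if __name__ == "__main__":
    status, missing, extra = compare_results(
        [(1, "a"), (2, "b"), (2, "b")],
        [(2, "b"), (1, "a")],
    )
    print(status, dict(missing), dict(extra))
    # prints: DIFFERENT {(2, 'b'): 1} {}
```

Because Python's hash is consistent with numeric equality, `Decimal('1.50')` and `Decimal('1.5')` count as the same value, but a float `0.1` never equals `Decimal('0.1')`, which is one more reason to follow the Decimal advice above. Note that this approach still holds both result sets in memory.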
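When both queries do end with an explicit ORDER BY, the row-at-a-time comparison from the second pitfall becomes possible and avoids materializing whole result sets. A minimal sketch, assuming the ORDER BY yields the same deterministic order on both databases; `first_difference` is a hypothetical helper, and any iterable of rows works as input. Both psycopg2 and cx_Oracle cursors can be iterated row by row, though with psycopg2 only a named (server-side) cursor avoids pulling the entire result into client memory when the query executes.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel: this side has run out of rows


def first_difference(oracle_rows, pg_rows):
    """Walk two ORDER BY-sorted row streams in lockstep.

    Hypothetical helper. It only looks at one row per side at a time, so
    it can consume cursors lazily, but it is only valid when both queries
    return rows in the same deterministic order.

    Returns None if the streams are identical, otherwise a tuple
    (row_number, oracle_row, pg_row); a side that ran out of rows is None.
    """
    pairs = zip_longest(oracle_rows, pg_rows, fillvalue=_MISSING)
    for row_number, (o_row, p_row) in enumerate(pairs, start=1):
        if o_row is _MISSING or p_row is _MISSING:
            # One result set is longer than the other
            return (
                row_number,
                None if o_row is _MISSING else tuple(o_row),
                None if p_row is _MISSING else tuple(p_row),
            )
        if tuple(o_row) != tuple(p_row):
            return row_number, tuple(o_row), tuple(p_row)
    return None


if __name__ == "__main__":
    print(first_difference(iter([(1, "a"), (2, "b")]), iter([(1, "a"), (2, "c")])))
    # prints: (2, (2, 'b'), (2, 'c'))
    print(first_difference([(1,)], [(1,), (2,)]))
    # prints: (2, None, (2,))
```

Stopping at the first mismatch keeps the report short; collecting every mismatch instead would only require appending to a list rather than returning.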
