How to read an Oracle Data Pump binary dump file directly?
For performance and other reasons, I am looking for a way to parse the binary format of a Data Pump dump file directly.
The Data Pump utility "impdp" works only on the database server host, not on the DB client host. To run it, you have to send the whole dump file from the DB client to the DB server host and then run "impdp" over SSH.
Sometimes, for example when you only want a list of the schemas or tables included in the dump file, sending a huge file to a remote host makes no sense.
I am looking for a library (preferably in Java) or a format specification describing the dump file, so that I can write code to parse it locally without the help of the official "impdp" utility.
Thanks.
UPDATE:
I use the following regular expression to filter the dump file and find table names:
^[\\x20-\\x7e\\s]{4,}.*</OWNER_NAME><NAME>([^<]*)</NAME>.*
The expression [\\x20-\\x7e\\s] means printable ASCII characters including whitespace. This filters out the binary lines.
The expression {4,} means at least 4 characters.
Since I am dealing with XML, I extract the "NAME" element that comes directly after the "OWNER_NAME" element. This approach may not be elegant, but it seems to work.
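A minimal sketch of how this regex might be applied in Java, in case it helps. The class and method names (DumpTableNames, nameFromLine, scan) are made up for illustration. The character class here starts at \x20 (space), the first printable ASCII character. It assumes the dump file contains XML metadata fragments on lines of their own, which is only what I observed and not a documented format. The file is read as ISO-8859-1 so that arbitrary binary bytes never cause a decoding error.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DumpTableNames {
    // The regex from the post as a Java string literal (backslashes doubled).
    private static final Pattern TABLE_NAME = Pattern.compile(
            "^[\\x20-\\x7e\\s]{4,}.*</OWNER_NAME><NAME>([^<]*)</NAME>.*");

    /** Returns the captured NAME value, or null if the whole line does not match. */
    public static String nameFromLine(String line) {
        Matcher m = TABLE_NAME.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    /** Scans the file line by line and collects distinct names in first-seen order. */
    public static Set<String> scan(Path dumpFile) throws IOException {
        Set<String> names = new LinkedHashSet<>();
        // ISO-8859-1 maps every byte to exactly one char, so binary data decodes without errors.
        try (BufferedReader reader =
                     Files.newBufferedReader(dumpFile, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String name = nameFromLine(line);
                if (name != null) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical file name): java DumpTableNames export.dmp
        scan(Paths.get(args[0])).forEach(System.out::println);
    }
}
```

Note that readLine() buffers a whole line in memory, so a long binary stretch without any line break could use a lot of heap. For very large dumps a chunked byte scan would probably be safer. Also, because ".*" is greedy, only the last OWNER_NAME/NAME pair on a line is captured.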
Please comment if this approach helped you.
Using Java/JDBC for huge data manipulation is not a good idea.