What's an effective way to insert more than a million rows into a PostgreSQL server from another PostgreSQL server using Java?
I have two PostgreSQL servers, and I need to copy table rows from the first server's format and convert them to the second server's format (the column names differ).
I use a Java application with Spring Boot and a JPA repository, which implements a findAll method that reads with a stream and a fetch size of 1000.
@Query("select c from ExternalFormatEntity c")
@QueryHints(@javax.persistence.QueryHint(name = "org.hibernate.fetchSize",
value = Constants.DEFAULT_FETCH_SIZE))
Stream<ExternalFormatEntity> findAllEntities();
After reading, I convert the rows and insert them in batches of 1000.
try (Stream<ExternalFormatEntity> allExtEntitiesStream = extFormatService.getAllEntities()) {
    LinkedList<CanonicalFormatEntity> canonicalEntityList = new LinkedList<>();
    allExtEntitiesStream.forEach(extEntity -> {
        if (Objects.nonNull(extEntity)) {
            canonicalEntityList.add(SomeConverter.convert(extEntity));
        }
        if (canonicalEntityList.size() >= DEFAULT_BATCH_SIZE) {
            List<CanonicalFormatEntity> copyList = new LinkedList<>(canonicalEntityList);
            canonicalEntityList.clear();
            Thread thread = new Thread(() -> {
                canonicalEntityRepository.saveAll(copyList);
                canonicalEntityRepository.flush();
                copyList.clear();
            });
            thread.start();
        }
    });
}
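Two things stand out in the loop above: each batch is handed to a brand-new, unmanaged thread, and any rows left over after the last full batch are never saved. A minimal single-threaded sketch of the batching logic (with a generic converter and saver callback standing in for SomeConverter and the real repository, which are not shown here) could look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;

public class BatchingSketch {
    /** Converts each element, groups results into fixed-size batches, and hands
     *  every batch (including the final partial one) to the saver callback.
     *  Returns the number of batches saved. */
    public static <S, T> int saveInBatches(Stream<S> source,
                                           Function<S, T> converter,
                                           Consumer<List<T>> saver,
                                           int batchSize) {
        List<T> batch = new ArrayList<>(batchSize);
        int[] batches = {0};
        source.forEach(item -> {
            batch.add(converter.apply(item));
            if (batch.size() >= batchSize) {
                saver.accept(new ArrayList<>(batch)); // defensive copy, as in the original code
                batch.clear();
                batches[0]++;
            }
        });
        if (!batch.isEmpty()) { // flush the trailing partial batch, which the original loop drops
            saver.accept(new ArrayList<>(batch));
            batches[0]++;
        }
        return batches[0];
    }

    public static void main(String[] args) {
        List<Integer> saved = new ArrayList<>();
        int batches = saveInBatches(Stream.iterate(1, n -> n + 1).limit(2500),
                                    n -> n * 10,
                                    saved::addAll,
                                    1000);
        System.out.println(batches + " batches, " + saved.size() + " rows");
        // prints: 3 batches, 2500 rows
    }
}
```

With Spring Data, the saver would be canonicalEntityRepository::saveAll plus flush; running it on the calling thread, or on a bounded ExecutorService, avoids spawning an unbounded number of threads.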
In my opinion, this operation should take well under an hour for 1 million records. Can I speed it up, and if so, how?
First of all, I tried converting the table records from the first database to a CSV file, saving it on the other server, and loading it with the Postgres COPY API, but the total time was still unacceptable because of the extra disk I/O.
Maybe Postgres has streaming writes or something similar? I can't find an answer in the official PostgreSQL docs.
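Regarding "stream writing": the closest mechanism PostgreSQL offers is the COPY protocol, which the JDBC driver exposes as org.postgresql.copy.CopyManager; its copyIn method accepts a Reader or InputStream, so rows can be streamed into the target table without an intermediate file. The fiddly part is producing CSV that COPY will accept. A small sketch of the escaping is below (the copyIn call itself is left as a comment, since it needs a live connection and the PostgreSQL driver on the classpath; table and column contents are hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;

public class CopyCsvSketch {
    /** Quotes a single value for PostgreSQL's COPY ... (FORMAT csv):
     *  wrap in double quotes and double any embedded quotes.
     *  An unquoted empty field is read back as NULL by COPY's CSV mode. */
    static String csvField(String value) {
        if (value == null) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins the fields of one row into a COPY-compatible CSV line. */
    static String csvLine(List<String> fields) {
        return fields.stream()
                     .map(CopyCsvSketch::csvField)
                     .collect(Collectors.joining(","));
    }

    // With the PostgreSQL JDBC driver, a Reader over such lines can be fed directly:
    // CopyManager cm = new CopyManager((BaseConnection) connection);
    // cm.copyIn("COPY canonical_table FROM STDIN (FORMAT csv)", reader);

    public static void main(String[] args) {
        System.out.println(csvLine(List.of("a", "say \"hi\"", "1,5")));
        // prints: "a","say ""hi""","1,5"
    }
}
```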
In my case, the following solution helped:
1. Export the external table to a CSV file with zip compression (example from this StackOverflow answer: https://stackoverflow.com/a/3981807/3744622 ).
2. Copy the small zip file to the Postgres server's /tmp folder:
   scp root@ext_server:/path/to/file root@target_server:/tmp/
3. Import the table from the zipped CSV file (example from this StackOverflow answer: https://stackoverflow.com/a/46228247/3744622 ).
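The steps above can be sketched as a shell pipeline (host, database, and table names here are hypothetical; COPY TO STDOUT / FROM STDIN avoids temporary server-side files, and gzip keeps the transferred file small):

```shell
# 1. Export the source table as compressed CSV on the external server
psql -h ext_server -U postgres -d source_db \
     -c "COPY external_table TO STDOUT (FORMAT csv)" | gzip > /tmp/external_table.csv.gz

# 2. Copy the small archive to the target server
scp /tmp/external_table.csv.gz root@target_server:/tmp/

# 3. Import on the target server, decompressing on the fly
gunzip -c /tmp/external_table.csv.gz | \
    psql -h target_server -U postgres -d target_db \
         -c "COPY canonical_table FROM STDIN (FORMAT csv)"
```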
This brought the total time down to about 10 minutes.
Thank you all, this is a wonderful place)