
INSERT IGNORE throws primary key violation using h2 in mysql mode

I am scraping Scopus data into an h2 file database. There are over 46,000,000 records in the data, and each is treated as distinct, which means that hundreds of GB of data are repeated (hence the relational db). To reduce the insert time of all this data, I initially create a set of temporary tables with no constraints, and later copy the data into the real tables using SELECT DISTINCT and GROUP BY to enforce uniqueness.

The one exception to this is the documents table and the referenced documents table. Due to the format of the data, I can guarantee that each record represents a unique document, so I can just INSERT INTO the documents table, and later append only those rows from the referenced documents table whose IDs are not already in the documents table.

Here's the relevant code:

CREATE TABLE document (docid varchar NOT NULL, title varchar, abstract varchar, docType varchar NULL, ref boolean);

CREATE TABLE refdoc (refid varchar NOT NULL, title varchar);

INSERT INTO document (docid, title, abstract, docType, ref)
VALUES ('2-s2.0-0000098715', 'title', 'abstract', 'ab', 'false');

INSERT INTO refdoc (refid, title)
VALUES ('2-s2.0-0000098715', 'title'),
       ('2-s2.0-33947184743', 'title');

ALTER TABLE document
ADD PRIMARY KEY (docid);

ALTER TABLE document
ADD FOREIGN KEY (docType) REFERENCES doctype(abbrev);

INSERT IGNORE INTO document (docid, title, ref)
SELECT refid, title, 'true' FROM refdoc;

  • Create the documents table
  • Create the referenced documents table
  • Insert a record into the documents table
  • Insert two records into the refdoc table, including a duplicate
  • Alter the documents table with a primary key
  • Alter the documents table with a foreign key
  • Insert the rows from refdoc which do not conflict with document

The INSERT IGNORE query throws: org.h2.jdbc.JdbcSQLException: Unique index or primary key violation: "CONSTRAINT_INDEX_6 ON PUBLIC.DOCUMENT(DOCID)

I also tried using WHERE NOT EXISTS:

INSERT INTO document (docid, title, ref)
SELECT refid, title, 'true'
FROM refdoc
WHERE NOT EXISTS (
SELECT refid FROM refdoc
INNER JOIN document
ON document.docid = refdoc.refid );

But it seems that joining tables that aren't indexed is effectively impossible; nothing I have attempted involving joins has worked.
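
One thing worth checking in the NOT EXISTS query above: its subquery declares its own refdoc in the FROM clause, so it never refers to the outer refdoc row. The test is therefore uncorrelated, and it comes out the same for every row. With the sample data one ID matches, so NOT EXISTS is false everywhere and nothing is inserted. A minimal sketch of a correlated form, using the same tables and with aliases r and d added for illustration (not tested against h2, and it does not remove duplicates inside refdoc itself):

```sql
-- Sketch: the inner query references the outer row (r.refid),
-- so the existence test is evaluated once per refdoc row.
INSERT INTO document (docid, title, ref)
SELECT r.refid, r.title, TRUE
FROM refdoc r
WHERE NOT EXISTS (
    SELECT 1
    FROM document d
    WHERE d.docid = r.refid
);
```

Since docid and refid are both declared NOT NULL here, NOT IN would select the same rows. If the subquery column could contain NULL, NOT IN would match no rows at all, which is the usual reason to prefer NOT EXISTS.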

As a last resort I can use a FileHashMap, dump the contents of the refdoc table into it, and then construct a mega-huge PreparedStatement like:

INSERT INTO document (docid, title, ref)
SELECT ?, ?, 'true'
WHERE NOT EXISTS (
SELECT docid FROM document
WHERE docid = ? );

But I'd obviously rather not do that since it will take forever.

Finally found a solution that doesn't involve constructing a batch statement of 100,000,000 records. The issue was that I needed to enforce two things: that the refdocs I was inserting into document were not already in the document table, and that I inserted only unique rows from the refdoc table. All of my earlier attempts either failed to avoid conflicts, failed to enforce uniqueness, or involved joins on tables which had no indices.

Here's the solution SQL:

CREATE TABLE document (docid varchar NOT NULL, title varchar, abstract varchar, docType varchar NULL);

CREATE TABLE refdoc (refid varchar NOT NULL, title varchar);

INSERT INTO document (docid, title, abstract, docType)
VALUES ('2-s2.0-0000098715', 'title', 'abstract', 'ab');

INSERT INTO refdoc (refid, title)
VALUES ('2-s2.0-0000098715', 'title'),
       ('2-s2.0-33947184743', 'title');

INSERT IGNORE INTO document (docid, title)
SELECT refid, MAX(title)
FROM refdoc
WHERE refid NOT IN (
SELECT docid FROM document )
GROUP BY refid;

ALTER TABLE document
ADD PRIMARY KEY (docid);

ALTER TABLE document
ADD FOREIGN KEY (docType) REFERENCES doctype(abbrev);

The logic is now:

  • Create the documents table
  • Create the referenced documents table
  • Insert a record into the documents table
  • Insert two records into the refdoc table, including a duplicate
  • Insert the rows from refdoc which do not conflict with document and are unique
  • Alter the documents table with a primary key
  • Alter the documents table with a foreign key
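
To see what the GROUP BY refid step in the insert is collapsing, a query along these lines could be run against refdoc first. It is only a sketch using the table and column names from the SQL above; the copies alias is made up for illustration.

```sql
-- Lists each refid that occurs more than once in refdoc, with the
-- title that MAX(title) would pick and the number of rows collapsed.
SELECT refid, MAX(title) AS title, COUNT(*) AS copies
FROM refdoc
GROUP BY refid
HAVING COUNT(*) > 1;
```

MAX(title) is simply a deterministic way to pick one title per refid. If the duplicate rows carry different titles, the alphabetically last one is kept.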

This has the added bonus of not indexing the documents table until after the inserts are completed.
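
Because the primary key is now added only after the bulk insert, any duplicate docid that slipped through would show up as a failure of the ALTER TABLE step. A quick check that could be run beforehand, again as a sketch using the names from the SQL above (copies is an illustrative alias):

```sql
-- Lists any docid that appears more than once in document.
-- An empty result means ADD PRIMARY KEY (docid) should not
-- run into a duplicate key.
SELECT docid, COUNT(*) AS copies
FROM document
GROUP BY docid
HAVING COUNT(*) > 1;
```

On a table of this size the check is a full scan, so it is only worth running when the ALTER TABLE has already failed or the input is in doubt.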

It's still not entirely clear to me why I was getting a primary key constraint violation on a table with no primary key, but that sounds like something to submit to the h2 GitHub as a bug report.

