OrientDB ETL: create an edge between two vertices that are already in the graph
I am trying to create an edge between two vertices that are already part of OrientDB. My edge data is in a MySQL table.
Here is my oetl JSON:
{
"config": {
"log": "info"
},
"source": { "file": { "path": "/Users/RP/user_invited_data.csv" } },
"extractor": { "csv": {"columnsOnFirstLine": true, "columns":["user_id:string", "invited_by:string", "invited_date:datetime"] } },
"transformers" : [
{ "vertex": { "class": "User", "skipDuplicates": true} },
{ "edge": { "class": "INVITED", "direction" : "in",
"joinFieldName": "invited_by",
"lookup":"select expand(u) from (match {class: User, as: u} return u) where u.user_id = ?;",
"unresolvedLinkAction":"NOTHING",
"edgeFields": { "invited_date": "${input.invited_date}" },
"skipDuplicates": true
}
},
{ "field":
{ "fieldNames":
[ "invited_by", "invited_date"],
"operation": "remove"
}
}
],
"loader" : {
"orientdb": {
"dbURL": "remote:localhost/abcd_graph",
"dbUser": "root",
"dbPassword": "root",
"dbType": "graph",
"dbAutoCreate": false,
"batchCommit": 1000
}
}
}
When I run the above JSON, it throws ORecordDuplicatedException for the User vertex. I have a unique index on user_id and have set skipDuplicates = true. Any suggestions would be greatly appreciated.
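The error can be illustrated with a small in-memory model. It assumes, as the exception suggests, that the vertex transformer always tries to create a new User record for every row, so a row whose user_id is already in the graph collides with the unique index. `UniqueIndexedClass`, `vertex_step` and `DuplicatedKeyError` are hypothetical stand-ins, not OrientDB APIs.

```python
# Hypothetical in-memory model, NOT OrientDB's API: a vertex class with a
# UNIQUE index on user_id, and a step that always CREATEs a new record per row.


class DuplicatedKeyError(Exception):
    """Stand-in for OrientDB's ORecordDuplicatedException."""


class UniqueIndexedClass:
    def __init__(self, key_field):
        self.key_field = key_field
        self.index = {}  # key value -> stored record

    def create(self, record):
        key = record[self.key_field]
        if key in self.index:
            # A second CREATE with an existing key violates the unique index.
            raise DuplicatedKeyError(f"duplicated key {key!r} on {self.key_field}")
        self.index[key] = record
        return record


def vertex_step(users, row, skip_duplicates):
    """Try to create a new vertex from the row; optionally swallow duplicates."""
    try:
        return users.create(dict(row))
    except DuplicatedKeyError:
        if skip_duplicates:
            return None  # in this model the row yields no saved vertex
        raise


users = UniqueIndexedClass("user_id")
users.create({"user_id": "1001_USER2", "name": "Robert"})  # already in the graph

row = {"user_id": "1001_USER2", "invited_by": "1000_USER1"}
try:
    vertex_step(users, row, skip_duplicates=False)
except DuplicatedKeyError as e:
    print("error:", e)  # error: duplicated key '1001_USER2' on user_id
print(vertex_step(users, row, skip_duplicates=True))  # None
```

Skipping the duplicate only hides the error. In this model, the row still has no persisted User vertex that an edge could attach to.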
UPDATE: Here is an OrientDB gem: skipDuplicates actually works only when the log level is not DEBUG. But the problem is not solved yet. There are no errors now, but the edges are not created. I will keep debugging and see if I can fix it tonight.
UPDATE: After debugging a bit further, I got an exception deeper down, at the storage level:
com.orientechnologies.orient.core.exception.ODatabaseException: Impossible to serialize invalid link #-1:-1
DB name="abcd_graph"
at com.orientechnologies.orient.core.serialization.serializer.record.binary.ORecordSerializerBinaryV0.writeOptimizedLink(ORecordSerializerBinaryV0.java:867)
at com.orientechnologies.orient.core.serialization.serializer.record.binary.ORecordSerializerBinaryV0.serializeValue(ORecordSerializerBinaryV0.java:754)
at com.orientechnologies.orient.core.serialization.serializer.record.binary.ORecordSerializerBinaryV0.serialize(ORecordSerializerBinaryV0.java:385)
at com.orientechnologies.orient.core.serialization.serializer.record.binary.ORecordSerializerBinary.toStream(ORecordSerializerBinary.java:99)
at com.orientechnologies.orient.core.record.impl.ODocument.toStream(ODocument.java:2381)
at com.orientechnologies.orient.core.record.impl.ODocument.toStream(ODocument.java:664)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeSaveRecord(ODatabaseDocumentTx.java:2183)
at com.orientechnologies.orient.core.tx.OTransactionNoTx.saveRecord(OTransactionNoTx.java:191)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:2758)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:102)
at com.orientechnologies.orient.core.record.impl.ODocument.save(ODocument.java:1805)
at com.orientechnologies.orient.core.record.impl.ODocument.save(ODocument.java:1801)
at com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx.addEdgeInternal(OrientGraphNoTx.java:242)
at com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx.addEdgeInternal(OrientGraphNoTx.java:137)
at com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:741)
at com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:688)
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.createEdge(OEdgeTransformer.java:203)
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:123)
at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:39)
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:110)
at com.orientechnologies.orient.etl.OETLProcessor$OETLPipelineWorker.call(OETLProcessor.java:620)
at com.orientechnologies.orient.etl.OETLProcessor$OETLPipelineWorker.call(OETLProcessor.java:601)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
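One plausible reading of this trace, which the post does not confirm, is that the edge transformer received a vertex object that was never saved because its save was skipped as a duplicate. `#-1:-1` is how OrientDB displays the record ID of a record that has not been persisted. The sketch below models that idea. `Record`, `Storage`, the cluster number 17 and the error check are hypothetical and are modeled on the message text, not on OrientDB's source.

```python
# Hypothetical model of record IDs, NOT OrientDB's API. OrientDB prints a record
# ID as #cluster:position; #-1:-1 marks a record that has never been saved.
import itertools
from dataclasses import dataclass

INVALID_RID = (-1, -1)


@dataclass
class Record:
    fields: dict
    rid: tuple = INVALID_RID  # stays invalid until the record is saved

    def rid_str(self):
        return f"#{self.rid[0]}:{self.rid[1]}"


class Storage:
    def __init__(self, cluster_id):
        self.cluster_id = cluster_id  # arbitrary cluster number for the model
        self._positions = itertools.count()

    def save(self, record):
        record.rid = (self.cluster_id, next(self._positions))
        return record

    def serialize_link(self, record):
        # Modeled on the error message in the trace, not on OrientDB's source.
        if record.rid == INVALID_RID:
            raise ValueError(f"Impossible to serialize invalid link {record.rid_str()}")
        return record.rid_str()

    def add_edge(self, out_v, in_v, edge_fields):
        # An edge stores links to both endpoints, so both must be persisted.
        edge = {"out": self.serialize_link(out_v), "in": self.serialize_link(in_v)}
        edge.update(edge_fields)
        return edge


storage = Storage(cluster_id=17)
inviter = storage.save(Record({"user_id": "1000_USER1"}))
skipped = Record({"user_id": "1001_USER2"})  # built for the row, save skipped
try:
    storage.add_edge(inviter, skipped, {"invited_date": None})
except ValueError as e:
    print(e)  # Impossible to serialize invalid link #-1:-1
```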
UPDATE: I have changed the extractor from DB to CSV so that the problem is easier to reproduce.
Create schema:
CREATE class User IF NOT EXISTS extends V;
create property User.user_id IF NOT EXISTS String;
create property User.name IF NOT EXISTS String;
create index user_idx on User(user_id) unique;
insert into User set user_id = '1000_USER1', name = 'Bob';
insert into User set user_id = '1001_USER2', name = 'Robert';
Sample CSV:
user_id, invited_by, invited_date
1001_USER2, 1000_USER1,
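The sample line has a space after each comma. The post does not say whether OrientDB's CSV extractor trims these spaces. If it does not, invited_by would hold " 1000_USER1" and the `user_id = ?` lookup would not match. Python's standard `csv` module shows the difference. The `read_rows` helper is illustrative only, and the column names come from the extractor's "columns" setting.

```python
import csv
import io

# Column names as declared in the extractor's "columns" setting.
COLUMNS = ["user_id", "invited_by", "invited_date"]
# The data line from the sample: a space after each comma, empty invited_date.
SAMPLE = "1001_USER2, 1000_USER1,\n"


def read_rows(text, strip_spaces):
    """Parse CSV text into dicts; strip_spaces drops blanks after delimiters."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=strip_spaces)
    return [dict(zip(COLUMNS, row)) for row in reader]


print(read_rows(SAMPLE, strip_spaces=False))
# [{'user_id': '1001_USER2', 'invited_by': ' 1000_USER1', 'invited_date': ''}]
print(read_rows(SAMPLE, strip_spaces=True))
# [{'user_id': '1001_USER2', 'invited_by': '1000_USER1', 'invited_date': ''}]
```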
After some struggle, rereading the whole ETL documentation, and some debugging, I figured it out.
We need to use the MERGE transformer instead of VERTEX. The merge transformer looks up an existing vertex instead of creating a new one.
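The difference from the vertex transformer can be sketched as follows. A dict stands in for the unique index `User.user_id` named in "lookup", and the row's fields are merged into the vertex already found there, so the result keeps its persisted record ID. `Vertex`, `merge_step` and the record IDs `#17:0`/`#17:1` are hypothetical. What the real transformer does when no match is found is not covered in the post, and this sketch simply returns None.

```python
# Hypothetical model of the merge step, NOT OrientDB's API. `index` plays the
# role of the unique index User.user_id named in "lookup".


class Vertex:
    def __init__(self, rid, fields):
        self.rid = rid  # already persisted, so it has a real record ID
        self.fields = dict(fields)


def merge_step(index, row, join_field_name):
    """Find the existing vertex by the join field and merge the row into it."""
    existing = index.get(row[join_field_name])
    if existing is None:
        return None  # this sketch drops rows with no match
    existing.fields.update(row)  # copy the row's fields onto the stored vertex
    return existing  # same object, so the persisted RID is kept


index = {
    "1000_USER1": Vertex("#17:0", {"user_id": "1000_USER1", "name": "Bob"}),
    "1001_USER2": Vertex("#17:1", {"user_id": "1001_USER2", "name": "Robert"}),
}
row = {"user_id": "1001_USER2", "invited_by": "1000_USER1", "invited_date": ""}
v = merge_step(index, row, "user_id")
print(v.rid, v.fields["name"], v.fields["invited_by"])  # #17:1 Robert 1000_USER1
```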
Here is what my JSON looks like:
"transformers" : [
{ "merge": { "joinFieldName": "user_id", "lookup": "User.user_id" } },
{ "edge": { "class": "INVITED", "direction" : "out",
"joinFieldName": "invited_by",
"lookup": "SELECT expand(u) from (match {class: User, as: u} return u) where u.user_id = ?",
"unresolvedLinkAction":"NOTHING",
"edgeFields": { "invited_date": "${input.invited_date}" },
"skipDuplicates": true
}
},
{ "field":
{ "fieldNames":
[ "invited_by", "invited_date"],
"operation": "remove"
}
}
]
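The three transformers above can be modeled end to end on the sample row: merge into the existing User, create an INVITED edge to the user found through invited_by, then remove the helper fields before the vertex is saved. This is a simplified, hypothetical sketch. The function names are made up, and it assumes that "direction": "out" means an edge from the current vertex to the looked-up one and that unresolvedLinkAction NOTHING means "create no edge and carry on". The post confirms neither assumption.

```python
# Hypothetical end-to-end model of the transformer chain, NOT OrientDB's API:
# merge -> edge -> field remove, applied to one CSV row.


class Vertex:
    def __init__(self, rid, fields):
        self.rid = rid
        self.fields = dict(fields)


def merge_step(index, row, join_field_name):
    existing = index.get(row[join_field_name])  # None when there is no match
    if existing is not None:
        existing.fields.update(row)
    return existing


def edge_step(index, vertex, join_field_name, direction, edge_class, edge_fields, edges):
    target = index.get(vertex.fields.get(join_field_name))
    if target is None:
        return vertex  # modeled on unresolvedLinkAction NOTHING: no edge, keep going
    # Assumed meaning of "direction": "out" = current vertex -> looked-up vertex.
    out_v, in_v = (vertex, target) if direction == "out" else (target, vertex)
    edges.append({"@class": edge_class, "out": out_v.rid, "in": in_v.rid, **edge_fields})
    return vertex


def field_remove_step(vertex, field_names):
    for name in field_names:
        vertex.fields.pop(name, None)  # keep the helper columns off the vertex
    return vertex


index = {
    "1000_USER1": Vertex("#17:0", {"user_id": "1000_USER1", "name": "Bob"}),
    "1001_USER2": Vertex("#17:1", {"user_id": "1001_USER2", "name": "Robert"}),
}
edges = []
row = {"user_id": "1001_USER2", "invited_by": "1000_USER1", "invited_date": ""}

v = merge_step(index, row, "user_id")
v = edge_step(index, v, "invited_by", "out", "INVITED",
              {"invited_date": row["invited_date"]}, edges)
v = field_remove_step(v, ["invited_by", "invited_date"])

print(edges)     # [{'@class': 'INVITED', 'out': '#17:1', 'in': '#17:0', 'invited_date': ''}]
print(v.fields)  # {'user_id': '1001_USER2', 'name': 'Robert'}
```

Nothing in this chain checks whether an identical edge already exists, so processing the same row again would append a second edge.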
I still have one more problem, but I will treat it as a separate issue and research it: the pipeline creates duplicate edges between the same two vertices.
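Duplicate edges appear whenever the same (out, in) pair is inserted more than once, for example if the same rows are processed again. That cause is an assumption, since the post does not identify it. One way to make edge creation idempotent is a uniqueness check on that pair. A commonly suggested approach in OrientDB is a UNIQUE index on the edge class's `out` and `in` properties, but the post does not test this. The in-memory `EdgeClass` below is a hypothetical model of that constraint.

```python
# Hypothetical model, NOT OrientDB's API: an edge class that refuses a second
# edge with the same (out, in) pair, as a unique index on out/in would.


class EdgeClass:
    def __init__(self, name):
        self.name = name
        self.by_endpoints = {}  # (out RID, in RID) -> edge

    def add(self, out_rid, in_rid, fields, skip_duplicates=True):
        key = (out_rid, in_rid)
        if key in self.by_endpoints:
            if skip_duplicates:
                return self.by_endpoints[key]  # keep the edge that already exists
            raise ValueError(f"duplicate {self.name} edge {out_rid} -> {in_rid}")
        edge = {"out": out_rid, "in": in_rid, **fields}
        self.by_endpoints[key] = edge
        return edge


invited = EdgeClass("INVITED")
for _ in range(2):  # e.g. the same CSV row processed twice
    invited.add("#17:1", "#17:0", {"invited_date": ""})
print(len(invited.by_endpoints))  # 1
```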
One thing I have observed again and again with OrientDB: the features are there, but they are hard to figure out.