How to use OrientDB ETL to create edges only
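I have two CSV files.
The first contains ~500M records in the following format:
id,name
10000023432,Tom User
13943423235,Blah Person
The second contains ~1.5B friend relationships in the following format:
fromId,toId
10000023432,13943423235
I used the OrientDB ETL tool to create the vertices from the first CSV file. Now I only need to create the edges that establish the friendship connections between them.
So far I have tried several configurations of the ETL JSON file. The latest is this one: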
{
  "config": {"parallel": true},
  "source": { "file": { "path": "path_to_file" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "from",
                "lookup": "Person.id",
                "unresolvedLinkAction": "SKIP",
                "targetVertexFields": {
                  "id": "${input.to}"
                },
                "direction": "out"
              }
    },
    { "code": { "language": "Javascript",
                "code": "print('Current record: ' + record); record;"}
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:<DB connection string>",
      "dbType": "graph",
      "classes": [
        {"name": "FriendsWith", "extends": "E"}
      ], "indexes": [
        {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
      ]
    }
  }
}
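Unfortunately, besides creating the edges, this also creates vertices with "from" and "to" properties.
When I remove the vertex transformer, the ETL process throws an error: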
Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
... 2 more
What am I missing here?
You can import the edges with these ETL transformers:
"transformers": [
  { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
  { "vertex": {"class": "Person", "skipDuplicates": true} },
  { "edge": { "class": "FriendsWith",
              "joinFieldName": "toId",
              "lookup": "Person.id",
              "direction": "out"
            }
  },
  { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]
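The "merge" transformer joins the current CSV row with the related Person record (that is a bit strange, but for some reason it is needed to associate fromId with the source person).
The "field" transformer removes the CSV fields that the merge step added to the record. You can also try the import without the "field" transformer to see the difference.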
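For context, here is how those transformers might slot into a complete configuration. This is only a sketch and has not been run against a real database: the config, source, extractor, and loader sections are copied unchanged from the question (including the path_to_file and dbURL placeholders), and the debugging "code" transformer is left out.

```json
{
  "config": {"parallel": true},
  "source": { "file": { "path": "path_to_file" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "toId",
                "lookup": "Person.id",
                "direction": "out"
              }
    },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:<DB connection string>",
      "dbType": "graph",
      "classes": [
        {"name": "FriendsWith", "extends": "E"}
      ],
      "indexes": [
        {"class": "Person", "fields": ["id:long"], "type": "UNIQUE"}
      ]
    }
  }
}
```

Note that the joinFieldName values here are fromId and toId, matching the header row of the relationships CSV, rather than the from and to used in the question's attempt.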
With the Java API, you can read the CSV yourself and create the edges:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import com.orientechnologies.orient.client.remote.OServerAdmin;
import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

public class CreateEdges {
    public static void main(String[] args) {
        String nomeYourDb = "nomeYourDb";
        String csvFile = "path_to_file";
        String cvsSplitBy = ","; // your separator (the sample file is comma-delimited)
        try {
            OServerAdmin serverAdmin = new OServerAdmin("remote:localhost/" + nomeYourDb).connect("root", "root");
            if (serverAdmin.existsDatabase()) {
                OrientGraph g = new OrientGraph("remote:localhost/" + nomeYourDb);
                // try-with-resources closes the reader even if an exception is thrown
                try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
                    br.readLine(); // skip the header row (fromId,toId)
                    String line;
                    while ((line = br.readLine()) != null) {
                        String[] ids = line.split(cvsSplitBy);
                        // look up both Person vertices by id and connect them
                        String personFrom = "(select from Person where id='" + ids[0] + "')";
                        String personTo = "(select from Person where id='" + ids[1] + "')";
                        String query = "create edge FriendsWith from " + personFrom + " to " + personTo;
                        g.command(new OCommandSQL(query)).execute();
                    }
                } finally {
                    g.shutdown(); // release the graph connection
                }
            }
        } catch (IOException e) {
            // also covers FileNotFoundException, which is a subclass of IOException
            e.printStackTrace();
        }
    }
}
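Because the snippet above concatenates raw CSV text into SQL, a malformed row (or the header row) produces a broken or unintended statement. One possible guard, sketched below, is to parse both ids as long before building the statement. The class EdgeSqlBuilder and the method createEdgeSql are hypothetical names, not part of OrientDB, and the unquoted numeric comparison assumes Person.id is stored as a long, as the id:long index in the question suggests.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class EdgeSqlBuilder {

    // Builds a CREATE EDGE statement for one "fromId,toId" CSV row.
    // Returns Optional.empty() when the row does not have exactly two
    // columns or when either column is not a valid long (e.g. the header).
    static Optional<String> createEdgeSql(String csvLine, String separator) {
        // Pattern.quote treats the separator literally; -1 keeps trailing empty columns
        String[] parts = csvLine.split(Pattern.quote(separator), -1);
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            long from = Long.parseLong(parts[0].trim());
            long to = Long.parseLong(parts[1].trim());
            return Optional.of(
                "create edge FriendsWith"
                + " from (select from Person where id = " + from + ")"
                + " to (select from Person where id = " + to + ")");
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A data row yields a statement; the header row is skipped.
        System.out.println(createEdgeSql("10000023432,13943423235", ",").orElse("skipped"));
        System.out.println(createEdgeSql("fromId,toId", ",").orElse("skipped"));
    }
}
```

In the loop above, the returned string would be passed to g.command(new OCommandSQL(...)).execute() only when the Optional is present, which also removes the need for special handling of the header line.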