How to use OrientDB ETL to create edges only

I have two CSV files.

The first contains ~500M records in the following format:

id,name
10000023432,Tom User
13943423235,Blah Person

The second contains ~1.5B friend relationships in the following format:

fromId,toId
10000023432,13943423235

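I used the OrientDB ETL tool to create the vertices from the first CSV file. Now I only need to create the edges that establish the friendship connections between them.

So far I have tried several configurations of the ETL JSON file; the latest is this one: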

{
    "config": {"parallel": true},
    "source": { "file": { "path": "path_to_file" } },
    "extractor": { "csv": {} },
    "transformers": [
        { "vertex": {"class": "Person", "skipDuplicates": true} },
        { "edge": { "class": "FriendsWith",
                    "joinFieldName": "from",
                    "lookup": "Person.id",
                    "unresolvedLinkAction": "SKIP",
                    "targetVertexFields":{
                        "id": "${input.to}"
                    },
                    "direction": "out"
                  }
        },
        { "code": { "language": "Javascript",
                    "code": "print('Current record: ' + record);  record;"}
        }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:<DB connection string>",
            "dbType": "graph",
            "classes": [
                {"name": "FriendsWith", "extends": "E"}
            ], "indexes": [
                {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
            ]
        }
    }
}

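Unfortunately, in addition to creating the edges, this also creates vertices with "from" and "to" properties.

When I try to remove the vertex transformer, the ETL process throws an error: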

Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
        at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
        at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
        at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
        ... 2 more

What am I missing here?

You can import the edges using these ETL transformers:

"transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "toId",
                "lookup": "Person.id",
                "direction": "out"
              }
    },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]

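The "merge" transformer joins the current CSV row with the related Person record (this is a bit odd, but for some reason it is needed to link fromId with the source person).

The "field" transformer removes the CSV fields added by the merge step. You can try the import without the "field" transformer to see the difference.

Using the Java API, you can read the CSV file and then create the edges: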

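import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import com.orientechnologies.orient.client.remote.OServerAdmin;
import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

        // The code below goes inside a method.
        String nomeYourDb = "nomeYourDb";   // your database name
        String csvFile = "path_to_file";
        String csvSplitBy = ",";            // your separator
        try {
            // Check that the database exists before opening the graph
            OServerAdmin serverAdmin = new OServerAdmin("remote:localhost/" + nomeYourDb).connect("root", "root");
            boolean exists = serverAdmin.existsDatabase();
            serverAdmin.close();
            if (exists) {
                OrientGraph g = new OrientGraph("remote:localhost/" + nomeYourDb);
                // try-with-resources closes the reader even if an exception is thrown
                try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
                    boolean header = true;
                    String line;
                    while ((line = br.readLine()) != null) {
                        if (header) {
                            // skip the "fromId,toId" header row
                            header = false;
                            continue;
                        }
                        String[] ids = line.split(csvSplitBy);
                        // Look up both endpoints by Person.id and connect them
                        String personFrom = "(select from Person where id='" + ids[0].trim() + "')";
                        String personTo = "(select from Person where id='" + ids[1].trim() + "')";
                        String query = "create edge FriendsWith from " + personFrom + " to " + personTo;
                        g.command(new OCommandSQL(query)).execute();
                    }
                } finally {
                    // close the graph; shutdown() also commits any pending transaction
                    g.shutdown();
                }
            }
        } catch (IOException e) {
            // covers FileNotFoundException as well
            e.printStackTrace();
        }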
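
The statement-building step in the loop above can be pulled out and checked without a running database. The following is only a minimal sketch: `EdgeStatementBuilder` and `toCreateEdge` are hypothetical names, not part of OrientDB, and it assumes that `Person.id` is numeric (as the `id:long` index in the question suggests), so the ids are validated as digits and written unquoted.

```java
import java.util.Optional;
import java.util.regex.Pattern;

class EdgeStatementBuilder {

    // Hypothetical helper (not an OrientDB API): turns one "fromId,toId" CSV
    // line into a CREATE EDGE statement, or returns empty for lines to skip.
    static Optional<String> toCreateEdge(String line, String separator) {
        String[] ids = line.trim().split(Pattern.quote(separator));
        if (ids.length != 2) {
            return Optional.empty();
        }
        String from = ids[0].trim();
        String to = ids[1].trim();
        // Accept only plain digits, so the header row and malformed rows are
        // skipped and nothing unexpected is concatenated into the SQL text.
        // Assumption: Person.id is stored as a number.
        if (!from.matches("\\d+") || !to.matches("\\d+")) {
            return Optional.empty();
        }
        return Optional.of("create edge FriendsWith"
                + " from (select from Person where id = " + from + ")"
                + " to (select from Person where id = " + to + ")");
    }

    public static void main(String[] args) {
        String[] sample = { "fromId,toId", "10000023432,13943423235", "broken line" };
        for (String line : sample) {
            System.out.println(toCreateEdge(line, ",").orElse("skipped: " + line));
        }
    }
}
```

Each non-empty result could then be passed to `g.command(new OCommandSQL(...)).execute()` as in the loop above; the header row is rejected by the digit check, so no separate header flag is needed.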
