Easiest way to import a simple csv file to a graph with OrientDB ETL

I would like to import a very simple directed graph from a csv file into OrientDB. Concretely, the file is the roadNet-PA dataset from the SNAP collection: https://snap.stanford.edu/data/roadNet-PA.html. The first lines of the file are as follows:

# Directed graph (each unordered pair of nodes is saved once)
# Pennsylvania road network
# Nodes: 1088092 Edges: 3083796
# FromNodeId    ToNodeId
0       1
0       6309
0       6353
1       0
6353    0
6353    6354

There is only one type of vertex (a road intersection), and the edges carry no information (I suppose OrientDB lightweight edges are the best option for this). Note also that the two vertex ids on each line are separated by a tab.
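For reference, the file layout described above can be read with a few lines of plain Python. This is only a sketch of the format (comment lines starting with `#`, then two whitespace-separated integer ids per line); the function name `read_edges` and the in-memory sample are illustrative assumptions and have nothing to do with OrientDB itself.

```python
import io

def read_edges(lines):
    """Yield (from_id, to_id) integer pairs from SNAP-style edge-list lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        # split() with no argument splits on any whitespace run, tabs included.
        src, dst = line.split()
        yield int(src), int(dst)

# A small in-memory sample standing in for the real file.
sample = io.StringIO(
    "# Directed graph (each unordered pair of nodes is saved once)\n"
    "# FromNodeId\tToNodeId\n"
    "0\t1\n"
    "0\t6309\n"
    "6353\t0\n"
)
edges = list(read_edges(sample))
print(edges)
```

To read the real dataset, an open file handle can be passed in place of `sample`.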

I've tried to create a simple ETL configuration to import the file, with no success. Here is the configuration:

{
  "config": {
    "log": "debug"
  },
  "source" : {
    "file": { "path": "/tmp/roadNet-PA.csv" }
  },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "   ", "skipFrom": 1, "skipTo": 4 } },
    { "vertex": { "class": "Intersection" } },
    { "edge": { "class": "Road" } }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/roads",
       "dbType": "graph",
       "classes": [
         {"name": "Intersection", "extends": "V"},
         {"name": "Road", "extends": "E"}
       ], "indexes": [
         {"class":"Intersection", "fields":["id:integer"], "type":"UNIQUE" }
       ]
    }
  }
} 

The ETL runs, but it does not import the file as I expect. I suppose the problem is in the transformers. My idea is to read the csv line by line and create an edge connecting both vertices, but I'm not sure how to express this in an ETL file. Any ideas?
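To make the intended result concrete, here is a minimal sketch in plain Python (not OrientDB code) of the row-by-row logic described above: each row reuses or creates one vertex per id and adds one directed edge. The name `build_graph` and the dictionary-based vertex records are assumptions made purely for illustration.

```python
def build_graph(pairs):
    """Return (vertices, edges) for an iterable of (from_id, to_id) pairs."""
    vertices = {}  # id -> vertex record; one vertex per distinct id
    edges = []     # directed (from_id, to_id) pairs, one per input row
    for src, dst in pairs:
        for node_id in (src, dst):
            # Get-or-create: reuse the vertex if this id was seen before.
            vertices.setdefault(node_id, {"id": node_id})
        edges.append((src, dst))
    return vertices, edges

# The six data rows shown at the top of the question.
rows = [(0, 1), (0, 6309), (0, 6353), (1, 0), (6353, 0), (6353, 6354)]
vertices, edges = build_graph(rows)
print(sorted(vertices))
print(len(edges))
```

The ETL configuration should end up with the same shape of data: one `Intersection` per distinct id and one `Road` per row.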

Try this:

{
  "config": {
    "log": "debug"
  },
  "source" : {
    "file": { "path": "/tmp/roadNet-PA.csv" }
  },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "\t", "skipFrom": 1, "skipTo": 4,
               "columnsOnFirstLine": false, 
               "columns":["id", "to"] } },
    { "vertex": { "class": "Intersection" } },
    { "merge": { "joinFieldName":"id", "lookup":"Intersection.id" } },
    { "edge": {
       "class": "Road",
       "joinFieldName": "to",
       "lookup": "Intersection.id",
       "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/roads",
       "dbType": "graph",
       "wal": false,
       "batchCommit": 1000,
       "tx": true,
       "txUseLog": false,
       "useLightweightEdges" : true,
       "classes": [
         {"name": "Intersection", "extends": "V"},
         {"name": "Road", "extends": "E"}
       ], "indexes": [
         {"class":"Intersection", "fields":["id:integer"], "type":"UNIQUE" }
       ]
    }
  }
} 

To speed up loading, I suggest you shut down the server and run the ETL import using "plocal:" instead of "remote:". For example, replace the existing dbURL with:

       "dbURL": "plocal:/orientdb/databases/roads",

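If the configuration is kept in a script, the same switch can be made programmatically. The sketch below is only an illustration: the helper name `use_plocal` is hypothetical, and the config dictionary is a trimmed-down stand-in for the full ETL file. Calling it with the database path from the line above yields a copy of the config that points at the local database.

```python
import copy
import json

def use_plocal(config, db_path):
    """Return a copy of an ETL config whose loader dbURL points at a plocal path."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    updated["loader"]["orientdb"]["dbURL"] = "plocal:" + db_path
    return updated

# Trimmed-down stand-in for the loader section of the ETL file.
config = {"loader": {"orientdb": {"dbURL": "remote:localhost/roads", "dbType": "graph"}}}
local_config = use_plocal(config, "/orientdb/databases/roads")
print(json.dumps(local_config, indent=2))
```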
It finally worked. I've moved the merge transformer before the vertex transformer, as suggested by Luca. I've also renamed the 'id' field to 'from' to avoid the error "property key is reserved for all elements id". Here is the snippet:

{
  "config": {
    "log": "debug"
  },
  "source" : {
    "file": { "path": "/tmp/roads.csv" }
  },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "\t",
               "columnsOnFirstLine": false, 
               "columns":["from", "to"] } },
    { "merge": { "joinFieldName":"from", "lookup":"Intersection.from" } },
    { "vertex": { "class": "Intersection" } },
    { "edge": {
       "class": "Road",
       "joinFieldName": "to",
       "lookup": "Intersection.from",
       "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/roads",
       "dbType": "graph",
       "wal": false,
       "batchCommit": 1000,
       "tx": true,
       "txUseLog": false,
       "useLightweightEdges" : true,
       "classes": [
         {"name": "Intersection", "extends": "V"},
         {"name": "Road", "extends": "E"}
       ], "indexes": [
         {"class":"Intersection", "fields":["from:integer"], "type":"UNIQUE" }
       ]
    }
  }
} 
