How to import Edges from CSV with ETL into OrientDB graph?
I'm trying to import edges from a CSV file into OrientDB. The vertices are stored in a separate file and have already been imported into OrientDB via ETL. So my situation is similar to "OrientDB import edges only using ETL tool" and "OrientDB ETL loading CSV with vertices in one file and edges in another".
Update
Friend.csv
"id","client_id","first_name","last_name"
"0","0","John-0","Doe"
"1","1","John-1","Doe"
"2","2","John-2","Doe"
...
The "id" field is removed by the Friend importer, but the "client_id" is stored. The idea is to have a known, client-side generated id for searching etc.
PeindingFriendship.csv
"friendship_id","client_id","from","to"
"0","0-1","1","0"
"2","0-15","15","0"
"3","0-16","16","0"
...
The "friendship_id" and "client_id" should be imported as attributes of the "PendingFriendship" edge. "from" is the "client_id" of one Friend; "to" is the "client_id" of another Friend. There is a unique index on "client_id" for both Friend and PendingFriendship.
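To make the column roles concrete, here is a minimal Ruby sketch (Ruby because the workaround further down uses Ruby scripts) that reads PeindingFriendship.csv content with the standard csv library and turns each row into an edge spec: the endpoints are Friend "client_id" values, and "friendship_id" and "client_id" become edge properties. The names EdgeSpec and parse_pending_friendships are hypothetical, and the sketch assumes the header row shown above.

```ruby
require "csv"

# Hypothetical value object: one PendingFriendship edge to be created.
EdgeSpec = Struct.new(:from_client_id, :to_client_id, :properties)

# Hypothetical helper: parse PeindingFriendship.csv content into edge specs.
# "from" and "to" hold Friend client_ids; the other two columns become
# properties of the PendingFriendship edge.
def parse_pending_friendships(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    EdgeSpec.new(
      row["from"],
      row["to"],
      { "friendship_id" => row["friendship_id"], "client_id" => row["client_id"] }
    )
  end
end

sample = <<~TEXT
  "friendship_id","client_id","from","to"
  "0","0-1","1","0"
  "2","0-15","15","0"
TEXT

parse_pending_friendships(sample).each do |spec|
  puts "#{spec.from_client_id} -> #{spec.to_client_id}: #{spec.properties}"
end
```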
My ETL configuration looks like this:
...
"extractor": {
  "csv": {
  }
},
"transformers": [
  {
    "command": {
      "command": "CREATE EDGE PendingFriendship FROM (SELECT FROM Friend WHERE client_id = '${input.from}') TO (SELECT FROM Friend WHERE client_id = '${input.to}') SET client_id = '${input.client_id}'",
      "output": "edge"
    }
  },
  {
    "field": {
      "fieldName": "from",
      "operation": "remove"
    }
  },
  {
    "field": {
      "fieldName": "to",
      "operation": "remove"
    }
  },
  {
    "field": {
      "fieldName": "friendship_id",
      "operation": "remove"
    }
  },
  {
    "field": {
      "fieldName": "client_id",
      "operation": "remove"
    }
  },
  {
    "field": {
      "fieldName": "@class",
      "value": "PendingFriendship"
    }
  }
],
...
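For reference, the command transformer above expands to one CREATE EDGE statement per CSV row, with both endpoints looked up by "client_id" subqueries. A minimal Ruby sketch of that expansion follows; the helper names are hypothetical, and it assumes OrientDB SQL accepts backslash-escaped quotes inside string literals (check this against your server version). Unlike the configured command, it also sets "friendship_id", which is meant to be an edge attribute as well.

```ruby
# Hypothetical helper: render a value as an OrientDB SQL string literal.
# Assumption: backslash-escaping of backslashes and single quotes is accepted
# by the OrientDB SQL parser; verify this against your server version.
def sql_quote(value)
  escaped = value.to_s.gsub("\\") { "\\\\" }.gsub("'") { "\\'" }
  "'#{escaped}'"
end

# Hypothetical helper: the statement the command transformer would run for
# one CSV row. Endpoints are resolved by client_id subqueries on Friend.
def create_edge_command(row)
  "CREATE EDGE PendingFriendship " \
    "FROM (SELECT FROM Friend WHERE client_id = #{sql_quote(row['from'])}) " \
    "TO (SELECT FROM Friend WHERE client_id = #{sql_quote(row['to'])}) " \
    "SET client_id = #{sql_quote(row['client_id'])}, " \
    "friendship_id = #{sql_quote(row['friendship_id'])}"
end

row = { "friendship_id" => "0", "client_id" => "0-1", "from" => "1", "to" => "0" }
puts create_edge_command(row)
```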
The issue with this configuration is that it creates two edge entries. One is the expected "PendingFriendship" edge. The second one is an empty "PendingFriendship" edge that carries all the fields I removed as attributes with empty values. The import fails at the second row/document, because inserting another empty "PendingFriendship" violates the uniqueness constraint.
How can I avoid the creation of the unnecessary empty "PendingFriendship"? What is the best way to import edges into OrientDB? All the examples in the documentation use CSV files where vertices and edges are in one file, but that is not the case for me.
I also had a look at the Edge-Transformer, but it returns a Vertex, not an Edge!
After some time I found a way (a workaround) to import the above data into OrientDB. Instead of using the ETL tool, I wrote simple Ruby scripts that call OrientDB's HTTP API using the Batch endpoint.
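A minimal sketch of such a call using only Ruby's standard library (net/http, json, uri). It assumes the batch endpoint is POST http://<server>:<port>/batch/<database> with HTTP Basic authentication and a JSON body shaped like the example batch request body in this answer; the base URL, database name, credentials and helper names are placeholders.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: build (but do not send) a request for OrientDB's HTTP
# batch endpoint, assumed to be POST <base_url>/batch/<database>.
# Every SQL string becomes one "cmd" operation; the whole batch runs as a
# single transaction, as in the example body in this answer.
def build_batch_request(base_url, database, user, password, commands)
  uri = URI.join(base_url, "/batch/#{URI.encode_www_form_component(database)}")
  operations = commands.map { |sql| { "type" => "cmd", "language" => "sql", "command" => sql } }

  request = Net::HTTP::Post.new(uri)
  request.basic_auth(user, password)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ "transaction" => true, "operations" => operations })
  request
end

# Hypothetical helper: send a prepared request and fail loudly on non-2xx.
def send_batch(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    response = http.request(request)
    raise "batch failed: #{response.code} #{response.body}" unless response.is_a?(Net::HTTPSuccess)

    response
  end
end

# Placeholder server, database and credentials (2480 is OrientDB's default HTTP port).
request = build_batch_request(
  "http://localhost:2480", "mydb", "admin", "admin",
  ["create edge PendingFriendship from #27:178 to #27:179 set client_id='4711'"]
)
puts request.body
# send_batch(request) would actually POST it to the server.
```

Building the request separately from sending it keeps the payload easy to inspect or test before anything reaches the server.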
Steps:
- Build a mapping of client_ids to @rids.
- Read PeindingFriendship.csv and build batch requests.
- …
- Insert the @rids into the command from step 4.
- Send the batch requests in chunks of 1000 commands.

Example batch request body:
{
  "transaction" : true,
  "operations" : [
    {
      "type" : "cmd",
      "language" : "sql",
      "command" : "create edge PendingFriendship from #27:178 to #27:179 set client_id='4711'"
    }
  ]
}
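The steps can be sketched in Ruby as follows, assuming the Friend records have already been fetched as hashes with "client_id" and "@rid" keys (the query that fetches them is not shown, and the rids in the usage example are made up). Each friendship row becomes one create edge command with the @rids injected directly, so no subquery runs per edge, and the commands are grouped into chunks of 1000. All names are hypothetical, and the CSV values are assumed not to contain single quotes.

```ruby
require "csv"

# Hypothetical helper: map Friend client_ids to record ids (@rids).
# Assumption: `friends` holds hashes with "client_id" and "@rid" keys that
# were fetched from the database beforehand (not shown here).
def build_rid_map(friends)
  friends.each_with_object({}) { |friend, map| map[friend["client_id"]] = friend["@rid"] }
end

# Hypothetical helper: one "create edge" command per friendship row, with the
# @rids injected directly, grouped into chunks (one chunk per batch request).
# Assumption: the CSV values contain no single quotes, so no escaping is done.
def build_edge_command_chunks(csv_text, rid_map, chunk_size: 1000)
  commands = CSV.parse(csv_text, headers: true).map do |row|
    from_rid = rid_map.fetch(row["from"]) { |id| raise KeyError, "unknown Friend client_id #{id}" }
    to_rid = rid_map.fetch(row["to"]) { |id| raise KeyError, "unknown Friend client_id #{id}" }
    "create edge PendingFriendship from #{from_rid} to #{to_rid} " \
      "set client_id='#{row['client_id']}', friendship_id='#{row['friendship_id']}'"
  end
  commands.each_slice(chunk_size).to_a
end

# Made-up rids, for illustration only.
friends = [
  { "client_id" => "0", "@rid" => "#27:0" },
  { "client_id" => "1", "@rid" => "#27:1" },
  { "client_id" => "15", "@rid" => "#27:15" }
]
csv_text = <<~TEXT
  "friendship_id","client_id","from","to"
  "0","0-1","1","0"
  "2","0-15","15","0"
TEXT

chunks = build_edge_command_chunks(csv_text, build_rid_map(friends))
chunks.each_with_index do |chunk, i|
  puts "batch #{i + 1}: #{chunk.size} command(s)"
  chunk.each { |command| puts "  #{command}" }
end
```

Each chunk would then become the "operations" list of one batch request, matching the example body above.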
This isn't the answer to the question I asked, but it solves my higher-level goal of importing the data into OrientDB. Therefore I leave it to the community to decide whether this question should be marked as solved.