
Best way to import bulk data into ArangoDB

I'm currently working on an ArangoDB POC. I find that document creation with PyArango is very slow: inserting 300 documents takes about 5 minutes. I've pasted the rough code below; please let me know if there are better ways to speed this up:

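import csv

# dbObj is an existing pyArango Database object, e.g. Connection(...)["mydb"]
with open('abc.csv', newline='') as fp:
    # csv.reader strips the line ending, which line.split(",") left on the last field
    for dataList in csv.reader(fp):
        # One vertex in 'aaa', one in 'bbb' and one edge in 'ccc' per CSV line
        aaa = dbObj['aaa'].createDocument()
        bbb = dbObj['bbb'].createDocument()
        ccc = dbObj['ccc'].createEdge()

        bbb['bbb'] = dataList[1]
        aaa['aaa'] = dataList[0]
        aaa._key = dataList[0]

        # Each save() is a separate HTTP request to the server
        aaa.save()
        bbb.save()

        # Point the edge at the two saved vertices
        ccc.links(aaa, bbb)
        ccc['related_to'] = "gfdgf"
        ccc['weight'] = 0

        ccc.save()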

The different collections are created with the code below:

dbObj.createCollection(className='aaa', waitForSync=False)

Regarding your problem with batch mode in the ArangoDB Java driver: if you know the key attributes of the vertices, you can build each document handle yourself as "collectionName" + "/" + "documentKey" instead of relying on the handle the server returns. Example:

// Assumes an already configured ArangoDriver (arangoDriver), a List<String> of
// CSV lines (lines) and an existing graph "testGraph" with these collections.

// Queue the following requests locally; they are sent by executeBatch()
arangoDriver.startBatchMode();

for (String line : lines)
{
  String[] data = line.split(",");

  // Vertex in DeviceId, keyed by the first column
  String keyDevice = data[0];
  String handleDevice = "DeviceId/" + keyDevice;
  BaseDocument device = new BaseDocument();
  device.setDocumentKey(keyDevice);
  device.addAttribute("device_id", data[0]);

  // Vertex in PhysicalLocation, keyed by the second column
  String keyPhyAddress = data[1];
  String handlePhyAddress = "PhysicalLocation/" + keyPhyAddress;
  BaseDocument phyAddress = new BaseDocument();
  phyAddress.setDocumentKey(keyPhyAddress);
  phyAddress.addAttribute("address", data[1]);

  // The returned entities are not used: the edge refers to both vertices
  // through the handles built above from the known keys
  arangoDriver.graphCreateVertex("testGraph", "DeviceId", device, null);
  arangoDriver.graphCreateVertex("testGraph", "PhysicalLocation", phyAddress, null);

  arangoDriver.graphCreateEdge("testGraph", "DeviceId_PhysicalLocation", null, handleDevice, handlePhyAddress, null, null);
}

// Send all queued requests to the server as one batch
arangoDriver.executeBatch();
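The same idea carries over to the PyArango code in the question: when the keys are known up front, every vertex and edge can be assembled in memory first and then written with a few bulk requests, rather than several requests per CSV line. A minimal sketch in Python, assuming (as in the Java example) that column 0 is the key of an `aaa` vertex and column 1 is the key of a `bbb` vertex, both valid ArangoDB keys. `build_import_batches` is a hypothetical helper, and `related_to`/`weight` are copied from the question. It only builds plain dicts and never talks to the server.

```python
import csv
import io


def build_import_batches(csv_text):
    """Turn CSV text into (aaa_docs, bbb_docs, ccc_edges) without any server calls.

    Hypothetical layout: column 0 is the 'aaa' key, column 1 the 'bbb' key.
    """
    aaa_docs = {}  # key -> document; a dict collapses duplicate keys
    bbb_docs = {}
    ccc_edges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        aaa_key, bbb_key = row[0].strip(), row[1].strip()
        aaa_docs[aaa_key] = {"_key": aaa_key, "aaa": aaa_key}
        bbb_docs[bbb_key] = {"_key": bbb_key, "bbb": bbb_key}
        # Handles are "<collection>/<key>", so the edge never needs the
        # server-generated _id of a freshly saved vertex.
        ccc_edges.append({
            "_from": "aaa/" + aaa_key,
            "_to": "bbb/" + bbb_key,
            "related_to": "gfdgf",
            "weight": 0,
        })
    return list(aaa_docs.values()), list(bbb_docs.values()), ccc_edges


aaa, bbb, ccc = build_import_batches("d1,addrA\nd2,addrA\n")
print(len(aaa), len(bbb), len(ccc))
```

One design consequence: keying `bbb` by its value means repeated values share one vertex, whereas the question's loop creates a new `bbb` vertex per line. Dropping the `_key` for `bbb` restores that behavior, but then the edges need the generated `_id`s again.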

I would build all the data to be inserted as a single JSON-formatted string and use createDocumentRaw to create all the documents at once, with a single save.
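A sketch of the same "build everything, send once" approach in Python. Instead of the Java driver's createDocumentRaw, it uses ArangoDB's HTTP bulk import endpoint (`POST /_db/<database>/_api/import` with `type=documents`, one JSON document per line). It relies only on the standard library and builds the request without sending it. `make_import_request`, the database name and the credentials are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def make_import_request(base_url, database, collection, docs,
                        user=None, password=None, on_duplicate="error"):
    """Build (but do not send) one bulk-import request for a whole collection."""
    query = urllib.parse.urlencode({
        "collection": collection,
        "type": "documents",          # body: one JSON document per line
        "onDuplicate": on_duplicate,  # what to do when a _key already exists
    })
    url = (f"{base_url.rstrip('/')}/_db/{urllib.parse.quote(database)}"
           f"/_api/import?{query}")
    body = "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if user is not None:
        # HTTP Basic authentication, as used by a default ArangoDB setup
        token = base64.b64encode(f"{user}:{password or ''}".encode()).decode()
        headers["Authorization"] = "Basic " + token
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = make_import_request("http://localhost:8529", "mydb", "aaa",
                          [{"_key": "d1", "aaa": "d1"}], user="root", password="")
print(req.full_url)
```

To actually send it, call `urllib.request.urlopen(req)`. One such request per collection (`aaa`, `bbb`, then the edge collection `ccc`, whose documents carry `_from` and `_to`) replaces the per-line saves. The JSON response reports counts such as `created` and `errors`.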
