Bulk import of .json files in ArangoDB with Python

I have a huge collection of .json files, each containing hundreds or thousands of documents, that I want to import into ArangoDB collections. Can I do it using Python, and if so, can anyone share an example of how to do it from a list of files? i.e.:

for i in filelist:
    import i to collection

I have read the documentation, but I couldn't find anything even resembling that.

After a lot of trial and error I found out that I had the answer in front of me. I didn't need to import the .json file itself; I just needed to read it and then do a bulk import of the documents it contains. The code looks like this:

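import json

# db is an already-connected python-arango database handle
collection = db.collection('collection_name')
for path in list_of_json_files:
    with open(path, 'r') as json_file:
        # each file is expected to hold a JSON array of documents
        data = json.load(json_file)
    collection.import_bulk(data)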

So it was actually quite simple. In my implementation I collect the .json files from multiple folders and import them into multiple collections. I am using the python-arango 5.4.0 driver.
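
For the multi-folder case, here is a minimal sketch of how the same idea might be organized. The helper names (`load_documents`, `import_folders`) and the folder-to-collection mapping are hypothetical, and `db` is assumed to be a python-arango database handle whose collections already exist. Only `db.collection()` and `import_bulk()` are taken from the driver.

```python
import json
from pathlib import Path


def load_documents(path):
    """Read one .json file and return its contents as a list of documents."""
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    # A file may hold a single object instead of an array; wrap it so that
    # import_bulk always receives a list.
    return data if isinstance(data, list) else [data]


def import_folders(db, folder_to_collection):
    """Import every *.json file in each folder into its mapped collection.

    db is assumed to be a python-arango database handle (hypothetical setup).
    Returns a dict of collection name -> number of documents sent.
    """
    totals = {}
    for folder, name in folder_to_collection.items():
        collection = db.collection(name)
        count = 0
        for path in sorted(Path(folder).glob("*.json")):
            docs = load_documents(path)
            if docs:  # skip empty files rather than sending an empty batch
                collection.import_bulk(docs)
                count += len(docs)
        totals[name] = count
    return totals
```

It could be called as `import_folders(db, {"data/users": "users", "data/orders": "orders"})`, where the paths and collection names are placeholders. Each file is sent as one batch, which keeps memory use bounded by the largest single file.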

I had this same problem. Though your implementation will be slightly different, the answer you need (maybe not the one you're looking for) is to use the "bulk import" functionality.

Since ArangoDB doesn't have an "official" Python driver (that I know of), you will have to look at other sources to get a good idea of how to solve this.

  • The HTTP bulk import/export docs provide curl commands, which can be neatly translated to Python web requests. Also see the section on headers and values.

  • ArangoJS has a bulk import function, which works with an array of objects, so there's no special processing or preparation required.
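
As a rough illustration of the first option, the sketch below builds a request for the HTTP import endpoint (`POST /_api/import` with the `collection` and `type` query parameters) using only the standard library's `urllib` rather than a third-party HTTP package. The function names and the base URL are assumptions, and authentication is left out; a real server will usually need an `Authorization` header.

```python
import json
import urllib.parse
import urllib.request


def build_import_request(base_url, collection, documents):
    """Build a POST request that sends a JSON array to /_api/import."""
    # type=list tells the server the body is a single JSON array of documents
    query = urllib.parse.urlencode({"collection": collection, "type": "list"})
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_api/import?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the server's JSON reply as a dict."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like `send(build_import_request("http://localhost:8529", "users", docs))`, with the host and collection name as placeholders. Keeping request construction separate from sending makes it easy to inspect the URL and body before anything touches the server.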

I have also used the arangoimport tool to great effect. It's a command-line tool, so it can be driven from Python or used stand-alone in a script. For me, the key was making sure my data was in JSONL or "JSON Lines" format (each line of the file is a self-contained JSON object, with no bounding array or comma separators).
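
To make the JSONL point concrete, here is a small sketch that rewrites a JSON array file as JSON Lines and then builds an arangoimport command for it. The function names and file paths are hypothetical. Only the `--file`, `--type jsonl` and `--collection` options are used, so connection settings fall back to the tool's defaults.

```python
import json
import subprocess


def to_jsonl(src_path, dst_path):
    """Rewrite a JSON array file as JSON Lines: one object per line."""
    with open(src_path, "r", encoding="utf-8") as src:
        documents = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for doc in documents:
            # json.dumps escapes embedded newlines, so each document
            # always occupies exactly one line
            dst.write(json.dumps(doc) + "\n")
    return len(documents)


def arangoimport_command(jsonl_path, collection):
    """Return the argument list for an arangoimport run (not executed here)."""
    return [
        "arangoimport",
        "--file", jsonl_path,
        "--type", "jsonl",
        "--collection", collection,
    ]


def run_import(jsonl_path, collection):
    """Run arangoimport; raises CalledProcessError on a non-zero exit."""
    subprocess.run(arangoimport_command(jsonl_path, collection), check=True)
```

Passing the command as a list avoids shell quoting problems with unusual file names.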
