Bulk import of .json files in ArangoDB with Python
I have a huge collection of .json files, each containing hundreds or thousands of documents, that I want to import into ArangoDB collections. Can I do this using Python and, if so, can anyone post an example of how to do it from a list of files? For example:
    for i in filelist:
        import i to collection
I have read the documentation, but I couldn't find anything even resembling that.
After a lot of trial and error I found out that the answer was right in front of me. I didn't need to import the .json file itself; I just needed to read it and then do a bulk import of the documents it contains. The code looks like this:
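    import json

    # db is an already-connected python-arango database handle
    a = db.collection('collection_name')
    for x in list_of_json_files:
        with open(x, 'r') as json_file:
            data = json.load(json_file)  # assumes each file holds a list of documents
        a.import_bulk(data)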
So it was actually quite simple. In my implementation I collect the .json files from multiple folders and import them into multiple collections. I am using the python-arango 5.4.0 driver.
I had this same problem. Though your implementation will be slightly different, the answer you need (maybe not the one you're looking for) is to use the "bulk import" functionality.
Since ArangoDB doesn't have an "official" Python driver (that I know of), you will have to look at other sources to get a good idea of how to solve this.
The HTTP bulk import/export docs provide curl commands, which can be neatly translated to Python web requests. Also see the section on headers and values.
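As a rough illustration of translating those curl commands into Python, the sketch below builds a POST request against the `/_api/import` endpoint using only the standard library. The server URL, database name, and credentials are placeholders, `build_import_request` is a hypothetical helper, and the query parameters (`collection`, `type=list`) are used as I understand them from the HTTP bulk import docs, so check them against the docs for your server version.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_import_request(base_url, database, collection, docs, username, password):
    """Build (but do not send) a bulk import request for a list of documents."""
    # type=list tells the server the body is a single JSON array of objects.
    query = urllib.parse.urlencode({"collection": collection, "type": "list"})
    url = f"{base_url}/_db/{database}/_api/import?{query}"
    # HTTP basic authentication; other auth schemes are not covered here.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

# To actually send it (requires a reachable server):
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing the script at a real database.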
ArangoJS has a bulk import function, which works with an array of objects, so there's no special processing or preparation required.
I have also used the arangoimport tool to great effect. It's a command-line tool, so it can be controlled from Python or used stand-alone in a script. For me, the key here was making sure my data was in JSONL or "JSON Lines" format (each line of the file is a self-contained JSON object, with no bounding array or comma separators).
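Converting an existing JSON-array file to JSON Lines and handing it to arangoimport from Python might look like the sketch below. The helper names and file paths are hypothetical, and the arangoimport options shown (`--file`, `--type`, `--collection`) should be checked against `arangoimport --help` for your version; connection and authentication options are omitted.

```python
import json
import subprocess


def json_array_to_jsonl(src_path, dst_path):
    """Rewrite a file holding one JSON array as one JSON object per line."""
    with open(src_path, "r", encoding="utf-8") as src:
        docs = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for doc in docs:
            # json.dumps escapes newlines inside strings, so each
            # document stays on a single line.
            dst.write(json.dumps(doc) + "\n")
    return len(docs)


def arangoimport_command(jsonl_path, collection):
    """Build the argument list for an arangoimport call (not executed here)."""
    return [
        "arangoimport",
        "--file", jsonl_path,
        "--type", "jsonl",
        "--collection", collection,
    ]


def run_import(jsonl_path, collection):
    """Run arangoimport; assumes the tool is installed and on PATH."""
    return subprocess.run(arangoimport_command(jsonl_path, collection), check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.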