
Most efficient method to import bulk JSON data from different sources in PostgreSQL?

I need to import data from thousands of URLs. Here is an example of the data:

[{"date":"20201006T120000Z","uri":"secret","val":"1765.756"},{"date":"20201006T120500Z","uri":"secret","val":"2015.09258"},{"date":"20201006T121000Z","uri":"secret","val":"2283.0885"}]

Since COPY doesn't support JSON format, I've been using this to import the data from some of the URLs:

CREATE TEMP TABLE stage(x jsonb);

COPY stage FROM PROGRAM 'curl https://.....';

insert into test_table select f.* from stage,
   jsonb_populate_recordset(null::test_table, x) f;

But this is inefficient, since it creates a table for every import and imports only a single URL at a time. I would like to know if it is possible (through a tool, script or command) to read a file with all the URLs and copy their data into the database.

With your example data, all you would have to do is remove the first character of the first line, and the last printable character (either , or ]) of every line, and then it would be compatible with COPY. This assumes each JSON object sits on its own line. JSON that breaks this approach (because of its formatting or its content) is possible, but it would break your example code as well. If your example code does work, then perhaps you will never have such problematic data or formatting, or perhaps you just haven't run into it yet.
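As a rough illustration of that character-stripping step, here is a minimal Python sketch. It assumes the fetched text has exactly one JSON object per line, which is what the suggestion relies on; the function name strip_array_wrapping is hypothetical and not part of any library.

```python
def strip_array_wrapping(text: str) -> list[str]:
    """Drop the leading '[' and each line's trailing ',' or ']'.

    Assumption: the input is a JSON array written with one object per line.
    """
    rows = []
    first = True
    for line in text.splitlines():
        line = line.rstrip()      # ignore trailing whitespace
        if not line:
            continue              # skip blank lines
        if first:
            line = line[1:]       # remove the opening '[' of the array
            first = False
        rows.append(line[:-1])    # remove the trailing ',' or ']'
    return rows
```

Like the suggestion itself, this does no JSON parsing, so it only works while the input keeps that one-object-per-line layout.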

You could either add a processing step to remove those nuisance characters, or you could change the way you fetch the data in bulk (which you didn't describe) to avoid outputting them in the first place.
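One possible shape for such a processing step is sketched below in Python. Instead of stripping characters it parses each response and re-emits one object per line, which is a swapped-in technique rather than what the answer literally describes. The names array_to_copy_lines and export_urls, and the one-URL-per-line file layout, are assumptions made for illustration.

```python
import json
import sys
import urllib.request


def array_to_copy_lines(payload: str) -> list[str]:
    """Turn one JSON array into COPY text-format lines, one object per line."""
    lines = []
    for record in json.loads(payload):
        # json.dumps escapes newlines and tabs, so each record stays on one line.
        text = json.dumps(record, separators=(",", ":"))
        # COPY's text format treats backslash as an escape, so double each one.
        lines.append(text.replace("\\", "\\\\"))
    return lines


def export_urls(url_file: str, out=sys.stdout) -> None:
    """Fetch every URL in url_file (assumed one per line) and write COPY-ready lines."""
    with open(url_file) as handle:
        for url in handle:
            url = url.strip()
            if not url:
                continue  # skip blank lines in the URL list
            with urllib.request.urlopen(url) as response:
                payload = response.read().decode("utf-8")
            for line in array_to_copy_lines(payload):
                out.write(line + "\n")
```

A small script calling export_urls on the URL file could then feed a single COPY, for instance by piping its output into psql running COPY stage FROM STDIN. In that setup stage would need to be a regular table rather than a temp table, because psql opens its own session, and since each row then holds one object, the insert would use jsonb_populate_record instead of jsonb_populate_recordset.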


 