
Most efficient method to import bulk JSON data from different sources into PostgreSQL?

I need to import data from thousands of URLs; here is an example of the data:

[{"date":"20201006T120000Z","uri":"secret","val":"1765.756"},{"date":"20201006T120500Z","uri":"secret","val":"2015.09258"},{"date":"20201006T121000Z","uri":"secret","val":"2283.0885"}]

Since COPY doesn't support the JSON format, I've been using this to import the data from some of the URLs:

CREATE TEMP TABLE stage(x jsonb);

COPY stage FROM PROGRAM 'curl https://.....';

INSERT INTO test_table SELECT f.* FROM stage,
   jsonb_populate_recordset(NULL::test_table, x) f;

But this is inefficient, since it creates a table for every import and loads only a single URL at a time. I would like to know whether it is possible (through a tool, script or command) to read a file with all the URLs and copy their data into the database.

With your example data, all you would have to do is remove the first character of the first line and the last printable character (either , or ]) of every line, and then it would be compatible with COPY. There could be JSON that breaks this (either through formatting or through content), but such data would break your alternative code as well. If your code does work, then perhaps you will never have such problematic data/formatting, or perhaps you just haven't run into it yet.
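For example (a minimal sketch I have not run against your real feeds): assuming the responses really do put one object per line, a sed filter in the COPY pipeline can strip those characters, so each row of stage ends up holding one JSON object that jsonb_populate_record can expand. The -s flag, the sed expressions and the dollar-quoting are my additions; the URL stays elided as in your command.

-- 1s/^\[//  drops the leading [ on the first line;
-- s/[],]$// drops a trailing , or ] on every line.
-- (COPY's text format treats \ as an escape character; your sample data contains none.)
COPY stage FROM PROGRAM
  $$curl -s https://..... | sed -e '1s/^\[//' -e 's/[],]$//'$$;

INSERT INTO test_table SELECT r.* FROM stage,
   jsonb_populate_record(NULL::test_table, x) r;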

You could either add a processing step to remove those nuisance characters, or you could change the way you fetch the data in bulk (which you didn't describe) to avoid outputting them in the first place.
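As for reading a file with all the URLs: the same COPY ... FROM PROGRAM trick can drive a shell loop, so the staging table is created once and a single COPY fetches every feed. A sketch only, assuming a hypothetical file /path/to/urls.txt (one URL per line, readable by the database server's OS user) and responses formatted one object per line as above; the staging table and the final INSERT stay as in the previous sketch.

-- One COPY, one shell loop: each URL is fetched, stripped of the array
-- punctuation, and appended to the same staging table.
COPY stage FROM PROGRAM
  $$while read -r url; do
      curl -s "$url" | sed -e '1s/^\[//' -e 's/[],]$//'
    done < /path/to/urls.txt$$;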
