
Large JSON data import into PostgreSQL

The NOOB developer is back with yet another question. I'm working on importing a large dataset into a PostgreSQL 9.5 database. I originally started out using PHP, but it failed once I tried to load the entire 14 MB file. I increased the memory limit within the script, but that didn't help. I thought about using a parsing library, but decided that since I'm on PostgreSQL 9.5 I should just leverage the database instead. My JSON file has repeatable fields, so I could not use JSONB and went with plain JSON instead. That worked until I tried to load the entire file, at which point I got the following error:

ERROR:  invalid input syntax for type json
DETAIL:  Expected JSON value, but found "]".
CONTEXT:  JSON data, line 1: ...:"Commercial & Industrial","u_cost_code":""},"",]
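
For what it's worth, that trailing "",] in the error context looks like a stray empty element plus a trailing comma at the end of the array, which Postgres' JSON parser rejects outright. A minimal reproduction with throwaway values:

-- Throwaway values only: a dangling "" and trailing comma make the whole array invalid
SELECT '[{"a":1},"",]'::json;  -- fails with: Expected JSON value, but found "]"
SELECT '[{"a":1}]'::json;      -- parses fine once the stray element and comma are gone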

Here is an example of the JSON file content (shown as a PHP print_r dump of the decoded data):

Array
(
    [result] => Array
        (
            [0] => Array
                (
                    [field1] => 0
                    [field2] => 
                    [field3] => 1900-04-19 19:14:10
                    [field4] => false
                    [field5] => XXX.XXXXX.XXX.XXX.XXX
                    [field6] => ldap:CN=XXXX XXXXXXX,OU=XXXXX,OU=XXXXX,OU=XXX,DC=XXXXXX,DC=XXXX,DC=XXXX
                    [field7] => 1900-07-18 17:45:08
                    [field8] => true
                    [field9] => 
                    [field10] => false
                    [field11] => 2
                    [field12] => 30406
                    [field13] => T
                    [field14] => 00000000000000000
                    [field15] => 1900-01-19 21:33:07
                    [field16] => Array
                        (
                            [link] => https://mozilla.com
                            [value] => mozilla
                        )

                    [field17] => 1601-01-01 06:00:00
                    [field18] => 
                    [field19] => false
                    [field20] => 01001
                    [field21] => 

                )           
        )
)
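
(That dump is just PHP's print_r view of the decoded data; the raw file itself is JSON along these lines, abridged to a few of the placeholder fields above, with my guesses at the original types:)

{
    "result": [
        {
            "field1": 0,
            "field2": "",
            "field3": "1900-04-19 19:14:10",
            "field4": "false",
            "field16": {
                "link": "https://mozilla.com",
                "value": "mozilla"
            },
            "field20": "01001",
            "field21": ""
        }
    ]
}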

Here is the statement I'm using to create my table; this allowed me to import the entire 14 MB file without an issue:

CREATE TABLE temp_json 
(
     ID SERIAL NOT NULL PRIMARY KEY
    ,TIMESTAMP TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    ,"values" TEXT -- VALUES is a reserved word, so the column has to be quoted
);
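
As a quick sanity check that the staging column behaves, a throwaway row with made-up values parses and queries fine:

-- Made-up test row, just to confirm the cast and the -> operators behave as expected
INSERT INTO temp_json ("values")
VALUES ('{"result": [{"field1": 0, "field16": {"link": "https://mozilla.com", "value": "mozilla"}}]}');

SELECT "values"::json -> 'result' -> 0 ->> 'field1' AS field1
FROM temp_json;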

I started following this developer's example, hoping to resolve the issue: how-to-get-json-data-type-into-postgresql

Here is the fairly standard copy command I'm using to import the data into the table:

copy temp_json("values") from 'C:\path\to\my\json_file.json';

I then went on to use the following SQL statement, found here on Stack Overflow ( loading-json-data-from-a-file-into-postgres ), in an attempt to move the data into a relational table. I did this in the hope of finding an easier way to move my data set into the table. Here is the SQL statement I am trying to get working:

insert into table_to_hold_json 
select 
"values"::json->'result'->'calendar_integration' as calendar_integration,
"values"::json->'result'->'country' as country,
"values"::json->'result'->'last_login_time' as last_login_time,
"values"::json->'result'->'u_gartner_acct' as u_gartner_acct,
"values"::json->'result'->'u_dept_name' as u_dept_name,
"values"::json->'result'->'source' as source,
"values"::json->'result'->'sys_updated_on' as sys_updated_on,
"values"::json->'result'->'u_field_user' as u_field_user
from ( select json_array_elements(replace("values",'\','\\')::json) as "values" 
from temp_json ) a;
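
While debugging this I also started to wonder whether I'm unnesting at the right level: in my file the array sits under the result key, so something along these lines (just a sketch against the table above, untested on the full file) may be closer to what the extraction needs to look like:

-- Sketch only: unnest the "result" array first, then pull text fields out of each element with ->>
select
    elem ->> 'calendar_integration' as calendar_integration,
    elem ->> 'country'              as country,
    elem ->> 'last_login_time'      as last_login_time,
    elem ->> 'u_field_user'         as u_field_user
from ( select json_array_elements(replace("values",'\','\\')::json -> 'result') as elem
       from temp_json ) a;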

However, I'm now getting the same error as I did on the import to the temp_json table. I also tried to escape the '\\' with the copy command using:

csv quote e'\x01' delimiter e'\x02'
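
(Spelled out in full, that attempt was the same COPY as before, just in CSV mode with quote and delimiter bytes that never occur in the file:)

-- Same copy as above, but forcing CSV mode so backslashes pass through untouched
copy temp_json("values") from 'C:\path\to\my\json_file.json' csv quote e'\x01' delimiter e'\x02';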

Unfortunately, I still end up with the same error when I try to query the JSON data. So now I'm banging my head against the wall trying to sort out how to escape that darn ']'. Any assistance would be greatly appreciated!

UPDATE: Okay, so I went back and worked out how to break up my file download from the data provider. Now that I'm keeping the data set under the specified timeout period, I can use PHP or whatever else I want to parse the data. This is a good reminder to always check your logs or data sets closely. :-)
