
Large JSON data import into PostgreSQL

The NOOB developer is back with yet another question. I'm working on importing a large dataset into a PostgreSQL 9.5 database. I originally started out using PHP, but the script failed once I tried to load the entire 14 MB file. I went on to increase the memory limit within the script, but that didn't help. I thought about using a parsing library, but decided that since I'm using PostgreSQL 9.5, I should just leverage the database instead. My JSON file has repeated fields, so I could not use JSONB and went with a plain JSON import. Unfortunately, that worked only until I tried to load the entire file, at which point I got the following error:

ERROR:  invalid input syntax for type json
DETAIL:  Expected JSON value, but found "]".
CONTEXT:  JSON data, line 1: ...:"Commercial & Industrial","u_cost_code":""},"",]
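The `"",]` at the end of the error context is the real clue: the array ends with an empty-string element followed by a dangling comma, and JSON forbids trailing commas. A minimal sketch reproducing the same class of failure with Python's parser (the fragment below is invented to mirror the error message, not taken from the actual file):

```python
import json

# A fragment shaped like the tail of the failing file: an empty-string
# element followed by a trailing comma before the closing bracket.
bad_tail = '[{"u_cost_code":""},"",]'

try:
    json.loads(bad_tail)
except json.JSONDecodeError as exc:
    # The parser stops at the "]" because it expected another value
    # after the comma -- the same complaint PostgreSQL raises.
    print(f"parse failed at position {exc.pos}: {exc.msg}")

# Dropping the dangling ',' makes the document valid again.
good_tail = '[{"u_cost_code":""},""]'
print(json.loads(good_tail))
```

No amount of escaping on the PostgreSQL side can fix this; the input itself is not valid JSON.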

Here is an example of the JSON file's content (shown as a PHP print_r dump):

Array
(
    [result] => Array
        (
            [0] => Array
                (
                    [field1] => 0
                    [fiedl2] => 
                    [field3] => 1900-04-19 19:14:10
                    [field4] => false
                    [field5] => XXX.XXXXX.XXX.XXX.XXX
                    [field6] => ldap:CN=XXXX XXXXXXX,OU=XXXXX,OU=XXXXX,OU=XXX,DC=XXXXXX,DC=XXXX,DC=XXXX
                    [field7] => 1900-07-18 17:45:08
                    [field8] => true
                    [field9] => 
                    [field10] => false
                    [field11] => 2
                    [field12] => 30406
                    [field13] => T
                    [field14] => 00000000000000000
                    [field15] => 1900-01-19 21:33:07
                    [field16] => Array
                        (
                            [link] => https://mozilla.com
                            [value] => mozilla
                        )

                    [field17] => 1601-01-01 06:00:00
                    [field18] => 
                    [field19] => false
                    [field20] => 01001
                    [field21] => 

                )           
        )
)

Here is the statement I'm using to create my table; this allowed me to import the entire 14 MB file without an issue:

CREATE TABLE temp_json 
(
     id SERIAL NOT NULL PRIMARY KEY
    ,"timestamp" TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    ,"values" TEXT  -- quoted: VALUES is a reserved word in PostgreSQL
);

I started following the example of this developer, hoping to resolve the issue: how-to-get-json-data-type-into-postgresql

Here is the fairly standard COPY command I'm using to import the data into the table:

copy temp_json("values") from 'C:\path\to\my\json_file.json';
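One thing to watch with text-format `COPY ... FROM`: it reads one row per line and treats backslashes as escape characters, so a pretty-printed JSON file arrives as many broken fragments rather than one document. A sketch of a pre-processing step (file paths are hypothetical) that collapses the file onto a single line and doubles backslashes so COPY stores the text verbatim:

```python
import json

def prepare_for_copy(src_path: str, dst_path: str) -> None:
    """Re-serialize a JSON file onto one line and escape backslashes
    so PostgreSQL's text-format COPY stores the document verbatim."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)  # also validates the file up front
    # Compact separators keep the single line as short as possible.
    line = json.dumps(data, separators=(",", ":"))
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(line.replace("\\", "\\\\") + "\n")
```

Running this before the COPY would also have surfaced the trailing-comma problem immediately, since `json.load` refuses invalid input.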

I then went on to use the following SQL statement, which I found here on Stack, loading-json-data-from-a-file-into-postgres, in an attempt to move the data into a relational table. I did this in an effort to find an easier way to move my data set into the table. Here is the SQL statement I am trying to get working:

insert into table_to_hold_json 
select 
"values"::json->'result'->'calendar_integration' as calendar_integration,
"values"::json->'result'->'country' as country,
"values"::json->'result'->'last_login_time' as last_login_time,
"values"::json->'result'->'u_gartner_acct' as u_gartner_acct,
"values"::json->'result'->'u_dept_name' as u_dept_name,
"values"::json->'result'->'source' as source,
"values"::json->'result'->'sys_updated_on' as sys_updated_on,
"values"::json->'result'->'u_field_user' as u_field_user
from ( select json_array_elements(replace("values",'\','\\')::json) as "values" 
from temp_json ) a;
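For comparison, once the file parses at all, the same per-record extraction can be done outside the database. A minimal Python sketch, under the assumption that the top level is an object whose "result" key holds the array of records (the field names are the ones from the query above):

```python
import json

def extract_records(raw: str) -> list:
    """Pull the fields used in the INSERT ... SELECT out of each
    element of the top-level "result" array; missing keys become None."""
    wanted = [
        "calendar_integration", "country", "last_login_time",
        "u_gartner_acct", "u_dept_name", "source",
        "sys_updated_on", "u_field_user",
    ]
    doc = json.loads(raw)
    return [{k: rec.get(k) for k in wanted} for rec in doc["result"]]
```

The resulting list of dicts can then be inserted with a parameterized `executemany` from any PostgreSQL driver.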

However, I'm now getting the same error as I did on the import into the temp_json table. I also tried to sidestep the backslash escaping in the COPY command by using:

csv quote e'\x01' delimiter e'\x02'

Unfortunately, I still end up with the same error when I try to query the JSON data. So now I'm banging my head against the wall trying to sort out how to escape that darn ']'. Any assistance would be greatly appreciated!

Okay, so I went back and worked out how to break up my file download from the data provider. Now that I'm keeping each request under the specified timeout period, the data set arrives complete, and I can use PHP or whatever else I want to parse it. This is a good reminder to always check your logs and data sets closely. :-)
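That lesson generalizes: a truncated download is easiest to catch before it ever reaches the database. A small validation sketch (the file path is hypothetical) that reports where parsing stopped, which in the case above would have pointed straight at the dangling `,]`:

```python
import json
from typing import Optional

def check_json_file(path: str) -> Optional[str]:
    """Return None if the file is valid JSON, otherwise a short
    message pointing at the line and column where parsing stopped."""
    with open(path, encoding="utf-8") as fh:
        try:
            json.load(fh)
        except json.JSONDecodeError as exc:
            return f"{path}: {exc.msg} at line {exc.lineno}, column {exc.colno}"
    return None
```

Running this right after each download makes an incomplete transfer an explicit error instead of a confusing failure three steps later.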
