How do I COPY IMPORT a json file into postgres?

I want to import JSON data into Postgres. The data I have is on the order of a million rows, and the files range from at least 700 MB up to 3 GB.

Here is sample data I created based on the structure of the data I have. I tried to import it into Postgres, but I got an error.

Sample (1) data

{"offers":{"offer":[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]}}

The command I used and the error I got

# copy contrial from '/home/ubuntu/sample-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.
CONTEXT:  JSON data, line 1: {"offers":{"offer":[
COPY contrial, line 1, column info: "{"offers":{"offer":["

I modified the file to remove the first two keys, leaving only a JSON list as shown below, but I still get an error.

Sample (2) data

[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]

Error

# copy contrial from '/home/ubuntu/sample2-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.
CONTEXT:  JSON data, line 1: [
COPY contrial, line 1, column info: "["
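
The error context above points at line 1 of the file, which fits COPY reading one row per input line rather than one JSON document per file. A minimal sketch of that per-line view in Python; the helper name `first_bad_line` is hypothetical and the sample is shortened (the `quotes` key is left out), so this illustrates the idea and is not what Postgres runs internally:

```python
import json

# Shortened version of sample (2): a JSON array spread over several lines.
sample = '''[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}}]'''

def first_bad_line(text):
    """Return (line_number, line) for the first line that is not valid JSON
    on its own, or None if every line parses."""
    for n, line in enumerate(text.splitlines(), start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return n, line
    return None

result = first_bad_line(sample)
print(result)
```

The whole text is valid JSON, but its first line taken alone (`[`) is not, which matches the `line 1` reported in the error.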

Sample (3) data, which I modified further

[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]

A different error

# copy contrial from '/home/ubuntu/sample2-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  Token "side" is invalid.
CONTEXT:  JSON data, line 1: ...,"value":"some1 text value"}, "quotes": "5"  side...
COPY contrial, line 1, column info: "[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5"  si..."
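
In the error context above, `"5\"  side"` has become `"5"  side`: COPY's default text format treats the backslash as an escape character, so the JSON parser never sees it. A minimal sketch that imitates this in Python; the `replace` call is a simplified stand-in for COPY's escape handling, not Postgres code, and the record is shortened:

```python
import json

# One record as it sits in the file (raw string, so the backslash is literal).
line = r'{"url": "https://some1-value.com", "quotes": "5\"  side"}'

# In the file the record is valid JSON; \" is an escaped double quote.
assert json.loads(line)["quotes"] == '5"  side'

# Simplified stand-in for COPY text-format processing: backslash-quote
# collapses to a bare quote before the json type parses the value.
after_copy = line.replace('\\"', '"')

try:
    json.loads(after_copy)
    still_valid = True
except json.JSONDecodeError:
    still_valid = False

print(after_copy)
print(still_valid)
```

Once the backslash is gone the string ends after `5`, and the parser trips over the `side` token, as in the error message.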

CREATE TABLE statement

CREATE TABLE public.contrial (
    info json NOT NULL
);

The final goal is to create a table with the keys as columns and the values as records. Nested keys need to be flattened.

+-------------------------+-----------+------------------+----------+
| url                     | nested_id | nested_value     | quotes   |
+-------------------------+-----------+------------------+----------+
| https://some1-value.com | 4         | some1 text value | 5\" side |
+-------------------------+-----------+------------------+----------+
| https://some2-value.com | 5         | some2 text value | 6\" side |
+-------------------------+-----------+------------------+----------+
| https://some3-value.com | 6         | some3 text value | 7\" side |
+-------------------------+-----------+------------------+----------+
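
The flattening rule in the table above (nested keys joined with an underscore) can be sketched in Python. The function name `flatten` and the `_` separator are assumptions chosen to match the `nested_id` and `nested_value` column names; this is an illustration of the target shape, not part of the original loading process:

```python
import json

def flatten(record, parent="", sep="_"):
    """Flatten nested dicts into one level, joining key names with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the prefix along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

row = json.loads(
    r'{"url": "https://some1-value.com", '
    r'"nested": {"id":4,"value":"some1 text value"}, '
    r'"quotes": "5\"  side"}'
)
flat_row = flatten(row)
print(flat_row)
```

Each flattened dict has exactly the four keys used as column headers in the table.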

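I ended up using Andrew Dunstan's blog and this SO answer, which say to format the JSON in a specific way so that the COPY command can be used.

Since the structure of the files I am parsing is fixed, I ended up with the following script.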

def file_len(fname):
    # Count the lines in the file by streaming through it once.
    # Has been pretty efficient even for millions of records.
    with open(fname) as f:
        return sum(1 for _ in f)

INPUT = '/path/to/input.json'
LEN = file_len(INPUT)
with open('/path/to/output.json.csv', 'w') as fo:
    with open(INPUT, 'r') as fi:
        for i, l in enumerate(fi):
            # Drop the line ending so the slices below only see the data
            l = l.rstrip('\n')

            # Skip the first line: {"offers":{"offer":[
            if i == 0:
                continue

            # Last line: remove the ']}}' from the end
            elif i + 1 == LEN:
                fo.write(l[:-3])

            # Other lines: remove the ',' from the end
            # and add \n since write does not add a newline on its own
            else:
                fo.write(l[:-1] + '\n')
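
Before loading, it can help to confirm that every line of the converted file is a complete JSON value by itself, since the slicing above relies on the input layout being exactly as expected. A minimal sketch, assuming the one-record-per-line output described above; `count_invalid_lines` is a hypothetical helper and the temporary file only stands in for the real output file:

```python
import json
import os
import tempfile

def count_invalid_lines(path):
    """Return how many lines in `path` are not a valid JSON value on their own."""
    invalid = 0
    with open(path) as f:
        for line in f:
            try:
                json.loads(line)
            except json.JSONDecodeError:
                invalid += 1
    return invalid

# Usage on a small throwaway file shaped like the converted output.
with tempfile.NamedTemporaryFile('w', suffix='.json.csv', delete=False) as tmp:
    tmp.write('{"url": "https://some1-value.com", "quotes": "5\\"  side"}\n')
    tmp.write('{"url": "https://some2-value.com", "quotes": "6\\"  side"}')
    tmp_path = tmp.name

invalid = count_invalid_lines(tmp_path)
os.remove(tmp_path)
print(invalid)
```

The file is read line by line, so the check does not need to hold a multi-gigabyte file in memory.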

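# load statement

import sqlalchemy

# USERNAME, PASSWORD, HOSTNAME and DB must be defined before this point.
POSTGRESQL = f'postgresql+psycopg2://{USERNAME}:{PASSWORD}@{HOSTNAME}/{DB}'
engine = sqlalchemy.create_engine(POSTGRESQL, echo=True)

# The delimiter and quote are control characters that should never occur in
# the data, so each line is loaded as a single field with backslashes intact.
# COPY ... FROM 'file' reads the file on the database server.
LOAD_SQL = "COPY tablename from '/path/to/output.json.csv' with csv delimiter E'\x01' quote E'\x02' null as '';"

con = engine.connect()
trans = con.begin()
try:
    # SQLAlchemy 2.0 no longer accepts plain strings, so wrap the SQL in text()
    con.execute(sqlalchemy.text(LOAD_SQL))
    trans.commit()
except Exception:
    trans.rollback()
    raise  # re-raise so a failed load is not silently ignored
finally:
    con.close()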
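
Once the rows are loaded into a single `json` column, the flat table described as the final goal could be produced with the `->` and `->>` JSON operators. A sketch that only builds the statement; the helper `build_flatten_sql` and the target name `contrial_flat` are hypothetical, and the source table `contrial` with column `info` is assumed from the CREATE TABLE statement earlier (the load script above uses the placeholder `tablename`):

```python
def build_flatten_sql(source_table, target_table, json_column="info"):
    """Build a CREATE TABLE ... AS SELECT statement that flattens the JSON column.

    Identifiers are interpolated directly, so pass only trusted, hard-coded names.
    """
    return (
        f"CREATE TABLE {target_table} AS\n"
        f"SELECT {json_column}->>'url' AS url,\n"
        f"       CAST({json_column}->'nested'->>'id' AS integer) AS nested_id,\n"
        f"       {json_column}->'nested'->>'value' AS nested_value,\n"
        f"       {json_column}->>'quotes' AS quotes\n"
        f"FROM {source_table};"
    )

# Hypothetical names: 'contrial' holds the loaded rows, 'contrial_flat' is the result.
FLATTEN_SQL = build_flatten_sql("contrial", "contrial_flat")
print(FLATTEN_SQL)
```

The resulting string could be executed on the same connection as the load statement. `CAST` is used instead of the `::` shorthand so the statement contains no colons that might be mistaken for bind parameters.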
