
AWS Athena: querying JSON array data

I can't query files on S3 with AWS Athena. Each file contains a regular JSON array, like this:

[
  {
    "DataInvio": "2020-02-06T13:37:00+00:00",
    "DataLettura": "2020-02-06T13:35:50+00:00",
    "FlagDownloaded": 0,
    "GUID": "f257c9c0-b7e1-4663-8d6d-97e652b27c10",
    "IMEI": "866100000062167",
    "Id": 0,
    "IdSessione": "4bd169ff-307c-4fbf-aa63-fce972f43fa2",
    "IdTagLocal": 0,
    "SerialNumber": "142707160028BJZZZZ",
    "Tag": "E200001697080089188056D2",
    "Tipo": "B",
    "TipoEvento": "L",
    "TipoSegnalazione": 0,
    "TipoTag": "C",
    "UsrId": "10642180-1e34-44ac-952e-9cb3e8e6a03c"
  },
  {
    "DataInvio": "2020-02-06T13:37:00+00:00",
    "DataLettura": "2020-02-06T13:35:50+00:00",
    "FlagDownloaded": 0,
    "GUID": "e531272e-465c-4294-950d-95a683ff8e3b",
    "IMEI": "866100000062167",
    "Id": 0,
    "IdSessione": "4bd169ff-307c-4fbf-aa63-fce972f43fa2",
    "IdTagLocal": 0,
    "SerialNumber": "142707160028BJZZZZ",
    "Tag": "E200341201321E0000A946D2",
    "Tipo": "B",
    "TipoEvento": "L",
    "TipoSegnalazione": 0,
    "TipoTag": "C",
    "UsrId": "10642180-1e34-44ac-952e-9cb3e8e6a03c"
  }
]

If I create the table this way, select * from mytable returns empty rows:

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable (
  `IdSessione` string,
  `DataLettura` date,
  `GUID` string,
  `DataInvio` date 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'ignore.malformed.json' = 'true'
) LOCATION 's3://athenatestsavino/files/anthea/'
TBLPROPERTIES ('has_encrypted_data'='false')

Or it gives me the error HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1] if the table is created like this:

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable(
  `IdSessione` string,
  `DataLettura` date,
  `GUID` string,
  `DataInvio` date 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://athenatestsavino/files/anthea/'
TBLPROPERTIES ('has_encrypted_data'='false')

If I change the file contents like this (one JSON object per line, with no trailing commas), the query returns results:

{ "DataInvio": "2020-02-06T13:37:00+00:00", "DataLettura": "2020-02-06T13:35:50+00:00",....}
{ "DataInvio": "2020-02-07T13:37:00+00:00", "DataLettura": "2020-02-06T13:35:50+00:00",....}
    

How can I query the JSON array structure directly?

The Athena best practices recommend one JSON object per line:

Make sure that each JSON-encoded record is represented on a separate line.
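
If the files cannot be produced in that shape at the source, one option is to rewrite them before Athena reads them. The following is a minimal sketch of that idea, not an Athena feature: it assumes each file is small enough to load into memory, and the function names array_to_json_lines and convert_file, as well as any paths passed to them, are hypothetical. Uploading the converted file back to S3 is left out.

```python
import json
from typing import Iterator


def array_to_json_lines(array_text: str) -> Iterator[str]:
    """Yield one compact JSON document per element of a top-level JSON array."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    for record in records:
        # json.dumps without indent never emits a raw newline,
        # so each record stays on a single line.
        yield json.dumps(record, ensure_ascii=False)


def convert_file(src_path: str, dst_path: str) -> int:
    """Rewrite a JSON-array file as one record per line; return the record count."""
    with open(src_path, encoding="utf-8") as src:
        lines = list(array_to_json_lines(src.read()))
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in lines:
            dst.write(line + "\n")
    return len(lines)
```

The output has the same layout as the hand-edited file in the question (one object per line, no enclosing brackets, no trailing commas), which is the form the question reports as working.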

This has been asked a few times, and I don't think anyone has gotten it to work with an array of JSON objects.

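This comes down to how the JSON objects are formatted. Resolutions for these issues are also described here: https://aws.amazon.com/premiumsupport/knowledge-center/error-json-athena/

Besides that, if you use AWS Glue to crawl these files, make sure the classification of the table in the Data Catalog is not "Unknown".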
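
A rough sketch of how that classification could be checked from a script. It assumes the crawler stores the value under Parameters["classification"] in the response of the boto3 Glue get_table call; the helper names table_classification, is_classified and check_table are hypothetical, and the database and table names are whatever your catalog uses.

```python
from typing import Any, Mapping, Optional


def table_classification(get_table_response: Mapping[str, Any]) -> Optional[str]:
    """Return the classification stored in a Glue get_table response, if any.

    Assumption: the value lives under Table -> Parameters -> classification.
    """
    table = get_table_response.get("Table") or {}
    params = table.get("Parameters") or {}
    return params.get("classification")


def is_classified(get_table_response: Mapping[str, Any]) -> bool:
    """True when a classification is present and is not 'unknown' (any casing)."""
    value = table_classification(get_table_response)
    return value is not None and value.strip().lower() != "unknown"


def check_table(database: str, table: str) -> bool:
    """Fetch the table from the Glue Data Catalog and check its classification."""
    import boto3  # imported lazily so the helpers above work without AWS access

    glue = boto3.client("glue")
    return is_classified(glue.get_table(DatabaseName=database, Name=table))
```

For the table in the question, the call would look like check_table("mydb", "mytable"), given AWS credentials that are allowed to read the catalog.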
