Can't query data from Druid Datasource on a Hive External table

The Druid cluster and the Hive/Hadoop cluster each run fine on their own. We are creating a table in Hive to read data from Druid (for ETL purposes), but in initial tests we found that we cannot run even a simple SELECT * against it; it fails with the following error:

hive> select * from druid_hive_table;
OK
druid_hive_table.__time druid_hive_table.op_ts  druid_hive_table.op_type    druid_hive_table.pos    druid_hive_table.table
Failed with exception java.io.IOException:org.apache.hive.druid.com.fasterxml.jackson.databind.JsonMappingException: Can not deserialize instance of java.util.ArrayList out of START_OBJECT token
 at [Source: org.apache.hive.druid.com.metamx.http.client.io.AppendableByteArrayInputStream@656c5818; line: -1, column: 4]
Time taken: 0.449 seconds

However, SELECT COUNT(*) works fine!

hive> select count(*) from druid_hive_table;
OK
$f0
21409
Time taken: 0.199 seconds, Fetched: 1 row(s)
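
As a sanity check (a minimal sketch, not part of the original setup: it reuses the <host>:8082 broker address from the table definition below, and the wide 1000/3000 interval simply means "all time"), the datasource can also be queried directly against the broker with a Druid native scan query, bypassing Hive entirely:

curl -X POST "http://<host>:8082/druid/v2" \
  -H "Content-Type: application/json" \
  -d '{
        "queryType": "scan",
        "dataSource": "druid_datasource_name",
        "intervals": ["1000/3000"],
        "limit": 10
      }'

If this returns rows, the data itself is fine and the problem sits in the Hive-to-Druid query path.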

Specs:

Druid External Table

SET hive.druid.broker.address.default=<host>:8082;

CREATE EXTERNAL TABLE druid_hive_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_datasource_name");

hive> DESCRIBE FORMATTED druid_hive_table;
OK
col_name    data_type   comment
# col_name              data_type               comment             

__time                  timestamp               from deserializer   
op_ts                   string                  from deserializer   
op_type                 string                  from deserializer   
pos                     string                  from deserializer   
table                   string                  from deserializer   

# Detailed Table Information         
Database:               tests                    
Owner:                  OWNER                   
CreateTime:             Mon Feb 10 13:52:13 UTC 2020     
LastAccessTime:         UNKNOWN                  
Retention:              0                        
Location:               <LOCATION>    
Table Type:             EXTERNAL_TABLE           
Table Parameters:        
    COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
    EXTERNAL                TRUE                
    druid.datasource        druid_datasource_name          
    numFiles                0                   
    numRows                 0                   
    rawDataSize             0                   
    storage_handler         org.apache.hadoop.hive.druid.DruidStorageHandler
    totalSize               0                   
    transient_lastDdlTime   1581342733          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.druid.serde.DruidSerDe    
InputFormat:            null                     
OutputFormat:           null                     
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    serialization.format    1                   
Time taken: 0.144 seconds, Fetched: 37 row(s)

For reference - Druid Supervisor Spec:

{
  "dataSchema": {
    "dataSource": "druid_datasource_name",
    "timestampSpec": {
      "column": "current_ts",
      "format": "iso",
      "missingValue": null
    },
    "dimensionsSpec": {
      "dimensions": [],
      "dimensionExclusions": [
        "current_ts"
      ]
    },
    "metricsSpec": [],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": {
        "type": "none"
      },
      "rollup": false,
      "intervals": null
    },
    "transformSpec": {
      "filter": null,
      "transforms": []
    }
  },
  "ioConfig": {
    "topic": "<kafka_topic>",
    "inputFormat": {
      "type": "json",
      "flattenSpec": {
        "useFieldDiscovery": true,
        "fields": []
      },
      "featureSpec": {}
    },
    "replicas": 1,
    "taskCount": 1,
    "taskDuration": "PT3600S",
    "consumerProperties": {
      "bootstrap.servers": "<bootstrap_servers>",
      "group.id": "<group_name>",
      "security.protocol": "SASL_SSL",
      "ssl.truststore.location": "<location>",
      "ssl.truststore.password": "<pass>",
      "sasl.jaas.config": "<config>",
      "sasl.mechanism": "SCRAM-SHA-512"
    },
    "pollTimeout": 100,
    "startDelay": "PT5S",
    "period": "PT30S",
    "useEarliestOffset": true,
    "completionTimeout": "PT1800S",
    "lateMessageRejectionPeriod": null,
    "earlyMessageRejectionPeriod": null,
    "lateMessageRejectionStartDateTime": null,
    "stream": "<kafka_topic>",
    "useEarliestSequenceNumber": true,
    "type": "kafka"
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 1000000,
    "maxBytesInMemory": 0,
    "maxRowsPerSegment": 5000000,
    "maxTotalRows": null,
    "intermediatePersistPeriod": "PT10M",
    "basePersistDirectory": "/opt/apache-druid-0.17.0/var/tmp/druid-realtime-persist7801461398656096281",
    "maxPendingPersists": 0,
    "indexSpec": {
      "bitmap": {
        "type": "concise"
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4",
      "longEncoding": "longs"
    },
    "indexSpecForIntermediatePersists": {
      "bitmap": {
        "type": "concise"
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4",
      "longEncoding": "longs"
    },
    "buildV9Directly": true,
    "reportParseExceptions": false,
    "handoffConditionTimeout": 0,
    "resetOffsetAutomatically": false,
    "segmentWriteOutMediumFactory": null,
    "workerThreads": null,
    "chatThreads": null,
    "chatRetries": 8,
    "httpTimeout": "PT10S",
    "shutdownTimeout": "PT80S",
    "offsetFetchPeriod": "PT30S",
    "intermediateHandoffPeriod": "P2147483647D",
    "logParseExceptions": false,
    "maxParseExceptions": 2147483647,
    "maxSavedParseExceptions": 0,
    "skipSequenceNumberAvailabilityCheck": false,
    "repartitionTransitionDuration": "PT120S"
  },
  "type": "kafka"
}
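
(For context, a spec like this is submitted to the Overlord's supervisor API; the <overlord_host> placeholder, the default port 8090, and the spec filename here are assumptions, not taken from the original setup:)

curl -X POST "http://<overlord_host>:8090/druid/indexer/v1/supervisor" \
  -H "Content-Type: application/json" \
  -d @kafka_supervisor_spec.json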

Thanks in advance for any help solving this.

I've managed to solve this issue: updating Hive and Hadoop to version 3+ fixed it. (Most likely because Druid 0.17.0 removed the legacy Select query, which the Hive 2.x DruidStorageHandler issues for a plain SELECT *, so the broker answers with a JSON error object instead of the expected result array - hence the JsonMappingException above; Hive 3 uses the Scan query instead.)

Using the code below is as simple as spreading butter on a slice of bread:

SET hive.druid.broker.address.default=<host>:8082;

CREATE EXTERNAL TABLE druid_hive_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_datasource_name");
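
After the upgrade, plain scans and aggregations alike should be pushed down to Druid. A couple of illustrative queries against the columns defined above (the __time column has to be backtick-quoted in Hive because it starts with an underscore):

hive> SELECT `__time`, op_type, pos FROM druid_hive_table LIMIT 10;
hive> SELECT op_type, COUNT(*) FROM druid_hive_table GROUP BY op_type;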
