
Druid batch indexing inputSpec type granularity, error with "no input paths specified in job"

I'm following the instructions written here: http://druid.io/docs/0.9.2/ingestion/batch-ingestion.html (scroll to "InputSpec specification" and look for "granularity").

I have this in my indexing-task JSON:

"inputSpec": {
  "type": "granularity",
  "dataGranularity": "DAY",
  "inputPath": "hdfs://hadoop:9000/druid/events/interview",
  "filePattern": ".*",
  "pathFormat": "'y'=yyyy/'m'=MM/'d'=dd"
} 
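As I understand the linked doc, the "granularity" inputSpec builds the input paths itself by appending pathFormat to inputPath for every dataGranularity bucket in the indexing interval, so this spec should make Druid look for files under paths like:

hdfs://hadoop:9000/druid/events/interview/y=2016/m=11/d=06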

I already have my files organized in HDFS like this (I did it on purpose, expecting to use the "granularity" type in my indexing task):

[screenshot: HDFS directory tree under /druid/events/interview, with per-day directories like y=2016/m=11/d=06/]

I keep getting this error (the indexing task fails):

Caused by: java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231) ~[?:?]
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340) ~[?:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493) ~[?:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510) ~[?:?]
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) ~[?:?]
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) ~[?:?] 

Googling turned up two pages discussing the same problem. Both mentioned setting "filePattern" to ".*". I did that; no luck.

To confirm that my Druid-Hadoop link works, I tried changing my inputSpec to static:

"inputSpec": {
  "type": "static",
  "paths": "hdfs://hadoop:9000/druid/events/interview/y=2016/m=11/d=06/event.json,hdfs://hadoop:9000/druid/events/interview/y=2016/m=11/d=07/event.json"
}

It works, so there's no problem with my Druid and Hadoop setup.

Is this "granularity" inputSpec broken in Druid (I'm using 0.9.2)? 这个“粒度”输入规格是否在德鲁伊中被打破(我使用的是0.9.2)? Because I don't see anything wrong in my inputSpec (the granularity type one); 因为我在inputSpec中没有看到任何错误(粒度类型为1); at least not according to the doc and forum I read. 至少不是根据我读过的文档和论坛。

In the meantime I can use the static type (and build my lengthy paths string), but the "granularity" type would be ideal, if only it worked.

Can anyone shed some light here? 谁能在这里解决一些问题?

Thanks.

Try adding a / at the end of the path format: "pathFormat": "'y'=yyyy/'m'=MM/'d'=dd/"
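If it helps, here is a minimal sketch of the spec from the question with only the trailing slash added, everything else unchanged:

"inputSpec": {
  "type": "granularity",
  "dataGranularity": "DAY",
  "inputPath": "hdfs://hadoop:9000/druid/events/interview",
  "filePattern": ".*",
  "pathFormat": "'y'=yyyy/'m'=MM/'d'=dd/"
}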

