Unable to index JSON from HDFS using SchemaRDD.saveToEs() in Elasticsearch-hadoop
This is my first real attempt at Spark/Scala, so be gentle.
I have a file called test.json on HDFS that I'm trying to read and index using Spark. I'm able to read the file via SQLContext.jsonFile(), but when I try to use SchemaRDD.saveToEs() I get an "Invalid JSON fragment received" error. I suspect that saveToEs() isn't actually formatting the output as JSON and is instead just sending the value field of the RDD.
What am I doing wrong?
Spark 1.2.0
Elasticsearch-hadoop 2.1.0.BUILD-20150217
test.json:
{"key":"value"}
spark-shell:
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val input = sqlContext.jsonFile("hdfs://nameservice1/user/mshirley/test.json")
input.saveToEs("mshirley_spark_test/test")
error:
<snip>
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - Invalid JSON fragment received[["value"]][MapperParsingException[failed to parse]; nested: ElasticsearchParseException[Failed to derive xcontent from (offset=13, length=9): [123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10, 91, 34, 118, 97, 108, 117, 101, 34, 93, 10]]; ]]; Bailing out..
<snip>
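The byte array in the exception is the raw bulk request body, so decoding it shows exactly what was sent to Elasticsearch. The snippet below is a minimal sketch that can be pasted into a Scala REPL or spark-shell; the variable names are illustrative, and the bytes are copied from the error above.

```scala
import java.nio.charset.StandardCharsets

// Bytes copied verbatim from the ElasticsearchParseException message.
val bytes = Array[Int](
  123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10,
  91, 34, 118, 97, 108, 117, 101, 34, 93, 10
).map(_.toByte)

// The whole bulk payload: an action line followed by the document line.
val decoded = new String(bytes, StandardCharsets.UTF_8)

// The exception reports offset=13, length=9: the slice that failed to parse.
val rejected = decoded.substring(13, 13 + 9)

println(decoded)   // {"index":{}} on the first line, ["value"] on the second
println(rejected)  // ["value"]
```

The document line is the JSON array ["value"] rather than the object {"key":"value"}, which suggests the column name was dropped before the row was serialized.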
input:
res2: org.apache.spark.sql.SchemaRDD =
SchemaRDD[6] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD [key#0], MappedRDD[5] at map at JsonRDD.scala:47
input.printSchema():
root
|-- key: string (nullable = true)
https://github.com/elastic/elasticsearch-hadoop/issues/382
changed:
import org.elasticsearch.spark._
to:
import org.elasticsearch.spark.sql._
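A likely explanation for why the import matters, consistent with the ["value"] payload in the error: the generic import treats each row as a plain sequence of values, while the sql import is aware of the SchemaRDD's schema and pairs each value with its column name. The following is a toy model of that difference only, not the elasticsearch-hadoop source; Row, Table, genericJson and schemaAwareJson are hypothetical names invented for the illustration.

```scala
// Toy model: hypothetical types, not the Spark or elasticsearch-hadoop classes.
case class Row(values: Seq[String])
case class Table(schema: Seq[String], rows: Seq[Row])

def quote(s: String): String = "\"" + s + "\""

// Generic path: no knowledge of column names, so a row becomes a JSON array.
def genericJson(row: Row): String =
  row.values.map(quote).mkString("[", ",", "]")

// Schema-aware path: each value is paired with its column name, giving a JSON object.
def schemaAwareJson(schema: Seq[String], row: Row): String =
  schema.zip(row.values)
    .map { case (k, v) => quote(k) + ":" + quote(v) }
    .mkString("{", ",", "}")

// Mirrors test.json: one column "key" holding the string "value".
val table = Table(Seq("key"), Seq(Row(Seq("value"))))

val generic = table.rows.map(genericJson)                  // ["value"]
val aware = table.rows.map(schemaAwareJson(table.schema, _)) // {"key":"value"}

generic.foreach(println)
aware.foreach(println)
```

In the real library the choice between the two behaviours is made by which implicit conversion is in scope, which is why changing a single import line is enough.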