
Unable to index JSON from HDFS using SchemaRDD.saveToEs() in Elasticsearch-hadoop

This is my first real attempt at Spark/Scala, so be gentle.

I have a file called test.json on HDFS that I'm trying to read and index using Spark. I can read the file via SQLContext.jsonFile(), but when I call SchemaRDD.saveToEs() I get an "Invalid JSON fragment received" error. I suspect that saveToEs() isn't actually formatting the output as JSON and is instead sending only the field values of each row.

What am I doing wrong?

Spark 1.2.0

Elasticsearch-hadoop 2.1.0.BUILD-20150217

test.json:

{"key":"value"}

spark-shell:

import org.apache.spark.SparkContext._
import org.elasticsearch.spark._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._

val input = sqlContext.jsonFile("hdfs://nameservice1/user/mshirley/test.json")
input.saveToEs("mshirley_spark_test/test")

error:

<snip>
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - Invalid JSON fragment received[["value"]][MapperParsingException[failed to parse]; nested: ElasticsearchParseException[Failed to derive xcontent from (offset=13, length=9): [123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10, 91, 34, 118, 97, 108, 117, 101, 34, 93, 10]]; ]]; Bailing out..
<snip>
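One way to check the hunch above is to decode the byte array that Elasticsearch echoes back in the error. The sketch below is only an illustration: the byte values are copied from the error message, while the names `bytes`, `body`, and `fragment` are made up for this example. It can be pasted into spark-shell or a plain Scala REPL.

```scala
import java.nio.charset.StandardCharsets

// Byte values copied verbatim from the ElasticsearchParseException above.
val bytes: Array[Byte] = Array(
  123, 34, 105, 110, 100, 101, 120, 34, 58, 123, 125, 125, 10,
  91, 34, 118, 97, 108, 117, 101, 34, 93, 10
).map(_.toByte)

// The payload is plain ASCII, so byte offsets and character offsets coincide.
val body = new String(bytes, StandardCharsets.UTF_8)

// The error reports the unparseable region as (offset=13, length=9).
val offset = 13
val length = 9
val fragment = body.substring(offset, offset + length)

println(body)     // the full bulk request body, two newline-terminated lines
println(fragment) // the part Elasticsearch could not parse as a document
```

The first line of the decoded body is the bulk action line, {"index":{}}. The second line, which is the rejected fragment, is ["value"]: a JSON array holding only the row's values, not the {"key":"value"} object from test.json. That matches the suspicion that the field names are being dropped before the data is sent.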

input:

res2: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[6] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD [key#0], MappedRDD[5] at map at JsonRDD.scala:47

input.printSchema():

root
 |-- key: string (nullable = true)

https://github.com/elastic/elasticsearch-hadoop/issues/382

changed:

import org.elasticsearch.spark._

to:

import org.elasticsearch.spark.sql._
