Converting EPOCH to Date in Elasticsearch Spark
I have a DataFrame that I am writing to ES. Before writing to ES, I am converting the EVTExit column, which is in epoch milliseconds, to Date:
import org.apache.spark.sql.functions.{from_unixtime, to_date}
workset = workset.withColumn("EVTExit", to_date(from_unixtime($"EVTExit".divide(1000))))
workset.select("EVTExit").show(10)
+----------+
|   EVTExit|
+----------+
|2014-06-03|
|      null|
|2012-10-23|
|2014-06-03|
|2015-11-05|
+----------+
As I can see, this EVTExit is converted to Date.
workset.write.format("org.elasticsearch.spark.sql").save("workset/workset1")
But after writing it to ES, I am still getting it in epoch format:
"EVTExit" : 1401778800000
Does anyone have any idea what's going wrong here?
Thanks
Let's consider the DataFrame example from your question:
scala> val df = workset.select("EVTExit")
// df: org.apache.spark.sql.DataFrame = [EVTExit: date]
scala> df.printSchema
// root
// |-- EVTExit: date (nullable = true)
You would need to cast the column into a string and disable es.mapping.date.rich, which is true by default.
That parameter defines whether to create a rich Date-like object for Date fields in Elasticsearch or return them as primitives (String or long). The actual object type is based on the library used; a notable exception is Map/Reduce, which provides no built-in Date object and, as such, returns LongWritable and Text regardless of this setting.
I agree, this is counterintuitive, but for now it's the only solution if you don't want elasticsearch to convert it into long format. This is actually quite painful.
scala> val df2 = df.withColumn("EVTExit_1", $"EVTExit".cast("string"))
// df2: org.apache.spark.sql.DataFrame = [EVTExit: date, EVTExit_1: string]
scala> df2.show
// +----------+----------+
// | EVTExit| EVTExit_1|
// +----------+----------+
// |2014-06-03|2014-06-03|
// | null| null|
// |2012-10-23|2012-10-23|
// |2014-06-03|2014-06-03|
// |2015-11-05|2015-11-05|
// +----------+----------+
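As an aside, if you had many date columns you wouldn't need to cast them one by one. Here is a small sketch that replaces every DateType column in place; the helper name castDatesToString is my own, not part of any API:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DateType

// Sketch: cast every DateType column of a DataFrame to string,
// so it reaches Elasticsearch as "yyyy-MM-dd" text instead of epoch millis.
def castDatesToString(df: DataFrame): DataFrame =
  df.schema.fields.filter(_.dataType == DateType).foldLeft(df) { (acc, field) =>
    acc.withColumn(field.name, acc(field.name).cast("string"))
  }

val dfAllStrings = castDatesToString(df)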
Now you can write your data to elasticsearch:
scala> df2.write.format("org.elasticsearch.spark.sql").option("es.mapping.date.rich", "false").save("workset/workset1")
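As another aside, if you would rather not repeat that option on every write, elasticsearch-hadoop also reads its es.* settings from the SparkConf. A minimal sketch, assuming a standalone application rather than the spark-shell (the app name is made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Sketch: set es.mapping.date.rich once for the whole job instead of per write.
val conf = new SparkConf()
  .setAppName("es-date-demo")
  .set("es.mapping.date.rich", "false") // return dates as primitives (String/long)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)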
Now let's check what's on ES. First, let's look at the mapping:
$ curl -XGET localhost:9200/workset?pretty=true
{
  "workset" : {
    "aliases" : { },
    "mappings" : {
      "workset1" : {
        "properties" : {
          "EVTExit" : {
            "type" : "long"
          },
          "EVTExit_1" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1475063310916",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "i3Rb014sSziCmYm9LyIc5A",
        "version" : {
          "created" : "2040099"
        }
      }
    },
    "warmers" : { }
  }
}
It seems like we have our dates. Now let's check the contents:
$ curl -XGET localhost:9200/workset/_search?pretty=true -d '{ "size" : 1 }'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "workset",
      "_type" : "workset1",
      "_id" : "AVdwn-vFWzMbysX5OjMA",
      "_score" : 1.0,
      "_source" : {
        "EVTExit" : 1401746400000,
        "EVTExit_1" : "2014-06-03"
      }
    } ]
  }
}
Note 1: I kept both fields for demonstration purposes, but I think you get the point.
Note 2: Tested with Elasticsearch 2.4, Spark 1.6.2, Scala 2.10, and elasticsearch-spark 2.3.2 inside a spark-shell:
$ spark-shell --master local[*] --packages org.elasticsearch:elasticsearch-spark_2.10:2.3.2
Note 3: The same solution works with pyspark:
from pyspark.sql.functions import col
df2 = df.withColumn("EVTExit_1",col("EVTExit").cast("string"))
df2.write.format("org.elasticsearch.spark.sql") \
.option("es.mapping.date.rich", "false").save("workset/workset1")