How to use sql.Date in a Spark Java map function for SparkSQL
I am trying to read a flat file (CSV) that contains Date values in addition to Strings and Integers. So instead of just using long/String for the Date fields, it would be nice to get an object that contains Date values.
The code that I have is the following:
JavaRDD<Date> dates = sc.textFile("hdfs://0.0.0.0:19000/Dates.csv").map(
new Function<String, Date>(){
@Override
public Date call(String line){
String[] fields = line.split(",");
return Date.valueOf(fields[2]);
}
});
DataFrame schemaTransactions = sqlContext.createDataFrame(dates, Date.class);
schemaTransactions.registerTempTable("dates");
DataFrame dAs = sqlContext.sql("SELECT * FROM dates");
Row[] dARows = dAs.collect();
The code compiles, but when it is executed, the error message
Exception in thread "main" java.lang.ClassCastException: org.apache.spark.sql.types.DateType$ cannot be cast to org.apache.spark.sql.types.StructType
is thrown, which is confusing because the documentation says java.sql.Date is supported: https://spark.apache.org/docs/latest/sql-programming-guide.html
The same error occurs when I use sql.Timestamp.
My initial goal was to use LocalDateTime from Java 8, but since this is not supported I tried to use sql.Date.
Any suggestions, or is it a bug?
OK, I just figured out that if we place the Date object in a wrapper class, it seems to work.
Here is the code. First we define our "Wrapper":
public class TestClass implements Serializable {
Date date;
public Date getDate() {
return date;
}
public void setDate(Date date) {
this.date = date;
}
}
And then change the type "Date" to the wrapper class:
JavaRDD<TestClass> dates = sc.textFile("hdfs://0.0.0.0:19000/Dates.csv").map(
new Function<String, TestClass>(){
@Override
public TestClass call(String line){
String[] fields = line.split(",");
TestClass tc = new TestClass();
// Date.valueOf returns a java.sql.Date; Date.parse returns a long and would not compile here
tc.setDate(Date.valueOf(fields[2]));
return tc;
}
});
DataFrame schemaTransactions = sqlContext.createDataFrame(dates, TestClass.class);
schemaTransactions.registerTempTable("dates");
DataFrame dAs = sqlContext.sql("SELECT * FROM dates");
dAs.count();
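A possible explanation, as far as I can tell: createDataFrame(rdd, SomeClass.class) seems to treat the class as a JavaBean and derive one column per getter. A bare java.sql.Date is a column type rather than a bean, which would fit the DateType/StructType cast error. The sketch below does not use Spark at all. It only uses java.beans.Introspector from the JDK to show which properties a bean scan would see on the wrapper above. The names BeanColumns and columnNames are made up for this illustration.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.sql.Date;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Spark.
public class BeanColumns {

    // Same shape as the wrapper above: a single java.sql.Date property.
    public static class TestClass implements Serializable {
        private Date date;
        public Date getDate() { return date; }
        public void setDate(Date date) { this.date = date; }
    }

    // Lists "name:type" for every bean property that has a getter,
    // stopping at Object so the built-in "class" property is skipped.
    public static List<String> columnNames(Class<?> beanClass) {
        List<String> columns = new ArrayList<>();
        try {
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    columns.add(pd.getName() + ":" + pd.getPropertyType().getSimpleName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        return columns;
    }

    public static void main(String[] args) {
        // The wrapper exposes one property, "date", of type java.sql.Date.
        System.out.println(columnNames(TestClass.class));
    }
}
```

If this reading is right, each additional CSV field would simply become another getter/setter pair on the wrapper.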
Maybe this is helpful for someone...
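Regarding the original goal of using LocalDateTime: one workaround might be to parse with java.time inside the map function and convert to the java.sql types just before storing the value in the wrapper. This is only a sketch and assumes the CSV field holds ISO-formatted text such as 2015-03-27 or 2015-03-27T10:15:30. The class name JavaTimeToSql and its methods are hypothetical. Date.valueOf(LocalDate) and Timestamp.valueOf(LocalDateTime) are part of the JDK since Java 8.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical helper; the input formats are assumptions about the CSV.
public class JavaTimeToSql {

    // Parses an ISO date such as "2015-03-27" and converts it to java.sql.Date.
    public static Date toSqlDate(String isoDate) {
        return Date.valueOf(LocalDate.parse(isoDate));
    }

    // Parses an ISO date-time such as "2015-03-27T10:15:30"
    // and converts it to java.sql.Timestamp.
    public static Timestamp toSqlTimestamp(String isoDateTime) {
        return Timestamp.valueOf(LocalDateTime.parse(isoDateTime));
    }

    public static void main(String[] args) {
        System.out.println(toSqlDate("2015-03-27"));
        System.out.println(toSqlTimestamp("2015-03-27T10:15:30"));
    }
}
```

Inside call(String line), something like tc.setDate(JavaTimeToSql.toSqlDate(fields[2])) would then replace the direct Date.valueOf call.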