How to use sql.Date in a Spark Java map function with SparkSQL

I am trying to read a flat file (CSV) that contains Date values in addition to Strings and Integers. Instead of just using long/String for the date fields, it would be nice to get an object that contains actual Date values.

The code that I have is the following:

JavaRDD<Date> dates = sc.textFile("hdfs://0.0.0.0:19000/Dates.csv").map(
    new Function<String, Date>(){
        @Override
        public Date call(String line){
            String[] fields = line.split(",");
            return Date.valueOf(fields[2]);
        }
});

DataFrame  schemaTransactions = sqlContext.createDataFrame(dates, Date.class);
schemaTransactions.registerTempTable("dates");
DataFrame dAs = sqlContext.sql("SELECT * FROM dates");
Row[] dARows = dAs.collect();
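A side note on the `Date.valueOf(fields[2])` call in the map function above: `java.sql.Date.valueOf(String)` only accepts the JDBC escape format `yyyy-[m]m-[d]d` and throws an `IllegalArgumentException` for anything else. The following is a minimal sketch of parsing one CSV line, assuming the date sits in the third column as in the snippet; the class name, the method names, and the `dd.MM.yyyy` pattern are hypothetical and only illustrate the alternative for files that use another layout.

```java
import java.sql.Date;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class, not part of the original post.
class CsvDateParsing {

    // Assumes the third column holds a date in yyyy-[m]m-[d]d form.
    static Date parseIsoColumn(String line) {
        String[] fields = line.split(",");
        return Date.valueOf(fields[2].trim());
    }

    // For other layouts, parse with an explicit pattern first and
    // then convert the resulting LocalDate to java.sql.Date.
    static Date parseWithPattern(String text, String pattern) {
        LocalDate parsed = LocalDate.parse(text.trim(), DateTimeFormatter.ofPattern(pattern));
        return Date.valueOf(parsed);
    }
}
```

For example, `CsvDateParsing.parseWithPattern("01.03.2015", "dd.MM.yyyy")` yields the same value as `Date.valueOf("2015-03-01")`.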

The code compiles, but when it is executed the following error is thrown:

Exception in thread "main" java.lang.ClassCastException: org.apache.spark.sql.types.DateType$ cannot be cast to org.apache.spark.sql.types.StructType

This is confusing because the documentation says that java.sql.Date is supported: https://spark.apache.org/docs/latest/sql-programming-guide.html

The same error occurs when I use sql.Timestamp.

However, my initial goal was to use LocalDateTime from Java 8, but since this is not supported I tried to use sql.Date instead.
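If the values start out as Java 8 `LocalDateTime` or `LocalDate` objects, the JDK can bridge them to the `java.sql` types through `Timestamp.valueOf(LocalDateTime)` and `Date.valueOf(LocalDate)`, so the Java 8 API can still be used up to the point where the value is handed to Spark. A minimal sketch, with a hypothetical class name and method names:

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical helper class, not part of the original post.
class Java8DateBridge {

    // Converts a LocalDateTime to java.sql.Timestamp (same local date and time).
    static Timestamp toSqlTimestamp(LocalDateTime value) {
        return Timestamp.valueOf(value);
    }

    // Converts a LocalDate to java.sql.Date (same local date).
    static Date toSqlDate(LocalDate value) {
        return Date.valueOf(value);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        System.out.println(toSqlTimestamp(now));
        System.out.println(toSqlDate(now.toLocalDate()));
    }
}
```

The reverse direction is available as `Timestamp.toLocalDateTime()` and `Date.toLocalDate()`.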

Any suggestions, or is it a bug?

OK, I just figured out that if we place the Date object in a wrapper class, it seems to work.

Here is the code. First we define our "wrapper":

public class TestClass implements Serializable {

    private Date date;

    public Date getDate() {
        return date;
    }

    public void setDate(Date date) {
        this.date = date;
    }

}

And then change the type Date to the wrapper class:

JavaRDD<TestClass> dates = sc.textFile("hdfs://0.0.0.0:19000/Dates.csv").map(
    new Function<String, TestClass>(){
        @Override
        public TestClass call(String line){
            String[] fields = line.split(",");
            TestClass tc = new TestClass();
            // Date.valueOf returns a java.sql.Date; Date.parse returns a long
            tc.setDate(Date.valueOf(fields[2]));
            return tc;
        }
});

DataFrame  schemaTransactions = sqlContext.createDataFrame(dates, TestClass.class);
schemaTransactions.registerTempTable("dates");
DataFrame dAs = sqlContext.sql("SELECT * FROM dates");
dAs.count();
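A possible explanation, as far as I understand it: `createDataFrame(rdd, beanClass)` is documented as applying a schema to an RDD of Java Beans, so the class passed in is expected to describe a whole row (a `StructType`) through its getters. A bare `java.sql.Date` is a column type (`DateType`), not a row, which would match the `ClassCastException` above. The sketch below uses only the JDK's `java.beans.Introspector` to list the readable properties of a bean shaped like `TestClass`; it illustrates the bean convention and is not a reproduction of Spark's internals. The names `BeanPropertyDemo`, `DateBean`, and `readableProperties` are hypothetical.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.sql.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original post.
class BeanPropertyDemo {

    // Same shape as TestClass above. The bean class itself must be
    // public so that its getters are visible to the introspector.
    public static class DateBean implements Serializable {
        private Date date;

        public Date getDate() {
            return date;
        }

        public void setDate(Date date) {
            this.date = date;
        }
    }

    // Returns property name -> property type for every property with a getter.
    static Map<String, Class<?>> readableProperties(Class<?> beanClass) {
        try {
            // Stop at Object so the implicit "class" property is left out.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Map<String, Class<?>> properties = new LinkedHashMap<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    properties.put(pd.getName(), pd.getPropertyType());
                }
            }
            return properties;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readableProperties(DateBean.class));
    }
}
```

The wrapper exposes a single `date` property of type `java.sql.Date`, which is the kind of structure that can be mapped to a one-column row.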

Maybe this is helpful for someone...
