
Spark Scala Datasets using Java Classes

I am creating a Spark application using the Scala API, but some of my model classes are written in Java. When I create a Dataset from a Scala case class, it works fine and all the columns are visible when I call show(). But when I create a Dataset from a Java class, all the fields are packed into a single column named value.

Scala case class example:

case class Person(name: String, age: Int)

Execution:

sqlContext.createDataset(Seq(Person("abcd", 10))).show()

Output:

name | age

abcd | 10

Java class example:

class Person {
  public String name;
  public int age;
  public Person (String name, int age) {
    this.name = name;
    this.age = age;
  }
}

Execution:

sqlContext.createDataset(Seq(new Person("abcd", 10))).show()

Output:

value

[01 00 63 6F 6D 2...]

Are we not supposed to use Java classes as models in a Spark Scala application? How do we resolve this issue?

You can use Java classes to create Datasets, but you need to explicitly define a bean encoder for the class (just as you would in Java). The class must also follow the JavaBean convention, with getter and setter methods for each field, and it must be declared public (otherwise Spark complains with compilation errors). Hope this works for you.

Class

public class Person {
  private String name;
  private int age;

  public Person (String name, int age) {
    this.name = name;
    this.age = age;
  }

  public String getName() {
    return name;
  }

  public void setName(String name) {
    this.name = name;
  }

  public int getAge() {
    return age;
  }

  public void setAge(int age) {
    this.age = age;
  }
}

Execution

import org.apache.spark.sql.{Encoder, Encoders}
implicit val personEncoder: Encoder[Person] = Encoders.bean(classOf[Person])
sqlContext.createDataset(Seq(new Person("abcd", 10))).show()

Result

+---+----+
|age|name|
+---+----+
| 10|abcd|
+---+----+
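
The getter/setter requirement mentioned above can be checked without Spark. The following is a minimal sketch, assuming the bean encoder finds its columns through the standard JavaBean convention (this is an assumption about how the encoder works, not something stated in the answer). It uses the JDK's java.beans.Introspector to list the properties that have both a getter and a setter. The names BeanPropertyCheck, FieldsOnlyPerson, BeanPerson and readWriteProperties are hypothetical and exist only for this illustration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanPropertyCheck {

    // Hypothetical stand-in for the question's class: public fields, no accessors.
    public static class FieldsOnlyPerson {
        public String name;
        public int age;
    }

    // Hypothetical stand-in for the answer's class: private fields with getters and setters.
    public static class BeanPerson {
        private String name;
        private int age;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Returns the sorted names of properties that have both a getter and a setter.
    public static List<String> readWriteProperties(Class<?> type) throws IntrospectionException {
        // Stop at Object so the "class" property derived from getClass() is skipped.
        BeanInfo info = Introspector.getBeanInfo(type, Object.class);
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                names.add(pd.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IntrospectionException {
        System.out.println(readWriteProperties(FieldsOnlyPerson.class)); // prints []
        System.out.println(readWriteProperties(BeanPerson.class));       // prints [age, name]
    }
}
```

A class with only public fields exposes no bean properties under this convention, while the version with accessors exposes age and name. The alphabetical order in this sketch comes from its explicit sort.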
