Hi, I'm reading the official documentation for Spark Structured Streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#quick-example
The guide introduces DataFrame as the basic type for Structured Streaming and uses it in the sample code, but meanwhile the sibling Java code uses Dataset as the data type. I'm wondering: since Spark/Scala runs on the JVM, shouldn't Scala and Java use the same data type?
Or is DataFrame in fact a kind of Dataset, somehow?
I hope to get an explanation of this question, thank you.
A DataFrame is, indeed, a special case of Dataset: it is Dataset[Row], where Row is a mostly generic type defined here. DataFrame as a term and a type predates the Dataset API, and its use as an alias for Dataset[Row] is more or less a compatibility feature in Scala Spark. A full explanation of the differences, such as they are, is available here.
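To illustrate why the two languages look different: Scala supports type aliases (in Scala Spark, `type DataFrame = Dataset[Row]`), while Java does not, so the Java API simply spells the same thing as `Dataset<Row>`. Below is a minimal sketch with stand-in classes; `Row` and `Dataset` here are simplified placeholders, not the real `org.apache.spark.sql` types.

```java
import java.util.List;

// Stand-in for org.apache.spark.sql.Row: a generic, untyped record.
class Row {
    final List<Object> values;
    Row(List<Object> values) { this.values = values; }
}

// Stand-in for org.apache.spark.sql.Dataset<T>: a typed collection.
class Dataset<T> {
    final List<T> data;
    Dataset(List<T> data) { this.data = data; }
}

public class Demo {
    public static void main(String[] args) {
        // Java has no type aliases, so what Scala calls a DataFrame
        // (an alias for Dataset[Row]) is written out as Dataset<Row>.
        // Same JVM type, just no shorthand name for it in Java.
        Dataset<Row> df = new Dataset<>(List.of(new Row(List.<Object>of(1, "a"))));
        System.out.println(df.data.size()); // prints 1
    }
}
```

In other words, both languages operate on the same underlying JVM class; only Scala gives `Dataset[Row]` the convenience name `DataFrame`.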