
Spark Structured Streaming: why does the Java code use Dataset while the Scala code uses DataFrame?

Hi, I'm reading the official documentation for Spark Structured Streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#quick-example

The Scala sample code uses DataFrame, and the guide introduces DataFrame as the basic type for structured streaming, but the sibling Java code uses Dataset as its data type. Since both Scala and Java run on the JVM, shouldn't they use the same data type?

Or is DataFrame in fact a kind of Dataset, somehow?

I'd appreciate an explanation of this. Thank you.

A DataFrame is, indeed, a special case of Dataset: it is Dataset[Row], where Row is a mostly generic type defined here. DataFrame as a term and a type predates the Dataset API, and its use as an alias for Dataset[Row] is more or less a compatibility feature in Scala Spark. A full explanation of the differences, such as they are, is available here.
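Concretely, Spark's Scala API declares DataFrame as a type alias for Dataset[Row] in the org.apache.spark.sql package object, which is why Scala code can write DataFrame while the equivalent Java code must spell out Dataset&lt;Row&gt; (Java has no type aliases). Below is a minimal, dependency-free Scala sketch of that pattern; the Row and Dataset stand-ins are simplified placeholders, not Spark's real classes:

```scala
// Simplified stand-ins for org.apache.spark.sql.Row and Dataset,
// just to illustrate the alias mechanism (not Spark's actual classes).
case class Row(values: Any*)
class Dataset[T](val data: Seq[T])

object sql {
  // This is the shape of the alias Spark declares in its sql package object:
  //   type DataFrame = Dataset[Row]
  type DataFrame = Dataset[Row]
}

object Demo extends App {
  import sql.DataFrame

  // An "untyped" DataFrame...
  val df: DataFrame = new Dataset(Seq(Row("alice", 1)))

  // ...is assignable to Dataset[Row] with no conversion at all:
  // the compiler sees one and the same type.
  val ds: Dataset[Row] = df

  println(ds.data.head)
}
```

In real Spark code the same assignment works unchanged: a value of type org.apache.spark.sql.DataFrame can be bound to a Dataset[Row] variable with no cast, because the alias makes them the same type. Java, lacking aliases, simply names the underlying type Dataset&lt;Row&gt; directly.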

