Hi, I'm reading the official documentation for Spark Structured Streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#quick-example
The guide introduces DataFrame as the basic type for Structured Streaming and uses it in the sample code, but meanwhile the sibling Java code uses Dataset as the data type. I'm wondering: since Spark/Scala runs on the JVM, shouldn't Scala and Java use the same data type?
Or is DataFrame in fact a kind of Dataset, somehow?
I hope to get an explanation of this question, thank you.
A DataFrame is, indeed, a special case of Dataset: it is Dataset[Row], where Row is a mostly generic type defined here. DataFrame as a term and a type predates the Dataset API, and its use as an alias for Dataset[Row] is more or less a compatibility feature in Scala Spark. A full explanation of the differences, such as they are, is available here.
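To illustrate why the two languages look different: Scala supports type aliases (in Scala Spark, `type DataFrame = Dataset[Row]`), while Java does not, so the Java API simply spells the same thing as `Dataset<Row>`. Below is a minimal sketch with stand-in classes; `Row` and `Dataset` here are simplified placeholders, not the real `org.apache.spark.sql` types.

```java
import java.util.List;

// Stand-in for org.apache.spark.sql.Row: a generic, untyped record.
class Row {
    final List<Object> values;
    Row(List<Object> values) { this.values = values; }
}

// Stand-in for org.apache.spark.sql.Dataset<T>: a typed collection.
class Dataset<T> {
    final List<T> data;
    Dataset(List<T> data) { this.data = data; }
}

public class Demo {
    public static void main(String[] args) {
        // Java has no type aliases, so what Scala calls a DataFrame
        // (an alias for Dataset[Row]) is written out as Dataset<Row>.
        // Same JVM type, just no shorthand name for it in Java.
        Dataset<Row> df = new Dataset<>(List.of(new Row(List.<Object>of(1, "a"))));
        System.out.println(df.data.size()); // prints 1
    }
}
```

In other words, both languages operate on the same underlying JVM class; only Scala gives `Dataset[Row]` the convenience name `DataFrame`.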