
Spark RDD | Fault tolerance

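RDDs provide fault tolerance through their lineage graph: each RDD records the transformations used to derive it from its parent RDDs, so if a partition is lost, Spark can recompute it from that lineage instead of relying on replicated copies. This is how Spark achieves fault tolerance.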
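To make the lineage idea concrete, here is a toy, pure-Python model. It is not Spark's implementation, and `ToyRDD`, `persist`, and `lose_partition` are hypothetical names made up for this illustration. Each dataset remembers its parent and the per-partition function that derives it, so a lost cached partition can be rebuilt by replaying the chain back to the source data (which is assumed to live in reliable storage).

```python
class ToyRDD:
    """Hypothetical toy model of lineage-based recovery (not Spark's API)."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source  # root partitions, assumed to sit in reliable storage
        self.parent = parent  # upstream ToyRDD in the lineage graph
        self.fn = fn          # per-partition function deriving this dataset from parent
        self.cache = None     # materialized partitions, if persisted

    def map(self, f):
        # Lazy: only the recipe is recorded, nothing is computed yet.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self):
        if self.parent is None:
            return len(self.source)
        return self.parent.num_partitions()

    def compute(self, i):
        """Return partition i, rebuilding it from lineage when it is not cached."""
        if self.cache is not None and self.cache[i] is not None:
            return self.cache[i]
        if self.parent is None:
            return list(self.source[i])  # re-read from "reliable storage"
        return self.fn(self.parent.compute(i))  # replay the lineage

    def persist(self):
        self.cache = [self.compute(i) for i in range(self.num_partitions())]
        return self

    def lose_partition(self, i):
        # Simulate an executor failure wiping out one cached partition.
        self.cache[i] = None

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


# Usage: square the numbers, keep the even squares, cache, then "lose" a partition.
base = ToyRDD(source=[[1, 2, 3], [4, 5, 6]])
even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0).persist()
even_squares.lose_partition(1)
print(even_squares.collect())  # [4, 16, 36]
```

Only the lost partition is recomputed, and only from the steps recorded in its lineage; the surviving cached partition is reused as-is.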

So when working with Spark DataFrames, does Spark create RDDs in the background to achieve fault tolerance?

In general, if I perform any action or transformation on a Spark cluster, does Spark use RDDs?

RDDs are the backbone of Spark and its fundamental data structure. DataFrames and Datasets are built on top of RDDs and provide a higher-level abstraction for simplicity.

Think of an RDD as similar to a Scala collection, but distributed in nature.

A DataFrame, in turn, can be thought of as an RDD with a schema (in fact, DataFrames evolved from SchemaRDD), i.e. a two-dimensional (tabular) collection that is distributed in nature. Under the hood, Spark does create RDDs.

If you're interested in learning more, consider reading these: https://www.databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html

Difference between DataFrame, Dataset, and RDD in Spark

You can also see the underlying RDDs in the Spark UI (in the DAG visualization for each job or stage).

[Image: RDDs in the Spark UI]
