
Multiple cells in a Databricks notebook

I am new to Databricks. Why does a notebook have multiple cells when we could write the whole set of instructions, or the whole program, in a single cell?

Regards,

The advantage of using multiple cells is that you can break a large piece of code into small portions, one per cell, and execute each cell individually. You do not have to rerun the complete code, which may take a long time when it involves large datasets, exploratory data analysis, transformations, and similar heavy work.

Put another way, Databricks is a big-data analysis platform, and a typical workflow ingests a large dataset (millions of rows), cleans it, transforms it, and then applies data analysis and machine learning algorithms. With everything in a single cell, any change means rerunning all of those steps, which costs time and compute. If you divide the steps across cells, you can run each one individually, and variables defined in one cell remain available to the cells you run afterwards.
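The split described above can be sketched roughly as follows. This is only an illustration: it uses plain Python and made-up sample rows (the names `raw_rows`, `clean_rows` and `totals` are hypothetical) instead of a real Spark job, and the `# COMMAND ----------` lines are the cell separators Databricks uses when a Python notebook is stored as a source file.

```python
# Databricks notebook source
# Cell 1: ingest. In a real notebook this is the slow step, so run it once.
# The rows below are made-up sample data standing in for a large dataset.
raw_rows = [
    {"city": "Pune", "sales": "120"},
    {"city": "Delhi", "sales": None},
    {"city": "Pune", "sales": "80"},
]

# COMMAND ----------

# Cell 2: clean. Reuses raw_rows from Cell 1 without loading it again.
clean_rows = [
    {"city": row["city"], "sales": int(row["sales"])}
    for row in raw_rows
    if row["sales"] is not None  # drop rows with a missing value
]

# COMMAND ----------

# Cell 3: analyse. While exploring, edit and rerun only this cell.
totals = {}
for row in clean_rows:
    totals[row["city"]] = totals.get(row["city"], 0) + row["sales"]
print(totals)
```

If only the aggregation in the last cell changes, only that cell needs to run again; the ingested and cleaned data stay in memory from the earlier cells.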

For example, if you are ingesting data from an Azure Data Lake Storage (ADLS) account, you can create a mount point to the required storage resource and path in one cell and run that cell on its own. Once the ADLS container is mounted, you can use another cell to prepare the data. You do not need to mount the resource again, because the earlier cell has already done it.
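A minimal sketch of such a mount cell might look like the code below. The helper name `mount_if_needed` is hypothetical and not part of Databricks; it only wraps the `dbutils.fs.mounts()` and `dbutils.fs.mount()` utilities, and it takes `dbutils` as a parameter so the function can also be defined outside a notebook. The source URL, mount point and configuration you pass in depend on your own storage account and authentication setup.

```python
def mount_if_needed(dbutils, source, mount_point, extra_configs):
    """Mount `source` at `mount_point` unless that mount point already exists.

    Returns True if a new mount was created, False if it was already there.
    """
    # dbutils.fs.mounts() lists the current mounts; each entry has a mountPoint.
    already_mounted = any(
        mount.mountPoint == mount_point for mount in dbutils.fs.mounts()
    )
    if already_mounted:
        return False

    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=extra_configs,
    )
    return True
```

In a notebook you would call this once in its own cell, passing the built-in `dbutils` object together with your own source URL, mount point and credentials configuration. Later cells can then read from the mount point directly, and rerunning the mount cell by accident does nothing because the helper checks for the existing mount first.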
