PySpark: How can I suppress %run output in PySpark cell when importing variables from another Notebook?

I am using multiple notebooks in PySpark and import variables across these notebooks using %run path. Every time I run the command, all variables that I displayed in the original notebook are displayed again in the current notebook (the notebook in which I call %run). But I do not want them to be displayed in the current notebook; I only want to be able to work with the imported variables. How do I suppress the output that is displayed every time? Note, I am not sure if it matters, but I am working in Databricks. Thank you!

Command example:

%run /Users/myemail/Nodebook
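For context, a minimal sketch of how the behaviour shows up (the contents of the included notebook below are hypothetical): any display() call or bare expression in the included notebook is rendered again in every notebook that %runs it.

# Hypothetical contents of /Users/myemail/Nodebook.
# Variables defined here become available to the calling notebook,
# but every display() call (or bare expression at the end of a cell)
# is rendered again in the notebook that executes %run on this path.
input_path = "/mnt/raw/events"   # hypothetical shared variable
threshold = 100                  # hypothetical shared variable

df = spark.read.json(input_path)
display(df)  # this output reappears in the calling notebook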

This is expected behaviour: the %run command allows you to include another notebook within a notebook. This command lets you concatenate various notebooks that represent key ETL steps, Spark analysis steps, or ad-hoc exploration. However, it lacks the ability to build more complex data pipelines.

Notebook workflows are a complement to %run because they let you return values from a notebook. This allows you to easily build complex workflows and pipelines with dependencies. You can properly parameterize runs (for example, get a list of files in a directory and pass the names to another notebook, something that's not possible with %run) and also create if/then/else workflows based on return values. Notebook workflows allow you to call other notebooks via relative paths.
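A minimal sketch of such a workflow from the calling side, reusing the notebook path from the question; the argument name and timeout are assumptions for illustration. dbutils.notebook.run executes the child notebook, passes the arguments as widget values, and returns whatever string the child exits with.

# Run the child notebook with arguments and branch on its return value.
result = dbutils.notebook.run(
    "/Users/myemail/Nodebook",          # path of the child notebook
    600,                                # timeout in seconds
    {"input_path": "/mnt/raw/events"},  # arguments, exposed as widgets in the child
)

if result == "ok":
    print("Child notebook succeeded")
else:
    print(f"Child notebook reported: {result}")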

You implement notebook workflows with dbutils.notebook methods. These methods, like all of the dbutils APIs, are available only in Scala and Python. However, you can use dbutils.notebook.run to invoke an R notebook.
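On the child side, a matching sketch (again with hypothetical names): arguments passed by the caller arrive as widgets, and dbutils.notebook.exit hands a string back to the caller and ends the run.

# Inside the child notebook: read the argument passed by the caller
# and return a status string. Names are illustrative assumptions.
input_path = dbutils.widgets.get("input_path")

df = spark.read.json(input_path)

# End the run and return a value to dbutils.notebook.run in the caller.
dbutils.notebook.exit("ok" if df.count() > 0 else "empty")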

For more details, refer to "Databricks - Notebook workflows".

You can use the "Hide Result" option in the cell's upper-right toggle menu.
