
Azure Databricks Jupyter Notebook: Python and R in one cell

I have some code (mostly not originally mine) that runs in my local Anaconda Jupyter Notebook environment. I need to scale up the processing, so I am looking into Azure Databricks. One section of code runs a Python loop but uses an R library (stats), then passes the data through an R model (tbats). So a single Jupyter Notebook cell runs both Python and R code. Can this be done in Azure Databricks notebooks as well? I have only found documentation for changing languages from cell to cell.

In a previous cell I have:

%r
library(stats)

So the stats library is imported (along with other R libraries). However, when I run the code below, I get:

NameError: name 'stats' is not defined

I am wondering if this is related to the way Databricks has you declare a cell's language (e.g. %r, %python, etc.).

My Python code:

# Assumed setup: this code calls R from Python via rpy2, so these
# imports are needed (tbats() comes from R's 'forecast' package)
import pandas as pa
from rpy2.robjects import r
from rpy2.robjects.packages import importr

stats = importr('stats')        # R 'stats' package (for ts())
base = importr('base')          # R 'base' package (for c())
forecast = importr('forecast')  # R 'forecast' package (for tbats())

df_list = []
for customerid, dataForCustomer in original.groupby(by=['customer_id']):
    startYear = dataForCustomer.head(1).iloc[0].yr
    startMonth = dataForCustomer.head(1).iloc[0].mnth
    endYear = dataForCustomer.tail(1).iloc[0].yr
    endMonth = dataForCustomer.tail(1).iloc[0].mnth

    # Creating a time series object
    customerTS = stats.ts(dataForCustomer.usage.astype(int),
                          start=base.c(startYear, startMonth),
                          end=base.c(endYear, endMonth),
                          frequency=12)
    r.assign('customerTS', customerTS)

    ## Here comes the R code piece
    try:
        seasonal = r('''
                    fit <- tbats(customerTS, seasonal.periods = 12,
                                 use.parallel = TRUE)
                    fit$seasonal
                 ''')
    except Exception:
        seasonal = 1

    # APPEND DICTIONARY TO LIST (NOT DATA FRAME)
    df_list.append({'customer_id': customerid, 'seasonal': seasonal})
    print(f' {customerid} | {seasonal} ')

seasonal_output = pa.DataFrame(df_list)

If you change languages between cells in Databricks, you cannot access the variables of the previous language: each language cell runs in its own interpreter, so state does not carry over from a %python cell to an %r cell or vice versa.
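Because the interpreters are separate, the usual pattern for moving data between a Python cell and an R cell in Databricks is to go through a Spark temp view, which every notebook language can query. A minimal sketch (assumes a Databricks notebook with an active `spark` session; `usage_pdf` is a hypothetical pandas DataFrame):

```
# Cell 1 (%python): publish the data as a temp view
spark_df = spark.createDataFrame(usage_pdf)
spark_df.createOrReplaceTempView("customer_usage")
```

```
# Cell 2 (%r): read the same data back as an ordinary R data.frame
library(SparkR)
sdf <- sql("SELECT * FROM customer_usage")
rdf <- collect(sdf)   # rdf can now be fed to ts()/tbats() in R
```

This shares data across cells, but it does not interleave the two languages inside one loop iteration; for that, calling R from a Python cell via rpy2 (as in the code above) is the closer fit, provided rpy2 and R are installed on the cluster.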

