Py4JJavaError in an Azure Databricks notebook pipeline

Question

I have a strange problem when I launch a Databricks notebook from a caller notebook with dbutils.notebook.run (I am working in Azure Databricks).

One interesting thing I noticed is that when I run the inner notebook manually, everything works fine.

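~~I am also quite sure that at least one run succeeded when the inner notebook was called by the outer notebook under exactly the same conditions.~~ It probably never worked when called from the outer notebook; see the explanation of the problem below.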

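The strange part is that when I look at the inner notebook run, I see a pandas-related exception (KeyError: "None of [Index(['address'], dtype='object')] are in the [columns]"). But I really don't think it is related to my code since, as mentioned above, the code works when the inner notebook is run directly. For what it's worth, the inner notebook does some heavy pandas computations.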

The full Java stack trace visible in the outer notebook is:

Py4JJavaError: An error occurred while calling o1141._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
    at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:71)
    at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:122)
    at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:89)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.databricks.NotebookExecutionException: FAILED
    at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:117)
    at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:66)
    ... 13 more

Any help is welcome, thanks!

Answer 1

Thanks to @AlexOtt, I found the root cause of the problem.

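The main takeaway I want to share: double-check the job parameters passed between notebooks, especially the "type conversion" that happens when you pass parameters the standard way.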

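In my case, I wanted to pass an integer to the inner notebook, but along the way it was converted to a string, which the later code then handled incorrectly.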

In the outer notebook:

# set up the parameter dict
jobs_params = {
    ...
    'max_accounts': 0,  # set to 0 to parse all the accounts sent    
}

# call the inner notebook (timeout: 1800 seconds)
dbutils.notebook.run("./01_JSON_Processing", 1800, jobs_params)
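A minimal sketch, assuming you want typed values to survive the trip between notebooks: encode each value as JSON in the outer notebook and decode it in the inner one. The helper names `to_notebook_args` and `from_notebook_args` and the extra `dry_run` key are hypothetical; only the `dbutils.notebook.run` call comes from the original, and it is left as a comment so the snippet runs outside Databricks.

```python
import json
from typing import Any, Dict


def to_notebook_args(params: Dict[str, Any]) -> Dict[str, str]:
    """Encode every parameter value as a JSON string.

    The called notebook receives parameters as strings anyway, so encoding
    them explicitly keeps the conversion under the caller's control and
    makes it reversible with json.loads on the receiving side.
    """
    return {name: json.dumps(value) for name, value in params.items()}


def from_notebook_args(raw: Dict[str, str]) -> Dict[str, Any]:
    """Decode values produced by to_notebook_args back into Python objects."""
    return {name: json.loads(text) for name, text in raw.items()}


# Hypothetical parameters; "dry_run" is only here to show a boolean.
jobs_params = {"max_accounts": 0, "dry_run": False}
args = to_notebook_args(jobs_params)
print(args)  # {'max_accounts': '0', 'dry_run': 'false'}

# In Databricks (not executed here):
# dbutils.notebook.run("./01_JSON_Processing", 1800, args)

restored = from_notebook_args(args)
assert restored == jobs_params  # the int and the bool come back with their types
```

In the inner notebook you would apply json.loads to each dbutils.widgets.get(...) result. Note that the empty default in dbutils.widgets.text(arg, "", "") is not valid JSON, so a missing parameter raises json.JSONDecodeError instead of slipping through silently.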

In the inner notebook:

arg_list = [
    ...
    'max_accounts',
]

v = dict()

for arg in arg_list:
    dbutils.widgets.text(arg, "", "")  # declare a text widget with an empty default
    v[arg] = dbutils.widgets.get(arg)  # always returns a string, e.g. "0" rather than 0
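If you keep the plain widget approach, one way to avoid the silent string is to cast each value right after reading it. The sketch below is hypothetical: `parse_widget_values` and the `schema` mapping are illustrative names, and `raw` stands in for the dict that the loop above fills from `dbutils.widgets.get`, so the code runs without Databricks.

```python
from typing import Any, Callable, Dict


def parse_widget_values(
    raw: Dict[str, str], schema: Dict[str, Callable[[str], Any]]
) -> Dict[str, Any]:
    """Cast widget strings to the types declared in `schema`.

    Names missing from `schema` are kept as strings.
    """
    parsed: Dict[str, Any] = {}
    for name, text in raw.items():
        cast = schema.get(name, str)
        try:
            parsed[name] = cast(text)
        except ValueError as exc:
            # Fail loudly instead of letting a bad value reach later computations.
            raise ValueError(f"widget {name!r}: cannot parse {text!r}") from exc
    return parsed


# Hypothetical values standing in for the dict filled by the widget loop.
raw = {"max_accounts": "0", "source": "landing"}
v = parse_widget_values(raw, {"max_accounts": int})
print(type(v["max_accounts"]).__name__, v["max_accounts"])  # int 0
```

Declaring the expected type next to each parameter name keeps the conversion in one place, and an empty or malformed value (for example a parameter the caller forgot to send) stops the run with a clear message.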

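Checking the type of v['max_accounts'] showed that it had been converted to a string along the way, and the later computations then led to the KeyError exception.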
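The answer does not show the code that consumed max_accounts, so the function below is a hypothetical reading of the comment "set to 0 to parse all the accounts sent". It shows how the string "0" silently skips an `== 0` check. In this toy version the failure surfaces as a TypeError rather than the KeyError from the question.

```python
from typing import List, Sequence


def select_accounts(accounts: Sequence[str], max_accounts) -> List[str]:
    """Hypothetical consumer: 0 means 'parse all the accounts sent'."""
    if max_accounts == 0:
        return list(accounts)
    # Reached when max_accounts is the string "0", because "0" == 0 is False.
    return list(accounts[:max_accounts])


accounts = ["a", "b", "c"]
print(select_accounts(accounts, 0))  # ['a', 'b', 'c']
print("0" == 0)  # False
try:
    select_accounts(accounts, "0")
except TypeError as exc:
    print("TypeError:", exc)  # a list cannot be sliced with a str
print(select_accounts(accounts, int("0")))  # ['a', 'b', 'c']
```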

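While debugging the inner notebook I did not notice the problem, because I had simply copied and pasted the jobs_params values into the inner notebook, and that did not reproduce max_accounts being a string.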
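To make manual debugging match a real dbutils.notebook.run call, a sketch like the following can replace the pasted values. It assumes that simple values such as integers end up in their str() form, which matches what the answer observed for max_accounts; `as_widget_strings` is a hypothetical helper name.

```python
from typing import Any, Dict


def as_widget_strings(params: Dict[str, Any]) -> Dict[str, str]:
    """Approximate what the inner notebook receives: every value as a string.

    Assumption: simple values such as integers are stringified the way str() does.
    """
    return {name: str(value) for name, value in params.items()}


jobs_params = {"max_accounts": 0}
# Use this instead of the pasted dict while debugging the inner notebook by hand.
v = as_widget_strings(jobs_params)
print(repr(v["max_accounts"]))  # '0'
```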

Answer 1 (accepted, score 0, 2021-12-19 09:47:41)