
How to query data from Azure Databricks Spark cluster without notebooks?

I have a running Spark 2.3.1 cluster hosted at https://azuredatabricks.net. I have created a database with a permanent table, and I have been able to run queries through the notebook web interface.
Now I am looking for a way to query the same cluster from a .NET console application, and I am lost.

1. Is there a REST API that can be used to run SQL/Python queries?
2. How do I configure an ODBC connection string to connect to the cluster, and which ODBC drivers actually work?

Ultimately, I am looking for a way to let users run one of several predefined, parameterized queries against the Spark cluster through a web app/REST service written in JavaScript or .NET.

To the best of my knowledge, there is currently no way to query Databricks tables from outside the Databricks workspace.

Depending on what you are trying to accomplish, you could use the REST API to run a job (a notebook or a JAR) that executes your parameterized queries. This is described in the Databricks REST API documentation (https://docs.azuredatabricks.net/api/latest/jobs.html#run-now). If you need the query results in your .NET application, your options are limited. Your best bet is probably to have the job write the results to a file in Data Lake Storage or Blob Storage, and then read that file from your console application. You could pass the file name in as a parameter from the console application, so you can easily retrieve the file once the run completes.
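A minimal sketch of that flow, written in Python only for illustration (the same two HTTP calls can be made from .NET or JavaScript). It assumes an existing notebook job whose notebook reads two widgets, hypothetically named `query_name` and `output_path` (inside the notebook, e.g. `dbutils.widgets.get("output_path")`), plus a personal access token. The job ID, widget names and output path are placeholders, not part of the original answer. The sketch triggers `POST /api/2.0/jobs/run-now` with `notebook_params`, then polls `GET /api/2.0/jobs/runs/get` until the run reaches a terminal state:

```python
import json
import time
import urllib.request
from typing import Optional

TERMINAL_STATES = ("TERMINATED", "SKIPPED", "INTERNAL_ERROR")


def build_run_now_payload(job_id: int, query_name: str, output_path: str) -> dict:
    """Body for POST /api/2.0/jobs/run-now; notebook_params become widget values in the notebook."""
    return {
        "job_id": job_id,
        "notebook_params": {
            "query_name": query_name,    # hypothetical widget: which predefined query to run
            "output_path": output_path,  # hypothetical widget: where the notebook writes results
        },
    }


def call_api(host: str, token: str, method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send one token-authenticated request to the Databricks REST API and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        host.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_and_wait(host: str, token: str, job_id: int, query_name: str,
                 output_path: str, poll_seconds: float = 10.0) -> Optional[str]:
    """Trigger the job, then poll runs/get until the run is in a terminal life-cycle state."""
    run = call_api(host, token, "POST", "/api/2.0/jobs/run-now",
                   build_run_now_payload(job_id, query_name, output_path))
    run_id = run["run_id"]
    while True:
        reply = call_api(host, token, "GET", f"/api/2.0/jobs/runs/get?run_id={run_id}")
        state = reply["state"]
        if state["life_cycle_state"] in TERMINAL_STATES:
            return state.get("result_state")  # e.g. "SUCCESS" or "FAILED"
        time.sleep(poll_seconds)
```

When `run_and_wait` returns `"SUCCESS"`, the console application can download the file at `output_path` from storage (for example with the Azure Storage SDK). For a JAR job, `run-now` takes a list of strings in `jar_params` instead of `notebook_params`.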

To connect to the cluster from .NET, you would need to use a Databricks access token, as described in the Authentication section of the REST API documentation (https://docs.azuredatabricks.net/api/latest/authentication.html).
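As a rough illustration (Python, assuming a personal access token generated from the workspace's user settings), the token is sent as a Bearer credential in the `Authorization` header of every REST call. The `DATABRICKS_HOST`/`DATABRICKS_TOKEN` environment variable names are assumed here (they follow the Databricks CLI convention), and `GET /api/2.0/clusters/list` is used only as a cheap call to check that the token works:

```python
import json
import os
import urllib.request

# Assumption: workspace URL and personal access token come from these environment variables.
HOST = os.environ.get("DATABRICKS_HOST", "https://<your-workspace>.azuredatabricks.net")
TOKEN = os.environ.get("DATABRICKS_TOKEN", "")


def authenticated_request(host: str, token: str, path: str) -> urllib.request.Request:
    """Build a GET request that authenticates with the access token as a Bearer credential."""
    return urllib.request.Request(
        host.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
    )


def list_clusters(host: str, token: str) -> list:
    """Call GET /api/2.0/clusters/list; an HTTP 401/403 here usually means a bad token."""
    req = authenticated_request(host, token, "/api/2.0/clusters/list")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8")).get("clusters", [])
```

In .NET the equivalent is setting the same `Authorization: Bearer <token>` header on an `HttpClient` request.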

