How to read a JSON file in Azure Databricks from Azure Data Lake Store

I am using Azure Data Lake Store to store simple JSON files with the following content:

{
  "email": "Usersemail@domain.com",
  "id": "823956724385"
}

The JSON file's name is myJson1.json. The Azure Data Lake Store is successfully mounted to Azure Databricks.

I am able to successfully load the JSON file via

df = spark.read.option("multiline", "true").json(fi.path)

fi.path is a FileInfo object referring to the myJson1.json file from above.
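For context, a minimal sketch of how such a FileInfo object can be obtained, assuming a Databricks notebook (where dbutils and spark are predefined) and the mount point /mnt/adls/data used later in the question:

# dbutils.fs.ls returns FileInfo objects with .path and .name attributes
for fi in dbutils.fs.ls("/mnt/adls/data"):
    if fi.name == "myJson1.json":
        df = spark.read.option("multiline", "true").json(fi.path)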

When I do

df = spark.read.option("multiline", "true").json(fi.path)
df.show()

I get the JSON object printed out correctly (as a DataFrame):

+---------------------+------------+
|                email|          id|
+---------------------+------------+
|Usersemail@domain.com|823956724385|
+---------------------+------------+

What I want to do is load the JSON file with json.load(filename), so that I can work with the JSON object within Python.

When I do

with open('adl://.../myJson1.json', 'r') as file:
  jsonObject0 = json.load(file)

then I get the following error:

[Errno 2] No such file or directory: 'adl://.../myJson1.json'

When I try (the mount point is correct; I can list the file, and I can also read it with spark.read into a DataFrame)

jsonObject = json.load("/mnt/adls/data/myJson1.json")

then I get the following error:

'str' object has no attribute 'read'

I have no idea what else to do to get the JSON loaded. My goal is to read the JSON object and iterate through the keys and their values.
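For reference, the second error is plain Python behavior rather than anything Databricks-specific: json.load expects an open file-like object (something with a .read() method), while json.loads is the variant that parses a JSON string. A minimal sketch of the difference, with an illustrative local path:

import json

# json.load consumes an open file object ...
with open('/tmp/myJson1.json', 'r') as f:
    obj = json.load(f)

# ... while json.loads parses a JSON string directly.
obj = json.loads('{"email": "Usersemail@domain.com", "id": "823956724385"}')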

The trick was to use the following syntax for the file URL:

/dbfs/mnt/adls/data/myJson1.json

I had to prepend /dbfs/, i.e. replace dbfs:/ with /dbfs/ at the beginning of the URL.
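To make the two addressing schemes explicit: Spark APIs resolve DBFS paths directly (with or without the dbfs:/ scheme), while plain Python I/O only sees the driver's local filesystem, where DBFS is FUSE-mounted under /dbfs. A sketch, with paths matching the illustrative mount above:

# Spark understands DBFS paths directly:
df = spark.read.option("multiline", "true").json("/mnt/adls/data/myJson1.json")

# Plain Python I/O goes through the local FUSE mount under /dbfs:
with open("/dbfs/mnt/adls/data/myJson1.json", "r") as f:
    raw = f.read()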

Then I could use:

with open('/dbfs/mnt/adls/ingress/marketo/update/leads/leads-json1.json', 'r') as f:
  data = f.read()

jsonObject = json.loads(data)

Maybe there is an easier way? But this works for now.
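A slightly shorter variant: since open already returns a file object, it can be passed straight to json.load, and the parsed dictionary can then be iterated over, which was the original goal (the path is illustrative):

import json

with open('/dbfs/mnt/adls/data/myJson1.json', 'r') as f:
    jsonObject = json.load(f)  # no intermediate read()/loads() needed

# Iterate through the keys and their values
for key, value in jsonObject.items():
    print(key, value)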
