How to read a JSON file in Azure Databricks from Azure Data Lake Store
I am using Azure Data Lake Store to store simple JSON files with the following content:
{
"email": "Usersemail@domain.com",
"id": "823956724385"
}
The JSON file's name is myJson1.json. The Azure Data Lake Store is mounted successfully to Azure Databricks.
I am able to load the JSON file successfully via
df = spark.read.option("multiline", "true").json(fi.path)
Here fi is a FileInfo object for the myJson1.json file from above, and fi.path is its path.
When I do
df = spark.read.option("multiline", "true").json(fi.path)
df.show()
I get the JSON object printed out correctly (as a DataFrame):
+---------------------+------------+
| email| id|
+---------------------+------------+
|Usersemail@domain.com|823956724385|
+---------------------+------------+
What I want to do is load the JSON file with json.load(filename), so that I can work with the JSON object in Python.
When I do
with open('adl://.../myJson1.json', 'r') as file:
    jsonObject0 = json.load(file)
then I get the following error:
[Errno 2] No such file or directory 'adl://.../myJson1.json'
When I try the mounted path (the mount point is correct; I can list the file and also read it into a DataFrame with spark.read)
jsonObject = json.load("/mnt/adls/data/myJson1.json")
then I get the following error:
'str' object has no attribute 'read'
I have no idea what else to do to get the JSON loaded. My goal is to read the JSON object and iterate through the keys and their values.
The trick was to use the following syntax for the file URL:
/dbfs/mnt/adls/data/myJson1.json
I had to add /dbfs/ at the beginning of the path or, where the path starts with dbfs:/, replace that prefix with /dbfs/. Python's built-in open() only sees the local file system, and DBFS is exposed there under /dbfs.
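The path rewrite described above can be sketched as a small function. This is only an illustration: the name to_local_path is hypothetical and not part of any Databricks API, and treating every other absolute path as a DBFS path is an assumption that fits the mount paths in this question.

```python
def to_local_path(path: str) -> str:
    """Hypothetical helper: map a DBFS-style path to the local /dbfs/ path."""
    scheme = "dbfs:/"
    if path.startswith(scheme):
        # dbfs:/mnt/... -> /dbfs/mnt/...
        return "/dbfs/" + path[len(scheme):].lstrip("/")
    if path.startswith("/dbfs/"):
        # Already a local path, leave it alone.
        return path
    if path.startswith("/"):
        # Assumption: a bare absolute path such as /mnt/... refers to DBFS.
        return "/dbfs" + path
    return path


print(to_local_path("dbfs:/mnt/adls/data/myJson1.json"))
# prints /dbfs/mnt/adls/data/myJson1.json
```

With this, the fi.path value from the question could be passed through to_local_path before handing it to open().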
Then I could use:
import json

# Read the whole file as text, then parse the text as JSON.
with open('/dbfs/mnt/adls/ingress/marketo/update/leads/leads-json1.json', 'r') as f:
    data = f.read()
jsonObject = json.loads(data)
Maybe there is an easier way, but this works for now.
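A possibly simpler variant is sketched below. json.load accepts an open file object directly, which is also why passing a path string to it failed with "'str' object has no attribute 'read'" in the question; json.loads is the counterpart that takes a string. The function name load_json is hypothetical, and the temporary file only stands in for the /dbfs/... path so that the sketch runs outside Databricks.

```python
import json
import os
import tempfile


def load_json(path):
    """Parse a JSON file from a local path (on Databricks, a /dbfs/... path)."""
    with open(path, "r") as f:
        # json.load reads from the file object, so no separate f.read() is needed.
        return json.load(f)


# Stand-in file with the content from the question (assumption: used here
# only so the example is runnable anywhere).
sample = {"email": "Usersemail@domain.com", "id": "823956724385"}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)

json_object = load_json(tmp.name)

# The parsed object is a plain dict, so keys and values can be iterated.
for key, value in json_object.items():
    print(key, value)

os.remove(tmp.name)
```

On a cluster, the call would presumably be load_json('/dbfs/mnt/adls/data/myJson1.json'), using the mount path from the question.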