
Save json to a file in Azure Data Lake Storage Gen 2

In Databricks, using Python, I am making a GET request with the requests library, and the response is JSON.

Here is an example of the GET request:

import requests

json_data = requests.get("https://prod-noblehire-api-000001.appspot.com/job?").json()

I would like to save the json_data variable as a file in Azure Data Lake Storage. I don't want to read it into a PySpark/pandas DataFrame first and then save it.

If I were saving it to a local folder on my computer, I would use the following code:

import json

j = json.dumps(json_data)
with open("MyJsonFile.json", "w") as f:
    f.write(j)  # the with block closes the file; no explicit close() needed

However, since I would like to save it in Azure Data Lake Storage, I should be using the following, according to Microsoft's documentation:

def upload_file_to_directory():
    try:
        # service_client is assumed to be an already-initialized DataLakeServiceClient
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")

        file_client = directory_client.create_file("uploaded-file.txt")

        with open("C:\\file-to-upload.txt", "r") as local_file:
            file_contents = local_file.read()

        file_client.append_data(data=file_contents, offset=0, length=len(file_contents))

        file_client.flush_data(len(file_contents))

    except Exception as e:
        print(e)

How can I combine both pieces of code to save the variable as a file in ADLS? Also, is there a better way to do that?

You don't really have to save locally. Rather, you can mount your ADLS storage account and then write the desired JSON content to it.
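
If the mount does not exist yet, it can be created once with a service principal. Below is a minimal sketch of that step; every name in it is a placeholder and is not taken from the original answer:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<APPLICATION_ID>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<SCOPE>", key="<SECRET_KEY>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<TENANT_ID>/oauth2/token",
}

# dbutils is available inside Databricks notebooks; the mount then appears under /mnt.
dbutils.fs.mount(
    source="abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/",
    mount_point="/mnt/<YOUR_MOUNT_POINT>",
    extra_configs=configs,
)

Note that the plain Python file API sees DBFS mounts under the /dbfs prefix (e.g. /dbfs/mnt/<YOUR_MOUNT_POINT>/...), so the path below should resolve accordingly. With the mount in place, below is the code that worked for me.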

import requests
import json

json_data = requests.get("<YOUR_URL>").json()
j = json.dumps(json_data)
with open("/<YOUR_MOUNT_POINT>/<FILE_NAME>.json", "w") as f:
    f.write(j)  # the with block closes the file; no explicit close() needed
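
Alternatively, if mounting is not an option, the two snippets from the question can be combined directly: the JSON string is handed straight to the Data Lake file client, so nothing touches the local disk. The following is a minimal sketch, assuming the azure-storage-file-datalake package; the account URL, key, and names are placeholders, not from the original post:

import json
import requests
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder credentials; in practice, prefer a key vault or service principal.
service_client = DataLakeServiceClient(
    account_url="https://<STORAGE_ACCOUNT>.dfs.core.windows.net",
    credential="<ACCOUNT_KEY>",
)

json_data = requests.get("<YOUR_URL>").json()
payload = json.dumps(json_data).encode("utf-8")  # bytes, so len() matches what is written

file_system_client = service_client.get_file_system_client(file_system="my-file-system")
directory_client = file_system_client.get_directory_client("my-directory")

file_client = directory_client.create_file("MyJsonFile.json")
file_client.append_data(data=payload, offset=0, length=len(payload))
file_client.flush_data(len(payload))

Newer versions of the SDK also expose file_client.upload_data(payload, overwrite=True), which collapses the create/append/flush sequence into a single call.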
