
Save json to a file in Azure Data Lake Storage Gen 2

In Databricks, using Python, I am making a GET request with the requests library, and the response is JSON.

Here is an example of the GET request:

import requests

json_data = requests.get("https://prod-noblehire-api-000001.appspot.com/job?").json()

I would like to save the json_data variable as a file in Azure Data Lake Storage. I don't want to read it into a PySpark/pandas DataFrame first and then save it.

If I were saving it to a local folder on my computer, I would use the following code:

import json

j = json.dumps(json_data)
with open("MyJsonFile.json", "w") as f:
    f.write(j)  # the with block closes the file; no explicit close() needed

However, since I would like to save it in Azure Data Lake Storage, I should be using the following, according to Microsoft's documentation:

def upload_file_to_directory():
    try:
        # service_client is assumed to be an already-initialized DataLakeServiceClient
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")

        file_client = directory_client.create_file("uploaded-file.txt")

        with open("C:\\file-to-upload.txt", "r") as local_file:
            file_contents = local_file.read()

        file_client.append_data(data=file_contents, offset=0, length=len(file_contents))

        file_client.flush_data(len(file_contents))

    except Exception as e:
        print(e)

How can I combine both pieces of code to save the variable as a file in ADLS? Also, is there a better way to do that?

You don't really have to save locally. Rather, you can mount your ADLS storage account and then write the desired JSON content to it.
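
If the mount does not exist yet, it can be created once with a service principal. Below is a minimal sketch of that step; every name in it is a placeholder and is not taken from the original answer:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<APPLICATION_ID>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<SCOPE>", key="<SECRET_KEY>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<TENANT_ID>/oauth2/token",
}

# dbutils is available inside Databricks notebooks; the mount then appears under /mnt.
dbutils.fs.mount(
    source="abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/",
    mount_point="/mnt/<YOUR_MOUNT_POINT>",
    extra_configs=configs,
)

Note that the plain Python file API sees DBFS mounts under the /dbfs prefix (e.g. /dbfs/mnt/<YOUR_MOUNT_POINT>/...), so the path below should resolve accordingly. With the mount in place, below is the code that worked for me.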

import requests
import json

json_data = requests.get("<YOUR_URL>").json()
j = json.dumps(json_data)
with open("/<YOUR_MOUNT_POINT>/<FILE_NAME>.json", "w") as f:
    f.write(j)  # the with block closes the file; no explicit close() needed
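
Alternatively, if mounting is not an option, the two snippets from the question can be combined directly: the JSON string is handed straight to the Data Lake file client, so nothing touches the local disk. The following is a minimal sketch, assuming the azure-storage-file-datalake package; the account URL, key, and names are placeholders, not from the original post:

import json
import requests
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder credentials; in practice, prefer a key vault or service principal.
service_client = DataLakeServiceClient(
    account_url="https://<STORAGE_ACCOUNT>.dfs.core.windows.net",
    credential="<ACCOUNT_KEY>",
)

json_data = requests.get("<YOUR_URL>").json()
payload = json.dumps(json_data).encode("utf-8")  # bytes, so len() matches what is written

file_system_client = service_client.get_file_system_client(file_system="my-file-system")
directory_client = file_system_client.get_directory_client("my-directory")

file_client = directory_client.create_file("MyJsonFile.json")
file_client.append_data(data=payload, offset=0, length=len(payload))
file_client.flush_data(len(payload))

Newer versions of the SDK also expose file_client.upload_data(payload, overwrite=True), which collapses the create/append/flush sequence into a single call.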
