What changes do I need to make to this Python code to run it on Databricks?

Question

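Hello, this is Python code I developed on my local machine, and I am now trying to use it on Databricks. I am new to Databricks, so I am not sure how to go about it.

I have a sample of a huge JSON file, and I want to split it into two files: one containing the header fields and a second containing all the details.

Here is the Python code from my local machine.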

import json
import itertools


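# Load the whole file into a dict (key order follows the file).
with open('new_test.json', 'r') as fp:
    data = json.load(fp)


# The first 8 keys are the header fields; everything after them is detail.
d1 = dict(itertools.islice(data.items(), 8))
print(d1)
d2 = dict(itertools.islice(data.items(), 8, None))
print(d2)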

with open("new_test_header.json", "w") as header_file:
    json.dump(d1, header_file)
with open("new_test_detail.json", "w") as detail_file:
    json.dump(d2, detail_file)

Here is the JSON file.

{
  "reporting_entity_name": "launcher",
  "reporting_entity_type": "launcher",
  "plan_name": "launched",
  "plan_id_type": "hios",
  "plan_id": "1111111111",
  "plan_market_type": "individual",
  "last_updated_on": "2020-08-27",
  "version": "1.0.0",
  "in_network": [
    {
      "negotiation_arrangement": "ffs",
      "name": "Boosters",
      "billing_code_type": "CPT",
      "billing_code_type_version": "2020",
      "billing_code": "27447",
      "description": "Boosters On Demand",
      "negotiated_rates": [
        {
          "provider_groups": [
            {
              "npi": [
                0
              ],
              "tin": {
                "type": "ein",
                "value": "11-1111111"
              }
            }
          ],
          "negotiated_prices": [
            {
              "negotiated_type": "negotiated",
              "negotiated_rate": 123.45,
              "expiration_date": "2022-01-01",
              "billing_class": "organizational"
            }
          ]
        }
      ]
    }
  ]
}
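In the sample above, the eight scalar fields form the header and `in_network` holds the detail. As a minimal sketch, the split can also be done by key name instead of by position, which keeps working if a header field is added or reordered. The helper name `split_by_detail_keys` and its `detail_keys` parameter are hypothetical and not part of the original post; the inline sample is a trimmed copy of the file.

```python
import json


def split_by_detail_keys(data, detail_keys=("in_network",)):
    """Return (header, detail); keys named in detail_keys go to detail."""
    header = {k: v for k, v in data.items() if k not in detail_keys}
    detail = {k: v for k, v in data.items() if k in detail_keys}
    return header, detail


# Trimmed version of the sample file, used only for illustration.
sample = json.loads("""
{
  "reporting_entity_name": "launcher",
  "version": "1.0.0",
  "in_network": [{"name": "Boosters", "billing_code": "27447"}]
}
""")

header, detail = split_by_detail_keys(sample)
print(list(header))  # ['reporting_entity_name', 'version']
print(list(detail))  # ['in_network']
```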

Here is what I am trying to write in Databricks.

import json
import itertools
from pyspark.sql.functions import explode, col

df_json = spark.read.option("multiline","true").json("/mnt/BigData_JSONFiles/SampleDatafilefrombigfile.json")
display(df_json)

d1 = dict(itertools.islice(df_json.items(), 4))
d2 = dict(itertools.islice(df_json.items(), 4, len(df_json.items())))

# I am unable to write the WRITE function.

Any help or guidance would be much appreciated.

Answer 1

Here is an example snippet:

# `spark` is the SparkSession that Databricks notebooks provide.

# Read the JSON file from Databricks storage.
# The file is one pretty-printed object, so multiline mode is needed.
df_json = (
    spark.read
    .option("multiline", "true")
    .json("/mnt/BigData_JSONFiles/new_test.json")
)

# Spark's inferred schema does not keep the key order of the file
# (columns typically come back sorted by name), so slicing the first
# 8 columns by position would not give the header. Name them instead.
header_cols = [
    "reporting_entity_name", "reporting_entity_type", "plan_name",
    "plan_id_type", "plan_id", "plan_market_type",
    "last_updated_on", "version",
]
detail_cols = [c for c in df_json.columns if c not in header_cols]

# Split the data into two parts
df1 = df_json.select(*header_cols)
df2 = df_json.select(*detail_cols)

# Write the first part of the data to Databricks storage.
# Note: Spark creates a directory of part files at this path, not a single file.
df1.write.mode("overwrite").json("/mnt/BigData_JSONFiles/new_test_header.json")

# Write the second part of the data to Databricks storage
df2.write.mode("overwrite").json("/mnt/BigData_JSONFiles/new_test_detail.json")
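If the file fits in the driver's memory, the original local script may also run on Databricks almost unchanged, without going through a Spark DataFrame. The sketch below wraps the question's logic in a function; it assumes the cluster exposes DBFS mounts through the local `/dbfs/...` path (this depends on the cluster type and access mode), and the function name `split_json_file` and its `n_header` parameter are hypothetical.

```python
import json


def split_json_file(src_path, header_path, detail_path, n_header=8):
    """Split one JSON object into header and detail files by key position."""
    with open(src_path, "r") as fp:
        data = json.load(fp)  # dict keeps the key order of the file

    items = list(data.items())
    header = dict(items[:n_header])   # first n_header keys
    detail = dict(items[n_header:])   # everything after them

    with open(header_path, "w") as header_file:
        json.dump(header, header_file)
    with open(detail_path, "w") as detail_file:
        json.dump(detail, detail_file)
    return header, detail


# Possible call on Databricks (assumes the /dbfs local path is available):
# split_json_file(
#     "/dbfs/mnt/BigData_JSONFiles/new_test.json",
#     "/dbfs/mnt/BigData_JSONFiles/new_test_header.json",
#     "/dbfs/mnt/BigData_JSONFiles/new_test_detail.json",
# )
```

Unlike the Spark writer, this produces two single JSON files and keeps the header keys in their original order, but it does not scale to files larger than driver memory.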

Solution 1
0 Accepted 2023-01-16 09:28:05