[英]What changes i need to do so that i can Change this python code for DATABRICKS
您好,這是我在本地機器上開發的 python 代碼,但現在我正嘗試在 DATABRICKS 上使用此代碼。 但我是 DATABRICKS 的新手,所以不知道我該怎么做。
我想做的是我有一個巨大的 JSON 文件樣本,我將它分成兩部分,一個包含標題,第二個文件包含所有細節。
這是我的本地機器 python 代碼。
import json
import itertools
with open('new_test.json', 'r') as fp:
data = json.loads(fp.read())
d1 = dict(itertools.islice(data.items(), 8))
print(d1)
d2 = dict(itertools.islice(data.items(), 8, len(data.items())))
print(d2)
with open("new_test_header.json", "w") as header_file:
json.dump(d1, header_file)
with open("new_test_detail.json", "w") as detail_file:
json.dump(d2, detail_file)
這是 JSON 文件。
{
"reporting_entity_name": "launcher",
"reporting_entity_type": "launcher",
"plan_name": "launched",
"plan_id_type": "hios",
"plan_id": "1111111111",
"plan_market_type": "individual",
"last_updated_on": "2020-08-27",
"version": "1.0.0",
"in_network": [
{
"negotiation_arrangement": "ffs",
"name": "Boosters",
"billing_code_type": "CPT",
"billing_code_type_version": "2020",
"billing_code": "27447",
"description": "Boosters On Demand",
"negotiated_rates": [
{
"provider_groups": [
{
"npi": [
0
],
"tin": {
"type": "ein",
"value": "11-1111111"
}
}
],
"negotiated_prices": [
{
"negotiated_type": "negotiated",
"negotiated_rate": 123.45,
"expiration_date": "2022-01-01",
"billing_class": "organizational"
}
]
}
]
}
]
}
這是我想在 DATABRICKS 中寫的內容
import json
import itertools
from pyspark.sql.functions import explode, col
df_json = spark.read.option("multiline","true").json("/mnt/BigData_JSONFiles/SampleDatafilefrombigfile.json")
display(df_json)
d1 = dict(itertools.islice(df_json.items(), 4))
d2 = dict(itertools.islice(df_json.items(), 4, len(df_json.items())))
# I am unable to write the WRITE function.
幫助或指導將非常有幫助。
這是一個片段示例:
from pyspark.sql.functions import explode, col
# Read the JSON file from Databricks storage
df_json = spark.read.json("/mnt/BigData_JSONFiles/new_test.json")
# Convert the dataframe to a dictionary
data = df_json.toPandas().to_dict()
# Split the data into two parts
d1 = dict(itertools.islice(data.items(), 8))
d2 = dict(itertools.islice(data.items(), 8, len(data.items())))
# Convert the first part of the data back to a dataframe
df1 = spark.createDataFrame([d1])
# Write the first part of the data to a JSON file in Databricks storage
df1.write.format("json").save("/mnt/BigData_JSONFiles/new_test_header.json")
# Convert the second part of the data back to a dataframe
df2 = spark.createDataFrame([d2])
# Write the second part of the data to a JSON file in Databricks storage
df2.write.format("json").save("/mnt/BigData_JSONFiles/new_test_detail.json")
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.