
How to convert a PySpark DataFrame to a JSON list without double quotes around numbers or extra \/ in date columns

I am trying to convert a PySpark DataFrame to a JSON list so that I can pass the JSON values to an API. After the conversion, every value is wrapped in double quotes: a value of 12 comes out as "12", and a date comes out as 3\/7\/2022. How can I avoid the extra backslashes and double quotes?

I am using the code below:

loan = loan_detail_df.toPandas()
results = loan.to_json(orient='records')

My output is:

[{"Lead_Number":"","Number":"123456","user1_DOB":"3\/20\/1943","ExpectedRate":"0"}]

My desired output:

{"Lead_Number":'',"Number":123456,"user1_DOB":"3/20/1943","ExpectedRate":0}

If you use the following syntax:

df.write.json(<storage path>)

The output will look like this if the date column is in DateType() format:

{"number":1,"date":"2021-03-22"}

If the date column is in StringType() format with / as the separator between date components, the output will be:

{"number":1,"date":"2022/03/12"}
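Note that df.write.json writes JSON Lines (one object per line, in part files under the target path) rather than a single JSON array. A minimal sketch of assembling such lines into one list, using only Python's standard json module. The part_file_text string is a hypothetical stand-in for the contents of one part file, not something from the original post:

```python
import json

# Hypothetical contents of one part file written by df.write.json (JSON Lines).
part_file_text = '{"number":1,"date":"2021-03-22"}\n{"number":2,"date":"2021-03-23"}\n'

# Parse each non-empty line into a dict and collect the dicts into one list.
records = [json.loads(line) for line in part_file_text.splitlines() if line.strip()]

# Serialize the list as a single JSON array. json.dumps leaves "/" unescaped
# and keeps integers unquoted.
json_list = json.dumps(records, separators=(",", ":"))
print(json_list)
```

Reading the part files back from storage is left out here, since how that is done depends on where the path points.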

To get the JSON string values to send to the API, you can use the pyspark.sql.functions.to_json() transformation.

import pyspark.sql.functions as f
df = spark.createDataFrame([
  (1, '2022/03/12')  
], ['number', 'date'])

df = (
    df
    .withColumn('json_value', f.to_json(f.struct(f.col('number'), f.col('date'))))
)

print(df.collect()[0]['json_value'])

And the output is:

{"number":1,"date":"2022/03/12"} 
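
In the question's output, the numbers are most likely quoted because those columns hold strings, and \/ is only an escaped form of / that any JSON parser decodes back to a plain slash. A minimal sketch using the standard json module, assuming the rows are available as a list of dicts (for example from loan.to_dict(orient='records')). NUMERIC_COLUMNS and coerce_row are hypothetical names introduced for illustration:

```python
import json

# Hypothetical sample row mirroring the question's output (all values are strings).
rows = [{"Lead_Number": "", "Number": "123456", "user1_DOB": "3/20/1943", "ExpectedRate": "0"}]

# Assumption: these are the fields that should be sent as integers.
NUMERIC_COLUMNS = ("Number", "ExpectedRate")

def coerce_row(row):
    """Return a copy of row with non-empty numeric fields converted to int."""
    out = dict(row)
    for name in NUMERIC_COLUMNS:
        if out.get(name) not in (None, ""):
            out[name] = int(out[name])
    return out

# json.dumps does not escape "/" and writes ints without quotes.
payload = json.dumps([coerce_row(r) for r in rows])
print(payload)

# "\/" and "/" decode to the same string, so the escaping is cosmetic.
decoded = json.loads('"3\\/20\\/1943"')
```

Casting the columns to numeric types in Spark before serializing would be an alternative to converting in Python; which is preferable depends on how many rows are collected.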
