Creating a PySpark DataFrame from a Python dictionary with a special character
I have a Python dictionary as follows:
data = [{"cust_decision": "buy", "cust_details": "Easy to use"}, {"cust_decision": "buy", "cust_details": "econoimical"}, {"cust_decision":"no buy", "cust_details": "Didn’t like Product"}]
I am creating a PySpark DataFrame and a temp view from this data, as shown below:
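from pyspark.sql import SparkSession, Row
# Reuse the active session (the pyspark shell already provides one as `spark`).
spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([Row(**i) for i in data]).createOrReplaceTempView("cust")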
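Now, when I look at the data in this temp view, the special character ’ (this is not the single quote ' as in 'it's') is changed to a different character, â. The result is below: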
spark.table("cust").show(10,False)
+-------------+---------------------+
|cust_decision|cust_details         |
+-------------+---------------------+
|buy          |Easy to use          |
|buy          |econoimical          |
|no buy       |Didn’t like Product|
+-------------+---------------------+
But I want this character to appear as-is in the value. How can I achieve that? The expected result is below:
+-------------+---------------------+
|cust_decision|cust_details         |
+-------------+---------------------+
|buy          |Easy to use          |
|buy          |econoimical          |
|no buy       |Didn’t like Product  |
+-------------+---------------------+
Thanks.
Try decoding the values in your data dictionary as UTF-8:
data = [{"cust_decision": "buy", "cust_details": "Easy to use"}, {"cust_decision": "buy", "cust_details": "econoimical"}, {"cust_decision":"no buy", "cust_details": "Didn’t like Product"}]
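# Decode byte strings as UTF-8 (a Python 2 str is bytes); values that are already text pass through.
decode_data = [{k: v.decode("utf-8") if isinstance(v, bytes) else v for k, v in i.items()} for i in data]
from pyspark.sql import SparkSession, Row
spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([Row(**i) for i in decode_data]).createOrReplaceTempView("cust")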
spark.table("cust").show(10,False)
#+-------------+-------------------+
#|cust_decision|cust_details       |
#+-------------+-------------------+
#|buy          |Easy to use        |
#|buy          |econoimical        |
#|no buy       |Didn’t like Product|
#+-------------+-------------------+
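For context, the â in the question is what UTF-8 bytes look like when they are read with a single-byte code page. The sketch below reproduces that in plain Python 3, with no Spark involved. It assumes Windows-1252 (`cp1252`) as the wrong codec, which is only a guess about the asker's environment, and `decode_values` is a hypothetical helper written for this illustration, not part of PySpark.

```python
# U+2019 is the right single quotation mark used in the question's data.
original = "Didn\u2019t like Product"

# In UTF-8 that one character is encoded as three bytes: E2 80 99.
utf8_bytes = original.encode("utf-8")

# Assumption: the bytes get read as cp1252, so each byte becomes its own character.
garbled = utf8_bytes.decode("cp1252")

# Decoding the same bytes as UTF-8 gives the original text back.
restored = utf8_bytes.decode("utf-8")

# The garbled string is two characters longer, which matches the wider column.
print(len(original), len(garbled))


def decode_values(record, encoding="utf-8"):
    """Hypothetical helper: decode bytes values, leave text values untouched."""
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in record.items()
    }


row = decode_values({"cust_decision": b"no buy", "cust_details": utf8_bytes})
```

On Python 3, string literals are already Unicode text, so calling `.decode` on them raises `AttributeError`. Decoding is only needed for values that arrive as `bytes`, which is why the helper checks the type first.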