How to remove extra escape characters from a text column in a Spark DataFrame
My JSON data looks like this:
{"text": "\"I have recently taken out a 12 month mobile phone contract with Virgin but despite two calls to customer help I still am getting a message on my phone indicating \\\"No Service\\\" although intermittently I do get connected.\"", "created_at": "\"2018-08-27 16:58:30\"", "service_id": "51870", "category_id": "249"}
I read this JSON with:
val complaintsSourceRaw = spark.read.json("file:///complaints.jsonl")
When I look at the data in the DataFrame, it looks like this:
|category_id|created_at|service_id|text|
|249 |"2018-08-27 16:58:30"|51870 |"I have recently taken out a 12 month mobile phone contract with Virgin but despite two calls to customer help I still am getting a message on my phone indicating **\"No Service\"** although intermittently I do get connected."
The problem is that **\"No Service\"** needs to become **"No Service"**.
Here is what I tried:
complaintsSourceRaw.withColumn("text_cleaned", functions.regexp_replace(complaintsSourceRaw.col("text"), "\", ""));
But the `\` character escapes my closing `"` and breaks the code. Any idea how to achieve this?
You need to escape the `\` character twice: once for the string literal and once for the regex engine. In your `regexp_replace`, the pattern should be `"\\\\"` — the string literal yields the two-character regex `\\`, which matches a single literal backslash. A lone `"\\"` (one backslash in the actual string) is an incomplete regex escape and will fail at pattern compile time.
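As a minimal sketch of the escaping rule outside Spark: `regexp_replace` uses the same Java regex engine as `String.replaceAll`, so the same four-backslash literal applies. The sample string below is a shortened, hypothetical version of the `text` column value.

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // The literal below produces the actual string:
        //   indicating \"No Service\" although
        // i.e. the value the "text" column holds after JSON parsing.
        String raw = "indicating \\\"No Service\\\" although";

        // "\\\\" in source code is the 2-character regex \\ ,
        // which matches exactly one literal backslash.
        String cleaned = raw.replaceAll("\\\\", "");

        System.out.println(cleaned); // indicating "No Service" although
    }
}
```

Applied to the DataFrame, the equivalent call would be `complaintsSourceRaw.withColumn("text_cleaned", functions.regexp_replace(complaintsSourceRaw.col("text"), "\\\\", ""))`.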