
How to remove extra escape characters from a text column in a Spark dataframe

My data in a JSON file looks like this:

{"text": "\"I have recently taken out a 12 month mobile phone contract with Virgin but despite two calls to customer help I still am getting a message on my phone indicating \\\"No Service\\\" although intermittently I do get connected.\"", "created_at": "\"2018-08-27 16:58:30\"", "service_id": "51870", "category_id": "249"}

I read this JSON using:

val complaintsSourceRaw = spark.read.json("file:///complaints.jsonl")

When I read the data into a dataframe, it looks like this:

|249        |"2018-08-27 16:58:30"|51870     |"I have recently taken out a 12 month mobile phone contract with Virgin but despite two calls to customer help I still am getting a message on my phone indicating **\"No Service\"** although intermittently I do get connected."  

The issue is that **\"No Service\"** needs to be **"No Service"**.

Here is what I am trying:

complaintsSourceRaw.withColumn("text_cleaned", functions.regexp_replace(complaintsSourceRaw.col("text"), "\", ""));

However, the \ character escapes my " and the code breaks. Any idea how to achieve this?

You need to escape the backslash character. The first argument of regexp_replace is a regular expression, and a literal backslash is written as two backslashes in the regex; each of those must be escaped again in a Scala string literal, so the pattern becomes "\\\\" rather than "\".
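
A minimal sketch of the corrected call, assuming the same complaintsSourceRaw dataframe from the question (the string literal "\\\\" reaches the regex engine as \\, which matches a single literal backslash):

import org.apache.spark.sql.functions.regexp_replace

// Remove literal backslashes from the "text" column.
// "\\\\" in the Scala source is the two-character regex \\,
// which the regex engine treats as one literal backslash.
val complaintsCleaned = complaintsSourceRaw
  .withColumn("text_cleaned", regexp_replace(complaintsSourceRaw.col("text"), "\\\\", ""))

complaintsCleaned.select("text_cleaned").show(false)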
