简体   繁体   中英

Properly escaping JSON special characters for use in a CSV file

When retrieving tweets from Twitter, here is a snippet of the raw JSON received (captured via Fiddler:

[{"text":"\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp:\/\/url.com\/6jd5j5"}]

After doing some operations on it, involving deserialization, and then re-serializing it (via JSON.NET ), it ends up in the database like this:

{"text": "\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp://url.com/6jd5j5"}

The only difference, being that the URLs don't have the backslash escapes around the forward slashes. (I'm not sure if this is a big deal, please chime in if it is)

My confusion actually is how to handle these escaped control characters. Running a SELECT query against my table in the MySQL client, Using MySQL's JSON_UNQUOTE function , it will unescape the characters. The \\r\\n are properly escaped, but it keeps the double quotes around the text which is interesting...

+----------+-------------------------------------------------------------------------------------------------------+
| user_id  | JSON_UNQUOTE(JSON_EXTRACT(tw.tweet_json, '$.text'))                                                   |
+----------+-------------------------------------------------------------------------------------------------------+
| 12844052 | "California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother"
http://url.com/6jd5j5 |
+----------+-------------------------------------------------------------------------------------------------------+

Here's what it looks like when I don't use the JSON_UNQUOTE unescape function:

+-------------------------------------------------------------------------------------------------------------+
| JSON_EXTRACT(tw.tweet_json, '$.text')                                                                       |
+-------------------------------------------------------------------------------------------------------------+
| "\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp://url.com/6jd5j5" |
+-------------------------------------------------------------------------------------------------------------+

I need to export these tweets to a CSV file, to be used by Excel or Google Sheets.

I use the following specifier after my query:

INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/so.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';

Opening the CSV file with Excel displays the following: (The second row/entry uses the JSON_UNESCAPE feature)

Notice how the second entry, while using the JSON_UNESCAPE feature, shows excessive slashes.

在此处输入图片说明

Here's the CSV file opened in notepad:

  "\"\\\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\\\"\\r\\nhttp://url.com/6jd5j5\""
"\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"
\
http://url.com/6jd5j5"

Question: How can I properly escape the tweet here, so that it can be read as original intended? Original Tweet Link

Edit The advice to use ESCAPED BY '"' from @Michael - sqlbot has brought me closer - but now when opening the CSV, the second part of the tweet (the URL) is in a new cell. I've verified that this happens in both Excel and Google Sheets:

在此处输入图片说明

Rendered CSV image (copy and pasting the text doesn't work well) 在此处输入图片说明

After some digging around, some helpful comments from @Michael - sqlbot, and this answer , I got it working properly in Google Sheets and Excel with the following statement:

    SELECT REPLACE(JSON_UNQUOTE(JSON_EXTRACT({JSON_COL}, {JSON_PROP_TO_RETRIEVE})), '\r\n', '\n')
      ...
    INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/{FILE_NAME}.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\r\n';

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM