
How to remove white spaces and \n in the JSON file in Python

I have JSON data coming to S3 in the format:

{\n  \"data\": {\n    \"event_type\": \"message.received\",\n    \"id\": \"819\",\n    \"occurred_at\": \"2020-10\",\n    \"payload\": {\n      \"cc\": [],\n      \"completed_at\": null,\n      \"cost\": null,\n      \"direction\": \"inbound\",\n      \"encoding\": \"GSM-7\",\n      \"errors\": [],\n      \"from\": {\n        \"carrier\": \"Verizon\",\n        \"line_type\": \"Wireless\",\n        \"phone_number\": \"+111111111\"\n      },\n      \"id\": \"e8e0d1e3-dce3-\",\n      \"media\": [],\n      \"messaging_profile_id\": \"400176\",\n      \"organization_id\": \"717d556f-ba4f-\",\n      \"parts\": 1,\n      \"received_at\": \"2020-1\",\n      \"record_type\": \"message\",\n      \"sent_at\": null,\n      \"tags\": [],\n      \"text\": \"Hi \",\n      \"to\": [\n        {\n          \"carrier\": \"carr\",\n          \"line_type\": \"Wireless\",\n          \"phone_number\": \"+111111111\",\n}\n}"

I want it to be converted like this:

{
  "data": {
    "event_type": "message.received",
    "id": "76a60230",
    "occurred_at": "2020-12-1",
    "payload": {
      "cc": [],
      "completed_at": null,
      "cost": null,
      "direction": "inbound",
      "encoding": "GSM-7",
      "errors": [],
      "from": {
        "carrier": "Verizon",
        "line_type": "Wireless",
        "phone_number": "+1111111111"
      },
      "id": "06c9c765",
      "media": [],
      "messaging_profile_id": "40017",
      "organization_id": "717d5",
      "parts": 1,
      "received_at": "2020-1",
      "record_type": "message",
      "sent_at": null,
      "tags": [],
      "text": "Hi",
      "to": [
        {
          "carrier": "abc",
          "line_type": "Wireless",
          "phone_number": "+1111111111",
          "status": "delivered"
        }
      ],
      "type": "SMS",
      "valid_until": null,
      "failover_url": null,
      "url": "https://639hpj"
    },
    "record_type": "event"
  },
  "meta": {
    "attempt": 1,
    "delivered_to": "https://639hpj"
  }
}

The first JSON sample shows how the data arrives: as one line of text containing \n escape sequences rather than as structured JSON. I did not keep the actual JSON data, but it was in a similar (valid) format. I would like to run a Lambda function that makes the JSON data free of \n and white spaces.

The two JSON samples above are not identical. Still, I will be receiving data like the first one, and I would like to convert it into the second form, which is free of white spaces and \n.
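Literal backslash-n and backslash-quote sequences like those in the first sample usually mean the body is double-encoded: the JSON document was itself stored as a JSON string. Below is a minimal sketch that assumes this. `normalize_json` is a hypothetical helper name. Reading the object from S3 inside the Lambda function (for example, decoding the object's bytes with `.decode("utf-8")`) is not shown. The sketch also assumes the real document is never just a plain JSON string.

```python
import json


def normalize_json(raw, indent=None):
    """Parse JSON text and re-serialize it.

    indent=None -> compact output with no newlines or spaces between tokens.
    indent=2    -> the pretty-printed layout shown in the question.
    """
    obj = json.loads(raw)
    if isinstance(obj, str):
        # Assumption: the body is double-encoded (a JSON document stored as a
        # JSON string), which is what makes backslash-n and backslash-quote
        # appear literally. Decode the inner document as well.
        obj = json.loads(obj)
    if indent is None:
        return json.dumps(obj, separators=(",", ":"))
    return json.dumps(obj, indent=indent)


# Simulate a double-encoded body like the one in the question.
body = json.dumps('{\n  "data": {\n    "id": "819",\n    "cost": null\n  }\n}')
print(normalize_json(body))  # {"data":{"id":"819","cost":null}}
print(normalize_json(body, indent=2))
```

Whether you want the compact or the indented form depends on the consumer. Only the whitespace between tokens differs; the data is identical.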

Did you realize that those spaces and newlines are precisely what give the printed output its formatting?

Let us call t your first JSON (I fixed it by adding the missing closing brackets at the end):

t = '''{\n "data": {\n "event_type": "message.received",\n "id": "819",\n "occurred_at": "2020-10",\n "payload": {\n "cc": [],\n "completed_at": null,\n "cost": null,\n "direction": "inbound",\n "encoding": "GSM-7",\n "errors": [],\n "from": {\n "carrier": "Verizon",\n "line_type": "Wireless",\n "phone_number": "+111111111"\n },\n "id": "e8e0d1e3-dce3-",\n "media": [],\n "messaging_profile_id": "400176",\n "organization_id": "717d556f-ba4f-",\n "parts": 1,\n "received_at": "2020-1",\n "record_type": "message",\n "sent_at": null,\n "tags": [],\n "text": "Hi ",\n "to": [\n {\n "carrier": "carr",\n "line_type": "Wireless",\n "phone_number": "+111111111"\n}\n]\n}}}'''

It prints as:

>>> print(t)
{
 "data": {
 "event_type": "message.received",
 "id": "819",
 "occurred_at": "2020-10",
 "payload": {
 "cc": [],
 "completed_at": null,
 "cost": null,
 "direction": "inbound",
 "encoding": "GSM-7",
 "errors": [],
...
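The printed output spans several lines because, inside an ordinary Python string literal such as the one assigned to t, \n is a single newline character. If the text you receive contains a backslash followed by an "n" instead, it is not valid JSON as it stands. A small comparison of the two cases follows; the variable names are illustrative only.

```python
import json

with_newlines = '{\n "id": "819"\n}'      # "\n" here is one newline character
with_backslashes = r'{\n "id": "819"\n}'  # raw string: a backslash, then "n"

print(len("\n"), len(r"\n"))  # 1 2
print(json.loads(with_newlines))  # {'id': '819'}

try:
    json.loads(with_backslashes)
except json.JSONDecodeError as exc:
    # A backslash outside a string value is not valid JSON syntax.
    print("cannot parse:", exc.msg)
```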

To obtain the expected representation you should:

  1. load it into a Python object: js = json.loads(t)

  2. dump it back into a string with 2 spaces of indentation: t2 = json.dumps(js, indent=2)

    t2 actually looks like '{\n  "data": {\n    "event_type": "message.received",\n    "id": "819",\n    "occurred_at": "2020-10",\n    "payload": {\n      "cc": [],\n      "completed_at": null,\n      "cost": null,\n...

  3. print it:

     >>> print(t2)
     {
       "data": {
         "event_type": "message.received",
         "id": "819",
         "occurred_at": "2020-10",
         "payload": {
           "cc": [],
           "completed_at": null,
           "cost": null,
           "direction": "inbound",
           "encoding": "GSM-7",
           "errors": [],
           "from": {
             "carrier": "Verizon",
             "line_type": "Wireless",
     ...

A one-liner (after import json) could be:

print(json.dumps(json.loads(t), indent=2))
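Here is a self-contained version of the steps above. It uses a shortened copy of t for brevity and assumes the full string behaves the same.

```python
import json

# Shortened version of the question's payload.
t = '{\n "data": {\n "event_type": "message.received",\n "id": "819",\n "cost": null\n}\n}'

js = json.loads(t)              # str -> dict; JSON null becomes None
t2 = json.dumps(js, indent=2)   # dict -> str, 2 spaces per nesting level

print(t2)

# Reformatting only changes whitespace between tokens, never the data itself.
assert json.loads(t2) == json.loads(t)
```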

You can first load it as a Python dictionary with:

import json

myDict = json.loads(jsonString)

And then, convert it back to a minimized/indented JSON string:

minimizedJSON = json.dumps(myDict, separators=(",", ":"))  # one line, no spaces
indentedJSON = json.dumps(myDict, indent=2)  # indent = number of spaces per level
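Note that json.dumps with no extra arguments already puts everything on one line. However, it still emits a space after each ":" and ",". Passing separators=(",", ":") removes those spaces as well, which gives the fully minimized form. The dictionary below is a made-up sample.

```python
import json

my_dict = {"data": {"id": "819", "tags": [], "cost": None}}

# Default separators are ", " and ": " when indent is None.
print(json.dumps(my_dict))
# {"data": {"id": "819", "tags": [], "cost": null}}

# Compact separators drop the spaces entirely.
print(json.dumps(my_dict, separators=(",", ":")))
# {"data":{"id":"819","tags":[],"cost":null}}
```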

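Alternatively, ast.literal_eval will take your string and parse it into a Python dictionary. This only works when the string contains no JSON-only literals such as null, true or false (Python spells these None, True and False), so for the question's payload json.loads is the safer choice: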

import ast

# replace with full string of yours
s='{\n "data": {\n "event_type": "message.received",\n "id": "819"\n}\n}'

result = ast.literal_eval(s)

print(type(result), result)
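To see the limitation of literal_eval, feed it a sample containing a JSON null. The question's payload contains several, for example "completed_at" and "cost". json.loads handles it, while literal_eval raises ValueError.

```python
import ast
import json

# Shortened sample containing a JSON null.
s = '{\n "data": {\n "id": "819",\n "cost": null\n}\n}'

print(json.loads(s))  # {'data': {'id': '819', 'cost': None}}

try:
    result = ast.literal_eval(s)
except ValueError as exc:
    # null (like true/false) is a bare name in Python, which literal_eval rejects.
    result = None
    print("literal_eval failed:", exc)
```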
