How to remove all the "$oid" and "$date" in a .json file?

I have a .json file saved on my computer that contains keys such as $oid and $date, which will cause me trouble later in BigQuery. For example:

{
  "_id": {
  "$oid": "5e7511c45cb29ef48b8cfcff"
  },
  "about": "some text",
  "creationDate": {
  "$date": "2021-01-05T14:59:58.046Z"
  }
}

I want it to look like this (so it is not just a matter of removing some characters from the string):

{
  "_id": "5e7511c45cb29ef48b8cfcff",
  "about": "some text",
  "creationDate": "2021-01-05T14:59:58.046Z"
}

With Pymongo, one can do something like:

my_file['_id']=my_file['_id']['$oid']
my_file['creationDate']=my_file['creationDate']['$date']

How would this look without using Pymongo? I want to find such keys first and then remove all the problematic $oid and $date wrappers.

Edit: sorry for the bad wording. What I meant to ask is whether it is possible to find the keys that contain these problematic $ entries without writing out every key in the dictionary. In reality, there are more files with huge tables, and many of them can contain this.

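I would try something like the following. Note that it only looks at the top-level keys of the document.

import json

# Load the file; the with-block closes it afterwards
with open('data.json', 'r') as file:
    data = json.load(file)

for k, v in data.items():
    # Only single-key dict values such as {"$oid": ...} are candidates
    if isinstance(v, dict) and len(v) == 1:
        # The only key in the nested dict
        r = next(iter(v))
        # Replace the wrapper with its inner value if the key starts with $
        if r.startswith('$'):
            data[k] = v[r]
print(data)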

This produces the following output:

{'_id': '5e7511c45cb29ef48b8cfcff', 'about': 'some text', 'creationDate': '2021-01-05T14:59:58.046Z'}
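
The loop above only inspects top-level keys. Since the question mentions large files where the wrappers can appear anywhere, here is a minimal sketch of one way to cover nested documents and lists as well, using the object_hook parameter of json.load / json.loads from the standard library. The names unwrap_extended_json and WRAPPER_KEYS, and the "comments" field in the sample, are made up for illustration. It assumes the values look like the ones in the question; a file that stores dates as {"$date": {"$numberLong": "..."}} would need extra handling.

```python
import json

# Wrapper keys to unwrap (hypothetical name; extend the set if your files use others).
WRAPPER_KEYS = {"$oid", "$date"}


def unwrap_extended_json(obj):
    """object_hook: turn {"$oid": x} or {"$date": x} into plain x."""
    # The decoder calls this for every JSON object, innermost objects first.
    if len(obj) == 1:
        (key, value), = obj.items()
        if key in WRAPPER_KEYS:
            return value
    return obj


# Sample based on the question, plus an invented nested "comments" list.
text = '''
{
  "_id": {"$oid": "5e7511c45cb29ef48b8cfcff"},
  "about": "some text",
  "creationDate": {"$date": "2021-01-05T14:59:58.046Z"},
  "comments": [
    {"author": {"$oid": "5e7511c45cb29ef48b8cfd00"}, "text": "nested"}
  ]
}
'''

cleaned = json.loads(text, object_hook=unwrap_extended_json)
print(json.dumps(cleaned, indent=2))
```

For a file on disk, the same hook can be passed to json.load(file, object_hook=unwrap_extended_json). Because the hook runs during parsing, no separate pass over the data is needed and you do not have to list the affected keys yourself.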

The $oid and $date fields appear when documents are serialized with bson.json_util.dumps() and its default options.

If you have control over where these files come from, you might want to fix the "problem" at the source rather than having to code around it. The following code snippet shows how you can implement a custom encoder to format the output the way you need it:

import json
import datetime
from bson import ObjectId
from pymongo import MongoClient


class MyJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        # Dates become ISO 8601 strings
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        # ObjectIds become their hex string
        if isinstance(obj, ObjectId):
            return str(obj)
        # Anything else raises TypeError as usual
        return super().default(obj)


db = MongoClient()['mydatabase']
db.mycollection.insert_one({'Date': datetime.datetime.now()})
record = db.mycollection.find_one()
print(json.dumps(record, indent=4, cls=MyJsonEncoder))

This prints:

{
    "_id": "60a55e3cea5bf57c79177871",
    "Date": "2021-05-19T19:51:40.808000"
}
