
How to remove all the "$oid" and "$date" in a .json file?

I have a .json file saved on my computer that contains things like $oid or $date, which will later cause me trouble in BigQuery. For example:

{
  "_id": {
    "$oid": "5e7511c45cb29ef48b8cfcff"
  },
  "about": "some text",
  "creationDate": {
    "$date": "2021-01-05T14:59:58.046Z"
  }
}

I want it to look like this (so it is not just a matter of removing some characters from a string):

{
  "_id": "5e7511c45cb29ef48b8cfcff",
  "about": "some text",
  "creationDate": "2021-01-05T14:59:58.046Z"
}

With Pymongo, one can do something like:

my_file['_id']=my_file['_id']['$oid']
my_file['creationDate']=my_file['creationDate']['$date']

How would this look without Pymongo? I want to first find such keys and then remove all the problematic $oid or $date wrappers.

Edit: sorry for the bad wording. What I meant to ask was whether it is possible to find the keys that contain these problematic $ fields without writing down every key in the dictionary. In reality, there are more files with huge tables, and many of them can contain this.

I would try something like the following. It checks only the top-level keys of the document.

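import json

# Load the exported document; the file is closed when the block ends
with open('data.json', 'r') as file:
    data = json.load(file)

for k, v in data.items():
    # Check whether the value is a non-empty dict
    if isinstance(v, dict) and v:
        # Look at the first key of the nested dict
        r = next(iter(v))
        # Replace the wrapper with its value if the key starts with $
        if r.startswith('$'):
            data[k] = v[r]
print(data)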

This gives the following output:

{'_id': '5e7511c45cb29ef48b8cfcff', 'about': 'some text', 'creationDate': '2021-01-05T14:59:58.046Z'}
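Since the question mentions huge tables where the wrappers can appear anywhere, here is a minimal sketch of a recursive variant that also walks nested dicts and lists. The function name `unwrap_extended_json` and the `comments` field in the sample are hypothetical, made up for illustration. It assumes a wrapper is always a dict with exactly one key starting with `$`, and it leaves dicts with several keys untouched.

```python
import json


def unwrap_extended_json(node):
    """Recursively replace single-key {"$name": value} wrappers with value."""
    if isinstance(node, dict):
        if len(node) == 1:
            key, value = next(iter(node.items()))
            if key.startswith("$"):
                # Unwrap, then keep walking in case the value is nested too
                return unwrap_extended_json(value)
        return {k: unwrap_extended_json(v) for k, v in node.items()}
    if isinstance(node, list):
        return [unwrap_extended_json(item) for item in node]
    # Strings, numbers, booleans and None are returned unchanged
    return node


# Sample input; the "comments" array is an assumed, hypothetical field
raw = """
{
  "_id": {"$oid": "5e7511c45cb29ef48b8cfcff"},
  "about": "some text",
  "creationDate": {"$date": "2021-01-05T14:59:58.046Z"},
  "comments": [
    {"author": {"$oid": "5e7511c45cb29ef48b8cfd00"}, "text": "hi"}
  ]
}
"""

cleaned = unwrap_extended_json(json.loads(raw))
print(json.dumps(cleaned, indent=2))
```

The function returns a new structure instead of modifying the input, so the original data stays available for comparison.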

The $oid and $date fields appear when you serialize with the default encoder in bson.json_util.dumps().

If you have control over where these files come from, you might want to fix the "problem" at the source rather than having to code around it. The following code snippet shows how you can implement a custom encoder to format the output the way you need it:

import json
import datetime
from bson import ObjectId
from pymongo import MongoClient


class MyJsonEncoder(json.JSONEncoder):
    def default(self, obj):
        # Serialize datetimes as ISO 8601 strings
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        # Serialize ObjectIds as their hex string
        if isinstance(obj, ObjectId):
            return str(obj)

        # Anything else raises the usual TypeError
        return super().default(obj)


db = MongoClient()['mydatabase']
db.mycollection.insert_one({'Date': datetime.datetime.now()})
record = db.mycollection.find_one()
print(json.dumps(record, indent=4, cls=MyJsonEncoder))

prints:

{
    "_id": "60a55e3cea5bf57c79177871",
    "Date": "2021-05-19T19:51:40.808000"
}
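If the files already exist and cannot be re-exported, a similar cleanup can be done at load time with the standard library alone. The sketch below uses the `object_hook` parameter of `json.loads`, which is called for every decoded JSON object, innermost first, so nested wrappers are handled without explicit recursion. The hook name `strip_dollar_wrapper` and the sample input are assumptions for illustration, and the hook treats any single-key dict whose key starts with `$` as a wrapper.

```python
import json


def strip_dollar_wrapper(obj):
    # Called by the json module for each decoded JSON object (a dict)
    if len(obj) == 1:
        key, value = next(iter(obj.items()))
        if key.startswith("$"):
            # Replace {"$oid": value} or {"$date": value} with the bare value
            return value
    return obj


# Hypothetical input in the wrapped form described in the question
text = (
    '{"_id": {"$oid": "60a55e3cea5bf57c79177871"}, '
    '"Date": {"$date": "2021-05-19T19:51:40.808Z"}}'
)

record = json.loads(text, object_hook=strip_dollar_wrapper)
print(json.dumps(record, indent=4))
```

The same hook can be passed to `json.load` when reading from an open file.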
