
MongoDB: tracking DDL changes

I am new to MongoDB. I come from an RDBMS/MPP/ETL background, and most of the data stores I have used keep metadata about their objects (tables, views, etc.). My question is specific to MongoDB: does it have any data dictionary like Oracle's user_tables, or any other metadata about collections, such as the time of the last DDL change? Since MongoDB is a schemaless database, an application can change the shape of the data it inserts without any schema change, so detecting structural changes before running ETL jobs is important when MongoDB is involved. I searched for dictionaries or an API that tracks DDL changes and found nothing. Can anyone point me to links or information on this? If there is no such option, are there best practices to follow for handling this kind of schema evolution?

Thanks Anoop R

One of the advantages of MongoDB is its schemaless way of storing documents. Unlike an RDBMS with its table dictionaries, the schema lives in the application layer for MongoDB users. That gives the application the flexibility to design or change the schema at any time, without depending on ALTER statements.

Having said that, MongoDB 3.2 introduced schema validation and 3.4 enriched it. You can learn more in the MongoDB documentation on document validation. Validation rules are specified per collection using the validator option, which takes a document containing the validation rules or expressions.
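As a rough illustration of a per-collection validator, here is a minimal Python sketch. It assumes PyMongo and a server recent enough to support the `$jsonSchema` operator (MongoDB 3.6 or later, as far as I know). The field names and the helper names `build_validator` and `apply_validator` are made up for the example; only `db.command("collMod", ...)` is a PyMongo call.

```python
def build_validator(required_fields, field_types):
    """Build a $jsonSchema validator document for a collection."""
    properties = {
        name: {"bsonType": bson_type}
        for name, bson_type in field_types.items()
    }
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": list(required_fields),
            "properties": properties,
        }
    }


def apply_validator(db, collection_name, validator):
    """Attach the validator to an existing collection via collMod.

    `db` is assumed to be a pymongo Database object.
    validationAction="warn" logs violations instead of rejecting writes.
    """
    return db.command(
        "collMod",
        collection_name,
        validator=validator,
        validationAction="warn",
    )


# Hypothetical fields for an illustrative "orders" collection.
validator = build_validator(
    required_fields=["order_id", "amount"],
    field_types={"order_id": "string", "amount": "double"},
)
```

Starting with `validationAction="warn"` seems a reasonable choice for an existing collection, since documents that do not match are only logged and writes keep working while the agreed definition settles.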

Note that the purpose of schema validation is not to track DDL changes but to establish an agreed-upon definition of the documents, so to speak.

I found a solution that is not exactly what I was looking for, but I think we can manage with it.

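default checklist for data types

from collections import defaultdict  # used by the main code

# Starting count for every Python 2 type we expect to see in a document.
key_type_default_count = {
    int: 0,
    float: 0,
    str: 0,
    bool: 0,
    dict: 0,
    list: 0,
    set: 0,
    tuple: 0,
    type(None): 0,  # type(None) is NoneType; a plain None key would never match
    object: 0,
    unicode: 0,
    "other": 0,
}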

custom code to get the mongo connection

client = create_mongo_con(v_env,v_con_name)
print client

db = client[v_db_name]
collection = db[v_collection]

main code

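# One counter dict per key, each starting from the default checklist.
key_type_count = defaultdict(lambda: dict(key_type_default_count))
total_docs = 0  # number of documents sampled

# Sample the first 30 documents, leaving out _id.
mongo_collection_docs = collection.find({}, {"_id": 0}).limit(30)
print type(mongo_collection_docs)

for doc in mongo_collection_docs:
    for key, value in doc.items():
        # Debug output; repr() avoids encoding errors on non-ASCII text.
        print ' my key ' + repr(key)
        print 'my value is ' + repr(value)
        print ' my value type '
        print type(value)
        # Count the value's type for this key, or "other" if it is not listed.
        if type(value) in key_type_count[key]:
            key_type_count[key][type(value)] += 1
        else:
            key_type_count[key]["other"] += 1
    total_docs += 1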

You can read more about this at https://github.com/nimeshkverma/mongo_schema, which is where I got the idea, although that code did not work for me. I edited parts of it and can now generate nicely formatted output (the original post included a screenshot here).
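The same idea can be taken one step further for the ETL use case: infer the key/type counts from a sample, then compare them with a previously saved sample. The following is only a sketch in Python 3, not the code from the linked repository. The function names `infer_schema` and `diff_schemas` are made up, and plain dicts stand in for the documents that `collection.find()` would return.

```python
from collections import Counter, defaultdict


def infer_schema(docs):
    """Count, per top-level key, how often each Python type occurs."""
    schema = defaultdict(Counter)
    for doc in docs:
        for key, value in doc.items():
            schema[key][type(value).__name__] += 1
    return {key: dict(counts) for key, counts in schema.items()}


def diff_schemas(old, new):
    """Report keys that appeared, disappeared, or changed their set of types."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(
        key for key in set(old) & set(new)
        if set(old[key]) != set(new[key])
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Stand-in documents; in practice these would come from collection.find().
yesterday = infer_schema([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
today = infer_schema([{"id": "3", "name": "c", "tags": ["x"]}])
changes = diff_schemas(yesterday, today)
print(changes)  # {'added': ['tags'], 'removed': [], 'retyped': ['id']}
```

An ETL job could store the result of `infer_schema` after each run and stop or alert when `diff_schemas` returns anything non-empty. Note that this only looks at top-level keys and at a sample, so it can miss changes in nested documents or in rarely used fields.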

But now I am facing one issue: all string fields are detected as unicode. I need to figure this out and will post an update if I find a solution. If anybody has faced the same issue with str and unicode in Python, please comment.
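On the str versus unicode point: as far as I know, PyMongo on Python 2 decodes every BSON string into a `unicode` object, so seeing only unicode is expected rather than a bug in the counting code. One possible workaround, sketched here with a made-up helper `type_label`, is to report both text types under a single label. The sketch runs on Python 2 and Python 3.

```python
try:
    TEXT_TYPES = (str, unicode)  # Python 2: byte strings and text strings
except NameError:
    TEXT_TYPES = (str,)          # Python 3: str is already Unicode text


def type_label(value):
    """Return a type name, reporting every text value as "str"."""
    if isinstance(value, TEXT_TYPES):
        return "str"
    return type(value).__name__


labels = [type_label(v) for v in (u"abc", "abc", 3, None)]
```

Counting by `type_label(value)` instead of `type(value)` would then put all string fields in one bucket.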
