
Best Practices: Save Empty Fields as null or omit Field Completely and Manage Missing Fields in Code?

Given a collection that will hold 50+ million documents, where each document has at most the set of fields shown in option a, what is the best practice for dealing with fields that can be null or sparse?

a) Is it better to save every document with the same fields, storing empty fields as null?

{
  "_id": "54ca5b234d2dfeba4f9ab613",
  "person_id": 1,
  "person_name": "Bob",
  "office_phone": null,
  "description": "This is where the description is entered",
  "technical_description": null
}

b) Or is it better to leave out the fields that have no data?

{
  "_id": "54ca5b234d2dfeba4f9ab613",
  "person_id": 1,
  "person_name": "Bob",
  "description": "This is where the description is entered"
}

What are the considerations for keeping null fields in terms of mongod RAM usage and performance? Omitting empty fields means the application has to check whether a field exists before using it. What should I consider at the application level if this "empty check" is delegated to code? Is it expensive? Does omitting empty fields from the collection simply move the cost to the client code layer?

My suggestion would be to leave the empty fields out and check for the existence of the field in code. One disadvantage of document databases compared with relational databases is that a table only has to store the name of a column once, while each document has to store the name of every field it contains.

So considering that you have 50 million documents and the field name technical_description is about 20 characters long, storing that one key name in every document takes roughly 1 gigabyte (50,000,000 × ~20 bytes) before any compression. If half of your documents have the value as null, then that's at least half a GB of utterly wasted space.
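The estimate above can be reproduced with a small sketch. It assumes the BSON layout in which each element costs one type byte, the UTF-8 field name, and one terminating NUL byte, with no payload bytes for a null value; it ignores storage-engine compression. The class and method names are hypothetical and exist only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Rough estimate of the uncompressed BSON bytes spent on one field stored as null.
// Assumption: 1 type byte + UTF-8 field name + 1 NUL terminator, no value bytes for null.
class FieldNameOverhead {

    // Bytes one document spends on a single null-valued field.
    static long bytesPerNullField(String fieldName) {
        int nameBytes = fieldName.getBytes(StandardCharsets.UTF_8).length;
        return 1 + nameBytes + 1;
    }

    // Bytes spent across all documents that carry the field.
    static long totalBytes(String fieldName, long documentCount) {
        return bytesPerNullField(fieldName) * documentCount;
    }

    public static void main(String[] args) {
        long docs = 50_000_000L;
        long all = totalBytes("technical_description", docs);
        long half = totalBytes("technical_description", docs / 2);
        System.out.println("field present in all documents: " + all + " bytes");
        System.out.println("field null in half of them:     " + half + " bytes");
    }
}
```

Under these assumptions the field costs 23 bytes per document, so 1,150,000,000 bytes across 50 million documents, and 575,000,000 bytes if only the null half is counted. Compression on disk would shrink that figure, so treat it as an upper bound on the waste rather than a measurement.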

That space also takes up RAM whenever the document is loaded into memory. When a query uses an index, the index is loaded into memory, and any documents sent to the client have to be loaded into memory as they are requested. Usually not all documents are accessed at once, and MongoDB returns results in batches through a cursor object.

On the other hand, I am not well acquainted with how expensive it is in Java to determine that an object does not have a certain field, but it seems like it should be a fairly light operation.
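To give a feel for what that check looks like, here is a minimal sketch that uses a plain `Map<String, Object>` as a stand-in for a fetched document. As far as I know, the MongoDB Java driver's `org.bson.Document` implements `Map<String, Object>`, so the same `containsKey` and `get` calls should apply to it, but this example deliberately avoids any driver dependency. The class name and the `optionalString` helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Treats a missing key and an explicit null the same way on the client side.
class MissingFieldCheck {

    // Returns the field as a String, or an empty Optional when the key is
    // absent, stored as null, or not a String.
    static Optional<String> optionalString(Map<String, Object> doc, String field) {
        Object value = doc.get(field); // null for a missing key and for a stored null
        return (value instanceof String) ? Optional.of((String) value) : Optional.empty();
    }

    public static void main(String[] args) {
        // Document shaped like option b: empty fields are simply left out.
        Map<String, Object> doc = new HashMap<>();
        doc.put("person_id", 1);
        doc.put("person_name", "Bob");
        doc.put("description", "This is where the description is entered");

        System.out.println(doc.containsKey("office_phone"));
        System.out.println(optionalString(doc, "office_phone").orElse("(none)"));
        System.out.println(optionalString(doc, "person_name").orElse("(none)"));
    }
}
```

Each check is a single hash lookup, so the cost on the client is small compared with the network round trip that fetched the document. Because `get` returns null both for a missing key and for a stored null, the same helper works with either storage choice.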

