MongoDB Schema Validation with Shell and Node.js

Introduction: MongoDB is a document-oriented database and the most popular NoSQL database. It provides a flexible schema, but for some applications this flexibility becomes a problem, and that is where MongoDB schema validation comes in.

There is an ongoing debate about whether we should use MongoDB schema validation at all. NoSQL is meant to be flexible about schema, and a fixed schema structure seems to break that concept.

However, MongoDB validation has both pros and cons, which we are going to see.

What is Schema Validation?

Schema validation allows us to define a particular structure for a collection, meaning predefined fields, each with a fixed data type.

If we try to insert or update a document that does not match the validation rules, MongoDB will throw an error or warning.

Why do We Need to Validate MongoDB Schema?

A flexible schema is very useful in some cases, but not all. In a production environment, the database structure becomes stable and does not change frequently.

In this situation, we need solid MongoDB schema validation to avoid any unstructured data entry that might break our application. It becomes even more important when data is inserted from various sources (like APIs).

Example:

Suppose we expect our schema to look like this:

{
  product: "apple",
  quantity: 1000,
  price: 12
}

Each time someone buys 10 apples, we perform an operation like quantity - 10 in the database. But what will happen if a user inserts data like this?

{
  product: "apple",
  quantity: "Thousand",
  price: 12
}

Performing the arithmetic operation will give an unexpected result, and our application's functionality will break here.

Note: In Node.js, "Thousand" - 10 evaluates to NaN instead of throwing an error (the beauty of JavaScript :).
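You can see this coercion for yourself with a tiny standalone snippet, runnable with plain Node.js:

```javascript
// Bad data that slipped in as a string instead of a number.
const quantity = "Thousand";

// Subtraction coerces the string to a number; the result is NaN, not an error.
const remaining = quantity - 10;

console.log(Number.isNaN(remaining)); // true
```

Every subsequent arithmetic operation on NaN silently produces NaN again, so the bug surfaces far from where the bad data entered.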

MongoDB Schema Validation Tutorial (in Shell)

Here begins the actual implementation of MongoDB schema validation.

MongoDB supports two types of schema validation.

  • Document validation (Introduced in version 3.2)
  • JSON schema validation (From version 3.6)

MongoDB officially recommends JSON Schema validation.

JSON schema validation

In MongoDB type these commands.

use hc_tutorial;
db.createCollection("products", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "name", "price", "quantity" ],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            price: {
               bsonType: "number",
               minimum: 0,
               maximum: 10000000,
               description: "must be a number in [ 0, 10000000 ] and is required"
            },
            quantity: {
               bsonType: "number",
               minimum: 0,
               description: "must be a number greater than or equal to 0 and is required"
            },
            coupon: {
               bsonType: "string",
               description: "must be a string and is optional"
            },
         }
      }
   }
})

Here the code creates a new collection products with the following validations:

  • name: a string, required.
  • price: a number between 0 and 10000000, required.
  • quantity: a number greater than or equal to 0, required.
  • coupon: a string, optional.

MongoDB JSON vs BSON

You might be thinking: what the heck is bsonType in the middle of the validations?

Well, you have just encountered an important core concept worth understanding.

BSON is just a binary representation and superset of JSON.

It means whatever you can do in JSON, you can do in BSON. But BSON supports more features than JSON.

MongoDB stores the data in BSON format.

Don't be afraid of the new name. BSON is just JSON plus some new features.
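One quick way to feel the difference is to see what plain JSON does to a value that BSON can store natively, such as a date. A minimal sketch in Node.js:

```javascript
// JSON has no date type: serializing a Date turns it into a plain string.
const doc = { name: "apple", addedAt: new Date() };
const roundTripped = JSON.parse(JSON.stringify(doc));

console.log(doc.addedAt instanceof Date);  // true
console.log(typeof roundTripped.addedAt);  // "string"
```

BSON keeps addedAt as a real date type, which is why validation rules can use bsonType values like "date" and "objectId" that have no JSON equivalent.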

MongoDB schema validation data types

The valid data types for MongoDB schema validation are listed below. We have divided them into two categories, but don't let that confuse you: all the data types mentioned here can be used in MongoDB validation.

JSON Schema validation types (BSON supports these data types as well):

  • string
  • number
  • object
  • array
  • boolean
  • null

BSON-specific data types:

  • objectId
  • binData for Binary data
  • bool
  • date
  • regex
  • javascript
  • javascriptWithScope
  • int
  • timestamp
  • decimal
  • minKey
  • maxKey

Here are a few queries and their results.

> db.products.insertOne({name: "apple", price: 10, quantity: 1000, coupon: "FreeCoupon"});
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5edf59941ecdaecdc7e07d00")
}


> db.products.insertOne({name: "mango", price: "ten", quantity: 1000, coupon: "FreeCoupon"});
2020-06-09T15:20:14.087+0530 E QUERY    [thread1] WriteError: Document failed validation :
WriteError({
	"index" : 0,
	"code" : 121,
	"errmsg" : "Document failed validation",
	"op" : {
		"_id" : ObjectId("5edf5b561ecdaecdc7e07d01"),
		"name" : "mango",
		"price" : "ten",
		"quantity" : 1000,
		"coupon" : "FreeCoupon"
	}
})

The first query was successful because it passed the schema validation.

The second query contains a price of type string where a number was expected. This validation prevents errors like the arithmetic mishap we saw in the earlier example.

Prevent additional fields in MongoDB

We do not want anyone to insert arbitrary data into our database. Schema validation covers data types, but what about additional fields?

A flexible schema will allow inserting anything into the database, which is not only a performance risk but may also create security issues.

We can be strict in our schema validation and reject the insertion of any field that is not defined in it.

Add the option additionalProperties: false to the $jsonSchema object.

Here is the code to prevent additional field insertion in MongoDB schema through validation.

 
use hc_tutorial;
db.createCollection("products", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "name", "price", "quantity" ],
         additionalProperties: false,
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            price: {
               bsonType: "number",
               minimum: 0,
               maximum: 10000000,
               description: "must be a number in [ 0, 10000000 ] and is required"
            },
            quantity: {
               bsonType: "number",
               minimum: 0,
               description: "must be a number greater than or equal to 0 and is required"
            },
            coupon: {
               bsonType: "string",
               description: "must be a string and is optional"
            },
         }
      }
   }
})

Now inserting any additional field other than those defined in the schema will throw an error.

> db.products.insertOne({name: "mango", price: 10, quantity: 1000, coupon: "FreeCoupon", extra: "Additional Field"});
2020-06-09T16:52:19.812+0530 E QUERY    [thread1] WriteError: Document failed validation :
WriteError({
	"index" : 0,
	"code" : 121,
	"errmsg" : "Document failed validation",
	"op" : {
		"_id" : ObjectId("5edf70eb1ecdaecdc7e07d06"),
		"name" : "mango",
		"price" : 10,
		"quantity" : 1000,
		"coupon" : "FreeCoupon",
		"extra" : "Additional Field"
	}
})

Here we tried to insert an additional field extra which is not in the schema definition, so it throws an error.

Unique field validation in MongoDB schema

Unique field validation is not defined inside the validation rules.

Instead, we create an index on the field with the unique option to enforce that it remains unique.

Here is a sample code to create an index which will provide unique field validation.

use hc_tutorial;

db.products.createIndex({name: 1}, {unique: true});

Here we create an index on the name field of the products collection. The number 1 next to the name field specifies that the index is in ascending order.

The unique option, which is false by default, ensures the name field remains unique.

Now if we try to insert two products with the same name then MongoDB will throw an error.

> db.products.insertOne({name: "orange", price: 10, quantity: 1000, coupon: "FreeCoupon"});
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5edf65151ecdaecdc7e07d03")
}
> db.products.insertOne({name: "orange", price: 10, quantity: 1000, coupon: "FreeCoupon"});
2020-06-09T16:01:52.425+0530 E QUERY    [thread1] WriteError: E11000 duplicate key error collection: hc_tutorial.products index: name_1 dup key: { : "orange" } :
WriteError({
	"index" : 0,
	"code" : 11000,
	"errmsg" : "E11000 duplicate key error collection: hc_tutorial.products index: name_1 dup key: { : \"orange\" }",
	"op" : {
		"_id" : ObjectId("5edf65181ecdaecdc7e07d04"),
		"name" : "orange",
		"price" : 10,
		"quantity" : 1000,
		"coupon" : "FreeCoupon"
	}
})

We tried to insert two products with the same name and got an error, which means unique field validation is working correctly.

Document validation

This section is only for reference purposes; you should go for JSON Schema validation.

With the increase in the use of MongoDB among various types of applications, it became a SQL alternative for certain use cases.

There was rising demand for a way to create a predictable schema, which is important for applications in production, because it is risky to expose the power of a flexible schema to real-world users.

Therefore, MongoDB introduced document validation in version 3.2.

Here is the code for document validation.

use hc_tutorial;
db.createCollection("products", {
  validator: {
        $and: [
            {
                "name": {$type: "string", $exists: true}
            },
            {
                "price": {$type: "number", $exists: true}
            },
            {
                "quantity": {$type: "number", $exists: true}
            },
            {
                "coupon": {$type: "string"}
            }
        ]
    }
})

The above query will output { "ok" : 1 } on successful creation.

The $and operator is a logical query selector that requires all of its conditions to match.

Now we can validate if our schema is working fine as expected.

> db.products.insertOne({name: "mango", price: 10, quantity: 1000, coupon: "FreeCoupon"});
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5edfe0021ecdaecdc7e07d07")
}
> db.products.insertOne({name: "mango", price: "ten", quantity: 1000, coupon: "FreeCoupon"});
2020-06-10T00:46:32.459+0530 E QUERY    [thread1] WriteError: Document failed validation :
WriteError({
	"index" : 0,
	"code" : 121,
	"errmsg" : "Document failed validation",
	"op" : {
		"_id" : ObjectId("5edfe0101ecdaecdc7e07d08"),
		"name" : "mango",
		"price" : "ten",
		"quantity" : 1000,
		"coupon" : "FreeCoupon"
	}
})

The second insert failed validation because the price field needs to be of number type, so document validation works fine here.

You might be curious why document validation was superseded by JSON Schema validation.

The reason is that it was limited to validating only the fields defined in the validation rules. If a user inserted any extra field, there was no validation for it.

> db.products.insertOne({name: "mango", price: 10, quantity: 1000, coupon: "FreeCoupon", extra: "SomeAdditionalField"});
{
	"acknowledged" : true,
	"insertedId" : ObjectId("5edfe02e1ecdaecdc7e07d09")
}

Here the extra field is inserted without any validation.

It was also prone to typos: if we insert priec: "ten" (a typo of price), there will be no error. This can make our application's database unpredictable.

MongoDB Native Driver Schema Validation in Node.js

All the validation is the same as above; we just need to organise the code with a sensible folder and file structure.

Here is the folder and code structure I use, learned through MongoDB University.

-controllers
-database
-middleware
-routes
-schema
index.js
server.js

Almost all of these files are general Node.js knowledge, so I will focus on creating the connection, the database files, and schema validation with the MongoDB native driver in Node.js.

Here is our sample index.js file where we connect to MongoDB using its native driver in Node.js.

const app = require("./server"),
  MongoClient = require("mongodb").MongoClient;
require("dotenv").config();

const ProductsDAO = require("./database/productsDAO");

const createProductSchema = require("./schema/product");

const port = process.env.PORT || 3001;
//Using 3000 for React.js :)

MongoClient.connect(process.env.DB_URL, {
  useNewUrlParser: true,
  useUnifiedTopology: true,
})
  .catch((err) => {
    console.error(err.stack);
    process.exit(1);
  })
  .then(async (client) => {
    createProductSchema(client);
    await ProductsDAO.injectDB(client);
    app.listen(port, () => {
      console.log(`Server Started on port ${port}`);
    });
  });

Here we import the Express app from server.js to keep the code clean. The structure is simple: we start the server only when the database connection is successful.

In the database folder, we put DAO files, where DAO stands for data access object.

const {ObjectID} = require('mongodb'); 
let products;

class ProductsDAO {
  static async injectDB(conn) {
    if (products) {
      return;
    }
    try {
      products = await conn.db(process.env.DB_NAME).collection("products");
    } catch (e) {
      console.error(`Unable to establish collection handles in productDAO: ${e}`);
    }
  }
  /**
   * Add a new product to the database.
   * @param {Object} product - The product object to insert.
   * @returns {Promise<Object>} - The insert result from MongoDB.
   */

  static async addProduct(product) {
    try {
      return await products.insertOne({ ...product });
    } catch (error) {
      console.error(`Unable to add product: ${error}`);
      throw error;
    }
  }
}

module.exports = ProductsDAO;

Now we can simply import this file into the appropriate controller and use its methods to interact with the database.
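As a sketch of that, here is a hypothetical controller (the file name, the stubbed DAO, and the Express-style req/res shapes are my assumptions, not part of the original project):

```javascript
// controllers/productController.js (hypothetical) — in the real project the DAO
// would come from require("../database/productsDAO"); it is stubbed here so the
// example is self-contained and runnable without a database.
const ProductsDAO = {
  async addProduct(product) {
    if (typeof product.price !== "number") {
      const err = new Error("Document failed validation");
      err.code = 121; // MongoDB's schema-validation error code
      throw err;
    }
    return { insertedId: "fake-id" };
  },
};

// Express-style route handler that maps validation failures to HTTP 400.
const addProduct = async (req, res) => {
  try {
    const result = await ProductsDAO.addProduct(req.body);
    res.status(201).json({ insertedId: result.insertedId });
  } catch (error) {
    if (error.code === 121) {
      res.status(400).json({ error: "Product failed schema validation" });
    } else {
      res.status(500).json({ error: "Internal server error" });
    }
  }
};

module.exports = { addProduct };
```

Mapping MongoDB's error code 121 ("Document failed validation") to a 400 response keeps schema violations from surfacing as generic server errors.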

For schema validation, create product.js in the schema folder.

const createProductSchema = async (conn) => {
  await conn.db(process.env.DB_NAME).createCollection("products", {
    storageEngine: {
      wiredTiger: {},
    },
    capped: false,
    validator: {
      $jsonSchema: {
        bsonType: "object",
        title: "products",
        additionalProperties: false,
        properties: {
          _id: {
            bsonType: "objectId",
          },
          name: {
            bsonType: "string",
          },
          price: {
            bsonType: "number",
          },
          quantity: {
            bsonType: "number",
          },
          coupon: {
            bsonType: "string",
          },
          
          ...
        },
        required: ["name", "quantity", "price"],
      },
    },
    validationLevel: "strict",
    validationAction: "error",
  });
  console.log("Product Schema Created");
  await conn
    .db(process.env.DB_NAME)
    .collection("products")
    .createIndex({ name: 1 }, { unique: true });
  console.log("Name index created for unique constraint");
};

module.exports = createProductSchema;

You can note that in the index file we call this function right before starting the application.

Another important recommendation is to use server-side validation before inserting data into the database. It may seem like over-validation, but it can improve performance, depending on your application's architecture, by rejecting bad data before it reaches the database.
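A minimal sketch of such a server-side check, mirroring the $jsonSchema rules from earlier (the function name and error format are my own choices):

```javascript
// Hypothetical pre-insert check mirroring the products $jsonSchema rules,
// so obviously bad documents never cost a round trip to the database.
function validateProduct(product) {
  const errors = [];
  if (typeof product.name !== "string") {
    errors.push("name must be a string");
  }
  if (typeof product.price !== "number" || product.price < 0 || product.price > 10000000) {
    errors.push("price must be a number in [0, 10000000]");
  }
  if (typeof product.quantity !== "number" || product.quantity < 0) {
    errors.push("quantity must be a non-negative number");
  }
  if (product.coupon !== undefined && typeof product.coupon !== "string") {
    errors.push("coupon must be a string");
  }
  return errors; // an empty array means the document passes
}

console.log(validateProduct({ name: "apple", price: 10, quantity: 1000 })); // []
console.log(validateProduct({ name: "mango", price: "ten", quantity: 5 })); // one price error
```

A controller can call this before handing the document to the DAO and return a 400 response with the collected messages instead of waiting for the database to reject the write.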

If anything was confusing, or you need a complete CRUD project with MongoDB native driver schema validation and a better folder structure, let me know via the contact form.

Finally, I want to point out that there is no 'best folder structure'. It really depends on the size and type of application you are building. The only thing to remember is to use common, conventional folder and file naming.

Summary

Here are some quick important points from the above article.

  • In a production environment, it is good practice to use schema validation for structured data.
  • MongoDB uses BSON instead of JSON.
  • BSON is a binary representation of JSON; you can think of BSON as JSON with extra types.
  • There is no direct unique validation; create an index on the field with the unique option set to true.
  • It is good practice to implement database schema validation as well as server-side data validation.