MongoDB Remove Duplicates - Unique MongoDB Validation

Introduction: MongoDB has no native way to remove duplicates from an existing collection, but the script explained in this article makes it easy. To enforce unique values for a field in MongoDB, you create an index on that field with the unique option.

MongoDB is a popular NoSQL database and a good fit for unstructured data. In production, however, it is important to have validation constraints so that bad data does not break your application.

MongoDB Remove Duplicates Using Script

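Creating unique validation in MongoDB with an index is easy. But how do you ensure unique values in a pre-existing collection that already contains duplicates? A unique index cannot even be created while duplicates exist, because createIndex fails with a duplicate key (E11000) error, and MongoDB offers no easy way to remove the duplicate values.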
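Before removing anything, it helps to see which values are actually duplicated. In the mongo shell this is usually done with an aggregation that groups on the field, for example `db.collectionName.aggregate([{ $group: { _id: "$fieldToBeUnique", count: { $sum: 1 }, ids: { $push: "$_id" } } }, { $match: { count: { $gt: 1 } } }])`. The sketch below reproduces the same grouping in plain JavaScript over an in-memory array so the logic can be run without a database. The function name `findDuplicates` and the sample data are hypothetical, and field values are assumed to be primitives.

```javascript
// Group documents by the value of `field` and report values that occur more
// than once, mirroring a $group + $match { count: { $gt: 1 } } aggregation.
// Assumption: field values are primitives (strings, numbers, booleans, null).
function findDuplicates(docs, field) {
  const groups = new Map(); // field value -> array of _id values

  for (const doc of docs) {
    // $group puts documents with a missing field under null, so do the same.
    const key = doc[field] ?? null;
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(doc._id);
  }

  const duplicates = [];
  for (const [value, ids] of groups) {
    if (ids.length > 1) {
      duplicates.push({ value, count: ids.length, ids });
    }
  }
  return duplicates;
}

// Hypothetical sample data.
const products = [
  { _id: 1, name: "pen" },
  { _id: 2, name: "book" },
  { _id: 3, name: "pen" },
  { _id: 4, name: "lamp" },
  { _id: 5, name: "pen" },
];

// Logs one group: value "pen" with ids 1, 3 and 5.
console.log(findDuplicates(products, "name"));
```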

In older versions of MongoDB this was easy: the dropDups option removed duplicates while the unique index was being built. The option was removed in version 2.7.5 and is not available in MongoDB 3.0 or later.

Here is the syntax if you are still using a MongoDB version older than 2.7.5 (I don't know why you would be).

db.collection.ensureIndex({someField:1}, {unique:true, dropDups:true})

Note: dropDups does not work in current MongoDB versions.

MongoDB removed this option because it gave you no control over which of the duplicate documents would be deleted.
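Because dropDups picked the surviving document arbitrarily, a hand-written cleanup can make that choice explicit instead. The sketch below is an illustration, not part of the script described in this article: it keeps the document with the smallest `_id` for each value and returns the `_id`s to delete. The function name `idsToDelete` and the keep-smallest rule are assumptions; the resulting list could then be passed to something like `db.collectionName.deleteMany({ _id: { $in: ids } })`.

```javascript
// Decide explicitly which copy of each duplicated value survives:
// keep the document with the smallest _id and mark the rest for deletion.
// Assumption: _id values are numbers or strings, so `<` compares them sensibly.
function idsToDelete(docs, field) {
  const keepers = new Map(); // field value -> smallest _id seen so far
  const toDelete = [];

  for (const doc of docs) {
    const key = doc[field] ?? null;
    if (!keepers.has(key)) {
      keepers.set(key, doc._id);
    } else if (doc._id < keepers.get(key)) {
      // This document has a smaller _id: keep it, delete the previous keeper.
      toDelete.push(keepers.get(key));
      keepers.set(key, doc._id);
    } else {
      toDelete.push(doc._id);
    }
  }
  return toDelete;
}

// Hypothetical sample data.
const contacts = [
  { _id: 7, email: "a@example.com" },
  { _id: 2, email: "b@example.com" },
  { _id: 3, email: "a@example.com" },
  { _id: 9, email: "b@example.com" },
];

console.log(idsToDelete(contacts, "email")); // [ 7, 9 ]
```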

Currently, MongoDB has no native way to remove duplicates from an existing collection, but I have created a script that does the job.

Download the Node.js script to remove duplicates in MongoDB.

You need to install Node.js and npm on your system to run this script.

  • Inside the script folder, run npm install to download the necessary dependencies.
  • Edit the .env file with your own configuration.
  • Run node index.js in a terminal.

Important things to remember about the script (MUST READ):

  • It creates a new collection with a temporary name.
  • It creates a unique index on the specified field in that collection.
  • It copies the documents of the original collection into the temporary collection, ignoring duplicate key errors, so only one document per value is kept.
  • It renames the original collection to temp_collection_review and renames the temporary collection to the original collection's name.
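The steps above can be simulated in plain JavaScript to show which documents end up where. This is a simulation, not the author's Node.js script: a hypothetical `UniqueCollection` class stands in for a collection with a unique index and throws an error with code 11000, the code MongoDB uses for duplicate key errors, and a plain object stands in for the database. Because documents are copied in the order they are read, the first occurrence of each value in that order is the one kept.

```javascript
// Stand-in for a MongoDB collection with a unique index on `field`.
class UniqueCollection {
  constructor(field) {
    this.field = field;
    this.keys = new Set(); // values already present in the unique index
    this.docs = [];
  }

  insert(doc) {
    // A missing field is indexed as null, as in MongoDB.
    const key = doc[this.field] ?? null;
    if (this.keys.has(key)) {
      // Mimic MongoDB's duplicate key error.
      const err = new Error(`E11000 duplicate key error: ${this.field}: ${key}`);
      err.code = 11000;
      throw err;
    }
    this.keys.add(key);
    this.docs.push(doc);
  }
}

function removeDuplicates(db, name, field) {
  // Steps 1-2: a temporary collection with a unique index on `field`.
  const temp = new UniqueCollection(field);

  // Step 3: copy every document, ignoring duplicate key errors only.
  for (const doc of db[name]) {
    try {
      temp.insert(doc);
    } catch (err) {
      if (err.code !== 11000) throw err;
    }
  }

  // Step 4: keep the original as temp_collection_review, swap the copy in.
  db.temp_collection_review = db[name];
  db[name] = temp.docs;
  return db;
}

// Hypothetical in-memory "database".
const db = {
  users: [
    { _id: 1, email: "a@example.com" },
    { _id: 2, email: "b@example.com" },
    { _id: 3, email: "a@example.com" },
  ],
};

removeDuplicates(db, "users", "email");
console.log(db.users.map((d) => d._id)); // [ 1, 2 ]
console.log(db.temp_collection_review.length); // 3
```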

You should manually review and verify the temp_collection_review collection, and only then delete it.
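One way to review temp_collection_review is to list exactly which documents did not make it into the deduplicated collection and to confirm that each of them really was a duplicate. The sketch below works on in-memory arrays; against a real database you would first read both collections, for example with `find().toArray()`. The function name `reviewRemoval` and the sample data are hypothetical, and `_id` and field values are assumed to be primitives.

```javascript
// Compare the review copy of the original collection with the deduplicated
// collection. Assumption: _id and field values are primitives.
function reviewRemoval(reviewDocs, dedupedDocs, field) {
  const keptIds = new Set(dedupedDocs.map((doc) => doc._id));
  const keptValues = new Set(dedupedDocs.map((doc) => doc[field] ?? null));

  // Documents that were in the original collection but were not copied over.
  const removed = reviewDocs.filter((doc) => !keptIds.has(doc._id));

  // A removed document whose value no longer appears anywhere was not a
  // duplicate; its data would be lost if temp_collection_review were dropped.
  const lost = removed.filter((doc) => !keptValues.has(doc[field] ?? null));

  return { removed, lost };
}

const reviewCopy = [
  { _id: 1, email: "a@example.com" },
  { _id: 2, email: "b@example.com" },
  { _id: 3, email: "a@example.com" },
];
const dedupedCopy = [
  { _id: 1, email: "a@example.com" },
  { _id: 2, email: "b@example.com" },
];

const { removed, lost } = reviewRemoval(reviewCopy, dedupedCopy, "email");
console.log(removed.map((doc) => doc._id)); // [ 3 ]
console.log(lost.length === 0 ? "safe to drop" : "check before dropping");
```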

The solution script was inspired by a Stack Overflow answer.

Unique Field Validation in MongoDB

Creating unique field validation in MongoDB often confuses beginners.

MongoDB enforces unique field validation through an index on that field with the unique option set to true.

Here is the general syntax to create an index with unique field validation.

db.collectionName.createIndex({ fieldToBeUnique: 1}, {unique: true});

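The value 1 after fieldToBeUnique sorts the index in ascending order; use -1 for descending order. For a single-field index the direction makes little practical difference, because MongoDB can traverse the index in either direction.

For example: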

db.products.createIndex({ name: 1}, {unique: true});

The above command creates an index on the name field and ensures that its values remain unique.

Performance and Space Effects of Indexing

The main purpose of an index is faster access, that is, faster searching on the indexed fields. But an index also has its own effects on performance and space.

Each index occupies extra space and adds some work to every write, because the index must be updated along with the data, but it makes reads on the indexed field much faster.
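To make the trade-off concrete, the sketch below compares a full collection scan with a lookup through a sorted index, counting how many entries each one examines. It is a simplified model (MongoDB actually uses B-tree indexes, not a sorted array), and all names and data are hypothetical. On a real collection, `explain("executionStats")` reports comparable figures as `totalDocsExamined` and `totalKeysExamined`, and `db.collectionName.stats()` shows how much space the indexes take.

```javascript
// Simplified model: the index is an extra array of [key, position] pairs
// (the space cost), sorted ascending like an index created with { name: 1 }.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Collection scan: look at every document until a match is found.
function scanFind(docs, field, value) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc[field] === value) return { doc, examined };
  }
  return { doc: null, examined };
}

// Index lookup: binary search over the sorted keys.
function indexFind(docs, index, value) {
  let lo = 0;
  let hi = index.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    const key = index[mid][0];
    if (key === value) return { doc: docs[index[mid][1]], examined };
    if (key < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, examined };
}

// 10,000 hypothetical products named "p00000" ... "p09999".
const catalog = Array.from({ length: 10000 }, (_, i) => ({
  _id: i,
  name: "p" + String(i).padStart(5, "0"),
}));
const nameIndex = buildIndex(catalog, "name");

console.log(scanFind(catalog, "name", "p09999").examined); // 10000
console.log(indexFind(catalog, nameIndex, "p09999").examined); // at most 14
```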

That is why it is important to plan and model your data carefully before you start working with it.

Summary

Here are some bullet points for quick review.

  • MongoDB does not provide any native solution to remove duplicates from an existing collection.
  • You can use the script described above to remove duplicates in MongoDB.
  • To validate that a field is unique, create an index on that field with the unique option set to true.
  • Each index occupies extra space but provides performance benefits for data access.