MongoDB : History tracking micro service

In MongoDB, the local database's oplog collection (oplog.$main under master/slave, oplog.rs in a replica set) keeps a record of all changes that happen on the primary database; secondaries eventually read it to catch up or sync data in the replica set.

A microservice can be written that pulls the data / delta from the oplog based on the namespace (a defined db and collection) and saves that data in a destination DB or audit DB.

Later on, the audit DB can be queried to fetch the history or log data pertaining to an entity.
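The filtering step at the heart of such a microservice can be sketched in plain Python. This is a minimal sketch, not a real service: the entry shape ("ts", "op", "ns", "o") mirrors real oplog documents, but the "shop.orders" namespace and the sample entries are made up for illustration.

```python
# Minimal sketch of the oplog -> audit filtering step.
# Entry shape mirrors oplog documents; the data itself is illustrative.

def extract_audit_records(oplog_entries, namespace):
    """Keep only the oplog entries for the watched db.collection namespace."""
    return [entry for entry in oplog_entries if entry.get("ns") == namespace]

oplog_entries = [
    {"ts": 1457990841, "op": "i", "ns": "shop.orders", "o": {"item": "pen"}},
    {"ts": 1457990842, "op": "u", "ns": "shop.users",  "o": {"name": "bob"}},
    {"ts": 1457990843, "op": "d", "ns": "shop.orders", "o": {"item": "ink"}},
]

# Only the insert and delete on shop.orders would land in the audit DB.
audit_records = extract_audit_records(oplog_entries, "shop.orders")
print(len(audit_records))  # 2
```

A real service would tail the oplog with a tailable cursor and write each matching entry to the audit collection instead of a list.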

audit log sequence diagram

MongoDB : Normalize Database reference (DBRefs)

The joy of a document database is that it eliminates lots of joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure, there is no immediate need to normalize data like you would in SQL. In particular, any data that is not useful apart from its parent document should be part of the same document.

This is not so much a “storage space” issue as it is a “data consistency” issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places.

DBRef documents resemble the following document:

{ "$ref" : <value>, "$id" : <value>, "$db" : <value> }

Consider a document from a collection that stores a DBRef in a creator field:

{
  "_id" : ObjectId("5126bbf64aed4daf9e2ab771"),
  // .. application fields
  "creator" : {
                  "$ref" : "creators",
                  "$id" : ObjectId("5126bc054aed4daf9e2ab772"),
                  "$db" : "users"
              }
}
The DBRef in this example points to a document in the creators collection of the users database that has ObjectId("5126bc054aed4daf9e2ab772") in its _id field.

Consider the following operation to insert two documents, using the _id field of the first document as a reference in the second document:

original_id = ObjectId()

db.places.insert({
    "_id": original_id,
    "name": "Broadway Center",
    "url": ""
})

db.people.insert({
    "name": "Erin",
    "places_id": original_id,
    "url": ""
})

Then, when a query returns the document from the people collection you can, if needed, make a second query for the document referenced by the places_id field in the places collection.
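The two-step lookup above can be sketched in plain Python, with dicts standing in for the people and places collections. The "pid1" id below is a made-up stand-in for an ObjectId.

```python
# Dicts standing in for the places and people collections;
# "pid1" is a made-up stand-in for an ObjectId.
places = {"pid1": {"_id": "pid1", "name": "Broadway Center", "url": ""}}
people = [{"name": "Erin", "places_id": "pid1", "url": ""}]

# First query: fetch the person document.
person = next(p for p in people if p["name"] == "Erin")

# Second query: resolve the manual reference stored in places_id.
place = places[person["places_id"]]
print(place["name"])  # Broadway Center
```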

Reference link for details :

MongoDB : How to simulate joins or subqueries in MongoDB

NoSQL MongoDB by its very nature does not support joins and promotes embedding documents; however, with version 3.2, MongoDB offers an alternative to joins that can be used in aggregation.

For a better-performing design you can keep all related data in one big (embedded) document, but if you have to keep it separate for security or other reasons, then normalization is a good idea.

MongoDB does not allow subqueries or joins, but they can be simulated. For example, say you have an employee collection and, instead of embedding salary details in the employee document, you keep them in a salary collection, and you want to fetch the salary. In SQL you would write:

SQL query : select salary from salary where employee_id = (select employee_id from employee where name like 'premaseem')

# Inserted 2 records in 2 different collections for the join
> db.employee.insert({eid:1, name:"premaseem"})
WriteResult({ "nInserted" : 1 })
> db.salary.insert({ eid:1, salary:6000 })
WriteResult({ "nInserted" : 1 })

# Validated data in the 2 collections
> db.salary.find({ eid:1 })
{ "_id" : ObjectId("56da1a5b2253b2199c53025b"), "eid" : 1, "salary" : 6000 }
> db.salary.find({ eid: db.employee.find({eid:1}) })   // nesting a cursor does not work; returns nothing
> db.employee.find({ name : "premaseem" })
{ "_id" : ObjectId("56da19d42253b2199c53025a"), "eid" : 1, "name" : "premaseem" }

# Simulated join to get the salary for employee premaseem
> db.employee.find({ name : "premaseem" }).map(function(d){
      var obj = db.salary.findOne({ eid : d.eid });
      return obj.salary;
  })
Output : [ 6000 ]



Here is a Python script to try out the same:

__author__ = 'premaseem'

from pymongo import MongoClient

client = MongoClient()
db = client.test

# insert employees
obj1 = {"eid": 1, "name": "premaseem"}
obj2 = {"eid": 2, "name": "sony"}
obj3 = {"eid": 3, "name": "meera"}
bulk_employee_insert = [obj1, obj2, obj3]
db.employee.insert_many(bulk_employee_insert)

# insert salary
objs1 = {"eid": 1, "salary": 1000}
objs2 = {"eid": 2, "salary": 8000}
objs3 = {"eid": 3, "salary": 25}
bulk_salary_insert = [objs1, objs2, objs3]
db.employeeSalary.insert_many(bulk_salary_insert)

print(str(db.employee.count_documents({})) + " total employees")
print(str(db.employeeSalary.count_documents({})) + " total salaries")

def find_employee():
    emp_obj = db.employee.find_one({"eid": 1})
    print(emp_obj)

def find_employee_with_joined_salary(eid):
    # simulate the join with a second query and merge the result
    emp_obj = db.employee.find_one({"eid": eid})
    emp_sal_obj = db.employeeSalary.find_one({"eid": eid})
    emp_obj["salary"] = emp_sal_obj["salary"]
    print(emp_obj)

find_employee()
find_employee_with_joined_salary(1)


MongoDB 3.2 has come up with $lookup, which enables joins in the aggregation pipeline. For reference, follow the links below.
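To make the semantics concrete, here is a pure-Python sketch of what $lookup does: a left outer join that attaches an array of matching documents from the joined collection to each input document. The collection and field names follow the employee/salary example above; the "salary_docs" output field is a name chosen here for illustration.

```python
def lookup(outer_docs, inner_docs, local_field, foreign_field, as_field):
    """Emulate $lookup: a left outer join producing an array-valued field."""
    joined = []
    for outer in outer_docs:
        matches = [inner for inner in inner_docs
                   if inner.get(foreign_field) == outer.get(local_field)]
        doc = dict(outer)
        doc[as_field] = matches  # empty list when nothing matches, like $lookup
        joined.append(doc)
    return joined

employees = [{"eid": 1, "name": "premaseem"}, {"eid": 2, "name": "sony"}]
salaries = [{"eid": 1, "salary": 6000}]

result = lookup(employees, salaries, "eid", "eid", "salary_docs")
print(result[0]["salary_docs"])  # [{'eid': 1, 'salary': 6000}]
print(result[1]["salary_docs"])  # [] -- the employee document is still kept
```

Note that, like $lookup, a document with no match is kept with an empty array rather than dropped, which is what makes it a left outer join.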



Reference : MongoDB and the Shocking Case of the Missing JOIN ($lookup)

MongoDB doc :



MongoDB : How to Scale MongoDB Datasets

As time goes on and data grows, any database needs to be scaled to maintain quick response times and performance. Initially we run the query explain plan, change the document structure to make queries efficient, and add or modify the required indexes. However, even after performing such clean-up and maintenance tasks, the time comes to scale and ensure availability, durability and fault tolerance.


First, we scale vertically

MongoDB, like most databases, craves RAM and IO capacity. It sometimes likes CPU. The simplest conceptual way of scaling performance for a MongoDB dataset is to give it more system resources without worrying about spreading the load across servers. Normally, this is painful for cost or operational reasons. Doubling the capacity of a production MongoDB replica set means swapping larger servers in, figuring out what to do with the old ones, and hoping the new ones are just the right size to keep things healthy for a long period of time.

Then, we scale out

At this point, scaling out MongoDB is easy. A well-built, sharded MongoDB dataset is easy to reason about and will scale linearly across additional servers. Sharding needs a shard key to divide the data across several small commodity servers grouped in a cluster.
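A rough sketch of why the shard key matters: hashing the key value spreads documents evenly across the cluster. The three-shard setup and md5 hashing below are illustrative only (MongoDB's hashed shard keys use their own hash function, not md5).

```python
import hashlib

def pick_shard(shard_key_value, num_shards):
    """Map a shard key value to a shard by hashing it (illustrative only)."""
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Distribute 1000 documents keyed by student_id across 3 shards.
counts = [0, 0, 0]
for student_id in range(1000):
    counts[pick_shard(student_id, 3)] += 1

print(counts)  # roughly 333 documents per shard
```

A monotonically increasing key (like a timestamp) used un-hashed would instead pile all new writes onto one shard, which is why shard key choice matters.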


Flexibility is a requirement of an evolving data set

MongoDB offers numerous features that make developers lives easier. It also offers features for scale. Using the scaling features at the wrong time means compromising on developer-friendly features (unique constraints, oplog usefulness, capped collections). There is a great deal of pressure on developers to use the MongoDB sharding features even when they’re not necessary, which makes their lives worse in aggregate. The most healthy MongoDB setups started with developers using features that helped them move faster, and evolved as understanding of the problem scope and appropriate scale increased.

Developers using MongoDB should make smart decisions and not force themselves down a path before they even have a map. In Agile we say: inspect and adapt 😉


MongoDB : How to make query result look nice in mongo shell

Whenever we run a find() query on a MongoDB collection, the shell fills up with data that is difficult to understand and ugly to look at.


> db.devices.find()
{ "_id" : ObjectId("55ff79deb86a4b0eb1110ba4"), "time_allocated" : null,  "is_allocated" : false, "aggr_zone" : "xxx", "dc" : "DFW1", "time_suspended" : null, "device_type" : "server", "core_template_id" : "12345", "device_swapped_to" : null, "is_suspended" : false, "time_created" : "Sun Sep 20 22:30:38 2015", "device_id" : 48080, "is_decommed" : false }
{ "_id" : ObjectId("55ff79deb86a4b0eb1110ba5"), "time_allocated" : null,  "is_allocated" : false, "aggr_zone" : "xxx", "dc" : "ORD1", "time_suspended" : null, "device_type" : "server", "core_template_id" : "54321", "device_swapped_to" : null, "is_suspended" : false, "time_created" : "Sun Sep 20 22:30:38 2015", "device_id" : 45244, "is_decommed" : false }


To make the query result data look nicer there are 2 ways.
1. Use pretty()

> db.devices.find().pretty()
{
	"_id" : ObjectId("55ff79deb86a4b0eb1110ba4"),
	"time_allocated" : null,
	"is_allocated" : false,
	"aggr_zone" : "xxx",
	"dc" : "DFW1",
	"time_suspended" : null,
	"device_type" : "server",
	"core_template_id" : "12345",
	"device_swapped_to" : null,
	"is_suspended" : false,
	"time_created" : "Sun Sep 20 22:30:38 2015",
	"device_id" : 48080,
	"is_decommed" : false
}
{
	"_id" : ObjectId("55ff79deb86a4b0eb1110ba5"),
	"time_allocated" : null,
	"is_allocated" : false,
	"aggr_zone" : "xxx",
	"dc" : "ORD1",
	"time_suspended" : null,
	"device_type" : "server",
	"core_template_id" : "54321",
	"device_swapped_to" : null,
	"is_suspended" : false,
	"time_created" : "Sun Sep 20 22:30:38 2015",
	"device_id" : 45244,
	"is_decommed" : false
}

Note : This still returns a cursor, which means you need to type "it" to see the next batch of 20 records.
2. Use toArray()

> db.devices.find().limit(2).toArray()
[
	{
		"_id" : ObjectId("55ff79deb86a4b0eb1110ba4"),
		"time_allocated" : null,
		"is_allocated" : false,
		"aggr_zone" : "xxx",
		"dc" : "DFW1",
		"time_suspended" : null,
		"device_type" : "server",
		"core_template_id" : "12345",
		"device_swapped_to" : null,
		"is_suspended" : false,
		"time_created" : "Sun Sep 20 22:30:38 2015",
		"device_id" : 48080,
		"is_decommed" : false
	},
	{
		"_id" : ObjectId("55ff79deb86a4b0eb1110ba5"),
		"time_allocated" : null,
		"is_allocated" : false,
		"aggr_zone" : "xxx",
		"dc" : "ORD1",
		"time_suspended" : null,
		"device_type" : "server",
		"core_template_id" : "54321",
		"device_swapped_to" : null,
		"is_suspended" : false,
		"time_created" : "Sun Sep 20 22:30:38 2015",
		"device_id" : 45244,
		"is_decommed" : false
	}
]

Note : This will display all returned records in the shell at once, so use limit(). No cursor iterator is given, and unless you give a sort option the order of the records is not guaranteed.


MongoDB : Convert Oplog timestamp into ISO date

The easiest way to see the most recent changes (in any DB, any collection) in a MongoDB database is to query the oplog (in the local database), sorted by timestamp:

> db.oplog.$main.find().sort({"ts":-1}).limit(5)

Output : { "ts" : Timestamp(1457990841, 1), "op" : "i", "ns" : "nis.devices", "o" : { "_id" : ObjectId("56e72cb9389c38c2aeeab698"), "name" : "premaseemmarch" } }

This shows the 5 most recent operations on any collection in the database (we just sorted by timestamp in reverse order). To verify the date, we may have to convert the Timestamp into an ISO date. -> "ts" : Timestamp(1457990841, 1)

Way 1 : 

> new Date(1000* 1457990841)


Way 2 : 

> x = Timestamp(1457989386, 1)

Timestamp(1457989386, 1)

> new Date(x.t * 1000)
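The same conversion works outside the shell too, since the Timestamp's first component (t) is seconds since the Unix epoch. For example, in Python:

```python
from datetime import datetime, timezone

# The t component of Timestamp(1457990841, 1) from the oplog output above
# is seconds since the Unix epoch.
ts_seconds = 1457990841
iso_date = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
print(iso_date.isoformat())  # 2016-03-14T21:27:21+00:00
```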

MongoDb Tutorial by Premaseem (Free video course on MongoDB)

Hi Friends,

I am a certified MongoDB expert. I thought to share my knowledge with all, so that the entire world might benefit from it. It's a free YouTube video series with short lectures covering 31 different topics.

Free video tutorial link :


Use this free tutorial and share it with friends 🙂

Certification : MongoDB for Developers – M101P

Hi Friends,

I am pleased to share the news that I have successfully finished the MongoDB certification course for developers (by MongoDB University) with 90% marks. MongoDB is a NoSQL document database, and the course took me about 2 months to finish. I am thankful to my friends, family and teammates for their motivation and support during the course and exams.

MongoDB Dev Certificate

Certificate link


MongoDB : Script to run Sharding with replica set on local machine

This simple script will help you run sharding with multiple replica sets on your local box. (If on Linux, use sudo/root to run the shell script or the commands manually.)


# MongoDB
# script to start a sharded environment on localhost

# clean everything up
echo "killing mongod and mongos"
killall mongod
killall mongos
echo "removing data files"
rm -rf /data/config
rm -rf /data/shard*

# start a replica set and tell it that it will be shard0
echo "starting servers for shard 0"
mkdir -p /data/shard0/rs0 /data/shard0/rs1 /data/shard0/rs2
mongod --replSet s0 --logpath "s0-r0.log" --dbpath /data/shard0/rs0 --port 37017 --fork --shardsvr --smallfiles
mongod --replSet s0 --logpath "s0-r1.log" --dbpath /data/shard0/rs1 --port 37018 --fork --shardsvr --smallfiles
mongod --replSet s0 --logpath "s0-r2.log" --dbpath /data/shard0/rs2 --port 37019 --fork --shardsvr --smallfiles

sleep 5
# connect to one server and initiate the set
echo "Configuring s0 replica set"
mongo --port 37017 << 'EOF'
config = { _id: "s0", members:[
          { _id : 0, host : "localhost:37017" },
          { _id : 1, host : "localhost:37018" },
          { _id : 2, host : "localhost:37019" }]};
rs.initiate(config);
EOF

# start a replica set and tell it that it will be shard1
echo "starting servers for shard 1"
mkdir -p /data/shard1/rs0 /data/shard1/rs1 /data/shard1/rs2
mongod --replSet s1 --logpath "s1-r0.log" --dbpath /data/shard1/rs0 --port 47017 --fork --shardsvr --smallfiles
mongod --replSet s1 --logpath "s1-r1.log" --dbpath /data/shard1/rs1 --port 47018 --fork --shardsvr --smallfiles
mongod --replSet s1 --logpath "s1-r2.log" --dbpath /data/shard1/rs2 --port 47019 --fork --shardsvr --smallfiles

sleep 5

echo "Configuring s1 replica set"
mongo --port 47017 << 'EOF'
config = { _id: "s1", members:[
          { _id : 0, host : "localhost:47017" },
          { _id : 1, host : "localhost:47018" },
          { _id : 2, host : "localhost:47019" }]};
rs.initiate(config);
EOF

# start a replica set and tell it that it will be shard2
echo "starting servers for shard 2"
mkdir -p /data/shard2/rs0 /data/shard2/rs1 /data/shard2/rs2
mongod --replSet s2 --logpath "s2-r0.log" --dbpath /data/shard2/rs0 --port 57017 --fork --shardsvr --smallfiles
mongod --replSet s2 --logpath "s2-r1.log" --dbpath /data/shard2/rs1 --port 57018 --fork --shardsvr --smallfiles
mongod --replSet s2 --logpath "s2-r2.log" --dbpath /data/shard2/rs2 --port 57019 --fork --shardsvr --smallfiles

sleep 5

echo "Configuring s2 replica set"
mongo --port 57017 << 'EOF'
config = { _id: "s2", members:[
          { _id : 0, host : "localhost:57017" },
          { _id : 1, host : "localhost:57018" },
          { _id : 2, host : "localhost:57019" }]};
rs.initiate(config);
EOF

# now start 3 config servers
echo "Starting config servers"
mkdir -p /data/config/config-a /data/config/config-b /data/config/config-c
mongod --logpath "cfg-a.log" --dbpath /data/config/config-a --port 57040 --fork --configsvr --smallfiles
mongod --logpath "cfg-b.log" --dbpath /data/config/config-b --port 57041 --fork --configsvr --smallfiles
mongod --logpath "cfg-c.log" --dbpath /data/config/config-c --port 57042 --fork --configsvr --smallfiles

# now start the mongos on a standard port
mongos --logpath "mongos-1.log" --configdb localhost:57040,localhost:57041,localhost:57042 --fork
echo "Waiting 60 seconds for the replica sets to fully come online"
sleep 60
echo "Connecting to mongos and enabling sharding"

# add shards and enable sharding on the test db
mongo <<'EOF'
db.adminCommand( { addshard : "s0/"+"localhost:37017" } );
db.adminCommand( { addshard : "s1/"+"localhost:47017" } );
db.adminCommand( { addshard : "s2/"+"localhost:57017" } );
db.adminCommand({enableSharding: "school"})
db.adminCommand({shardCollection: "school.students", key: {student_id:1}});
EOF



Newer MongoDB versions don't support mirrored config servers, so you need to set them up as an ordinary replica set. To do so, replace the three config-server mongod lines above with these:

mongod --replSet cs --logpath "cfg-a.log" --dbpath /data/config/config-a --port 57040 --fork --configsvr --smallfiles
mongod --replSet cs --logpath "cfg-b.log" --dbpath /data/config/config-b --port 57041 --fork --configsvr --smallfiles
mongod --replSet cs --logpath "cfg-c.log" --dbpath /data/config/config-c --port 57042 --fork --configsvr --smallfiles

And add thereafter these lines:

echo "Configuring config servers"
mongo --port 57040 << 'EOF'
config = { _id: "cs", members:[
{ _id : 0, host : "localhost:57040" },
{ _id : 1, host : "localhost:57041" },
{ _id : 2, host : "localhost:57042" }]};
rs.initiate(config);
EOF

Finally, replace the mongos line with this one:

mongos --logpath "mongos-1.log" --configdb cs/localhost:57040,localhost:57041,localhost:57042 --fork

How to import data file in mongoDb

# Run the mongod process and then fire the commands below

MQ428GG8WL:week2 asee2278$ mongoimport --db students --collection grades --file grades.json

MQ428GG8WL:week2 asee2278$  mongoimport --db users --collection contacts --type csv --headerline --file /opt/backups/contacts.csv

Note : Do not fire these commands in the mongo shell; fire them on bash from the directory where your import file (e.g. grades.json) is located. For CSV files, add --type csv and --headerline as in the second example.


Avoid using mongoimport and mongoexport for full instance production backups. They do not reliably preserve all rich BSON data types, because JSON can only represent a subset of the types supported by BSON. Use mongodump and mongorestore as described in MongoDB Backup Methods for this kind of functionality.