MongoDB : History tracking micro service

In MongoDB, the local database's oplog.$main collection (oplog.rs in a replica set) keeps a record of all changes that happen on the primary database; secondaries eventually read these records to catch up or sync data in the Mongo replica set.

A microservice can be written to pull the data/delta from oplog.$main based on the namespace (the database and collection of interest) and save that data in a destination or audit DB.

Later on, the audit DB can be queried to fetch the history or log data pertaining to an entity.
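Here is a minimal sketch of such a polling microservice in mongo shell JavaScript. The namespace "mydb.orders" and the audit collection audit.orders_history are hypothetical; a real service would use a tailable cursor and persist the last-seen timestamp across restarts.

// Poll local.oplog.$main for changes to one namespace and copy them to an audit DB
var oplog = db.getSiblingDB("local").oplog.$main;
var audit = db.getSiblingDB("audit").orders_history;
var lastTs = Timestamp(0, 0);          // start from the beginning of the oplog

while (true) {
    var cursor = oplog.find({ ns: "mydb.orders", ts: { $gt: lastTs } });
    while (cursor.hasNext()) {
        var entry = cursor.next();
        lastTs = entry.ts;             // remember progress
        audit.insert(entry);           // keep the raw oplog entry as the history record
    }
    sleep(1000);                       // poll once per second
}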

[Audit log sequence diagram]


MongoDB : Normalized database references (DBRefs)

The joy of a document database is that it eliminates lots of joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure, there is no immediate need to normalize data like you would in SQL. In particular, any data that is not useful apart from its parent document should be part of the same document.

This is not so much a “storage space” issue as it is a “data consistency” issue. If many records will refer to the same data, it is more efficient and less error prone to update a single record and keep references to it in other places.

DBRef documents resemble the following document:

{ "$ref" : <value>, "$id" : <value>, "$db" : <value> }

Consider a document from a collection that stored a DBRef in a creator field:

{
  "_id" : ObjectId("5126bbf64aed4daf9e2ab771"),
  // .. application fields
  "creator" : {
                  "$ref" : "creators",
                  "$id" : ObjectId("5126bc054aed4daf9e2ab772"),
                  "$db" : "users"
               }
}

The DBRef in this example points to a document in the creators collection of the users database that has ObjectId("5126bc054aed4daf9e2ab772") in its _id field.
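To resolve this reference manually in the shell, you can query the referenced collection directly using the values stored in the DBRef (here the users database and creators collection from the example above):

> db.getSiblingDB("users").creators.findOne({ "_id" : ObjectId("5126bc054aed4daf9e2ab772") })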

Consider the following operation to insert two documents, using the _id field of the first document as a reference in the second document:

original_id = ObjectId()

db.places.insert({
    "_id": original_id,
    "name": "Broadway Center",
    "url": "bc.example.net"
})

db.people.insert({
    "name": "Erin",
    "places_id": original_id,
    "url":  "bc.example.net/Erin"
})

Then, when a query returns the document from the people collection, you can, if needed, make a second query for the document referenced by the places_id field in the places collection.
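For example, that second query in the shell might look like this (using the name "Erin" from the insert above):

> var person = db.people.findOne({ "name": "Erin" })
> db.places.findOne({ "_id": person.places_id })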

Reference link for details : https://docs.mongodb.com/manual/reference/database-references/

MongoDB : How to simulate joins or subqueries in NoSQL MongoDB

NoSQL MongoDB by its very nature does not support joins and promotes embedding documents; however, version 3.2 of MongoDB introduced an alternative to joins that can be used in aggregation.

For a better performing design you can keep all related data in one big (embedded) document, but if for security or other reasons you have to keep it separate, then normalization is a good idea.

MongoDB does not allow subqueries or joins, but they can be simulated. For example, suppose that instead of embedding salary details in the employee document you keep them in a separate salary collection, and you want to fetch a given employee's salary.

SQL query : select salary from salary where eid = (select eid from employee where name like 'premaseem')

# Inserted 2 records in 2 different collections for the join
> db.employee.insert({eid:1, name:"premaseem"})
WriteResult({ "nInserted" : 1 })
> db.salary.insert({eid:1, salary:6000})
WriteResult({ "nInserted" : 1 })

# Validated data in the 2 collections
> db.salary.find({eid:1})
{ "_id" : ObjectId("56da1a5b2253b2199c53025b"), "eid" : 1, "salary" : 6000 }
> db.employee.find({name:"premaseem"})
{ "_id" : ObjectId("56da19d42253b2199c53025a"), "eid" : 1, "name" : "premaseem" }

# A naive nested find() does not work: find() returns a cursor, not a value,
# so this query matches nothing
> db.salary.find({ eid: db.employee.find({eid:1}) })

# Simulated join to get the salary for employee premaseem
> db.employee.find({name : "premaseem"}).map(function(d){
      var obj = db.salary.findOne({eid : d.eid});
      print(obj.salary);
      return obj.salary;
  })
Output : 6000


Here is a Python script to try out the same:

__author__ = 'premaseem'

from pymongo import MongoClient

client = MongoClient()
db = client.test

# start with clean collections
db.employee.drop()
db.employeeSalary.drop()

# insert employees
employees = [
    {"eid": 1, "name": "premaseem"},
    {"eid": 2, "name": "sony"},
    {"eid": 3, "name": "meera"},
]
db.employee.insert_many(employees)

# insert salaries in a separate, normalized collection
salaries = [
    {"eid": 1, "salary": 1000},
    {"eid": 2, "salary": 8000},
    {"eid": 3, "salary": 25},
]
db.employeeSalary.insert_many(salaries)

print(db.employee.count_documents({}), "total employees")
print(db.employeeSalary.count_documents({}), "total salaries")

def find_employee():
    emp_obj = db.employee.find_one({"eid": 1})
    print(emp_obj)

def find_employee_with_joined_salary(eid):
    # simulate the join: make a second query and merge the result client side
    emp_obj = db.employee.find_one({"eid": eid})
    emp_sal_obj = db.employeeSalary.find_one({"eid": eid})
    emp_obj["salary"] = emp_sal_obj["salary"]
    print(emp_obj)

find_employee_with_joined_salary(2)

MongoDB 3.2 introduced $lookup, which performs a left outer join in the aggregation pipeline. For reference, follow the links below.


Reference : MongoDB and the Shocking Case of the Missing JOIN ($lookup)

MongoDB doc : https://docs.mongodb.org/manual/reference/operator/aggregation/lookup/#example
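As a quick illustration with the same employee and employeeSalary collections used above (the output field name salaryDocs is just an illustrative choice):

> db.employee.aggregate([
      { $match: { name: "premaseem" } },
      { $lookup: {
          from: "employeeSalary",     // collection to join with
          localField: "eid",          // field in employee
          foreignField: "eid",        // field in employeeSalary
          as: "salaryDocs"            // output array of matched salary docs
      }}
  ])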


MongoDB : How to Scale MongoDB Datasets

As time goes by and data grows, any database needs to scale to maintain quick response times and performance. Initially we run the query explain plan, change the document structure to make queries efficient, and add or modify the required indexes. However, even after performing these cleanup and maintenance tasks, the time comes to scale and ensure availability, durability and fault tolerance.

[Diagram: scaling MongoDB]

First, we scale vertically

MongoDB, like most databases, craves RAM and IO capacity. It sometimes likes CPU. The simplest conceptual way of scaling performance for a MongoDB dataset is to give it more system resources without worrying about spreading the load across servers. Normally, this is painful for cost or operational reasons. Doubling the capacity of a production MongoDB replica set means swapping larger servers in, figuring out what to do with the old ones, and hoping the new ones are just the right size to keep things healthy for a long period of time.

Then, we scale out

At this point, scaling out MongoDB is easy. A well-built, sharded MongoDB dataset is easy to reason about and will scale linearly across additional servers. Sharding needs a shard key to divide the data across several small/commodity servers grouped in a cluster.
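For illustration, a minimal sketch of the shell commands involved, assuming a hypothetical database mydb and a devices collection sharded on a hashed device_id key (run against a mongos of an already configured sharded cluster):

// allow mydb to be sharded, then distribute the collection by hashed shard key
sh.enableSharding("mydb")
sh.shardCollection("mydb.devices", { device_id: "hashed" })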

[Diagram: sharding example]

Flexibility is a requirement of an evolving data set

MongoDB offers numerous features that make developers' lives easier. It also offers features for scale. Using the scaling features at the wrong time means compromising on developer-friendly features (unique constraints, oplog usefulness, capped collections). There is a great deal of pressure on developers to use the MongoDB sharding features even when they're not necessary, which makes their lives worse in aggregate. The healthiest MongoDB setups started with developers using features that helped them move faster, and evolved as understanding of the problem scope and appropriate scale increased.

Developers who use MongoDB should make smart decisions and not force themselves down a path before they even have a map. As we say in Agile: inspect and adapt 😉


MongoDB : How to make query results look nice in the mongo shell

Whenever we run a find() query on a MongoDB collection, the shell fills up with so much data that it is difficult to understand and ugly to look at.


> db.devices.find()
{ "_id" : ObjectId("55ff79deb86a4b0eb1110ba4"), "time_allocated" : null,  "is_allocated" : false, "aggr_zone" : "xxx", "dc" : "DFW1", "time_suspended" : null, "device_type" : "server", "core_template_id" : "12345", "device_swapped_to" : null, "is_suspended" : false, "time_created" : "Sun Sep 20 22:30:38 2015", "device_id" : 48080, "is_decommed" : false }
{ "_id" : ObjectId("55ff79deb86a4b0eb1110ba5"), "time_allocated" : null,  "is_allocated" : false, "aggr_zone" : "xxx", "dc" : "ORD1", "time_suspended" : null, "device_type" : "server", "core_template_id" : "54321", "device_swapped_to" : null, "is_suspended" : false, "time_created" : "Sun Sep 20 22:30:38 2015", "device_id" : 45244, "is_decommed" : false }



To make the query results look nicer there are 2 ways.
1. Use pretty()

> db.devices.find().pretty()
{
	"_id" : ObjectId("55ff79deb86a4b0eb1110ba4"),
	"time_allocated" : null,
	"is_allocated" : false,
	"aggr_zone" : "xxx",
	"dc" : "DFW1",
	"time_suspended" : null,
	"device_type" : "server",
	"core_template_id" : "12345",
	"device_swapped_to" : null,
	"is_suspended" : false,
	"time_created" : "Sun Sep 20 22:30:38 2015",
	"device_id" : 48080,
	"is_decommed" : false
}
{
	"_id" : ObjectId("55ff79deb86a4b0eb1110ba5"),
	"time_allocated" : null,
	"is_allocated" : false,
	"aggr_zone" : "xxx",
	"dc" : "ORD1",
	"time_suspended" : null,
	"device_type" : "server",
	"core_template_id" : "54321",
	"device_swapped_to" : null,
	"is_suspended" : false,
	"time_created" : "Sun Sep 20 22:30:38 2015",
	"device_id" : 45244,
	"is_decommed" : false
}

Note : This returns an iterator, which means you need to type "it" for the next 20 records.
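As a side note, the shell's default batch size of 20 documents can be raised with the standard DBQuery.shellBatchSize setting:

> DBQuery.shellBatchSize = 50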
2. Use toArray()

> db.devices.find().limit(2).toArray()
[
	{
		"_id" : ObjectId("55ff79deb86a4b0eb1110ba4"),
		"time_allocated" : null,
		"is_allocated" : false,
		"aggr_zone" : "xxx",
		"dc" : "DFW1",
		"time_suspended" : null,
		"device_type" : "server",
		"core_template_id" : "12345",
		"device_swapped_to" : null,
		"is_suspended" : false,
		"time_created" : "Sun Sep 20 22:30:38 2015",
		"device_id" : 48080,
		"is_decommed" : false
	},
	{
		"_id" : ObjectId("55ff79deb86a4b0eb1110ba5"),
		"time_allocated" : null,
		"is_allocated" : false,
		"aggr_zone" : "xxx",
		"dc" : "ORD1",
		"time_suspended" : null,
		"device_type" : "server",
		"core_template_id" : "54321",
		"device_swapped_to" : null,
		"is_suspended" : false,
		"time_created" : "Sun Sep 20 22:30:38 2015",
		"device_id" : 45244,
		"is_decommed" : false
	}
]
>

Note : This will display all matching records in the shell at once, so use limit(). No iterator is given, and without a sort() option the order of the returned records is not guaranteed.



MongoDB : Convert Oplog timestamp into ISO date

The easiest way to see the recent changes (any DB, any collection) in a MongoDB database is to query the oplog, sorted by timestamp:

> use local
> db.oplog.$main.find().sort({"ts":-1}).limit(5)

Output : { "ts" : Timestamp(1457990841, 1), "op" : "i", "ns" : "nis.devices", "o" : { "_id" : ObjectId("56e72cb9389c38c2aeeab698"), "name" : "premaseemmarch" } }

This shows the 5 most recent operations on any collection in the database (we just sorted by timestamp in reverse order). To verify the date we may have to convert the Timestamp into an ISO date, e.g. "ts" : Timestamp(1457990841, 1).

Way 1 : 

> new Date(1000* 1457990841)

ISODate("2016-03-14T21:27:21Z")

Way 2 : 

> x = Timestamp(1457989386, 1)

Timestamp(1457989386, 1)

> new Date(x.t * 1000)

ISODate("2016-03-14T21:03:06Z")
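Combining the two, here is a small convenience snippet (a sketch, using the same oplog.$main collection as above) that prints the 5 most recent oplog entries with readable dates:

> db.getSiblingDB("local").oplog.$main.find().sort({ts:-1}).limit(5).forEach(function(e){
      print(new Date(e.ts.t * 1000) + "  " + e.op + "  " + e.ns);   // date, operation, namespace
  })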

MongoDB Tutorial by Premaseem (Free video course on MongoDB)

Hi Friends,

I am a certified MongoDB expert. I thought to share my knowledge with all, so that the entire world might benefit from it. It's a free YouTube video series with short lectures covering 31 different topics.

Free video tutorial link :  https://www.youtube.com/playlist?list=PL13Vva6TJcSsAFUsZwYpJOfR-ENWypLAe

[Screenshot: MongoDB tutorial playlist on YouTube]

Use this free tutorial and share it with friends 🙂