MongoDB : Exporting JSON with mongoexport

This will export a JSON representation of the database. Note that, as a rule (particularly for backing up or moving data), MongoDB recommends the "dump and restore" approach, since BSON can hold richer data than JSON. Nevertheless, mongoexport still has its uses: a JSON representation of the data is often handy, and it is what we have been using so far in the application development.

Basic output to console

You need to specify the name of the database with --db and the collection with --collection to export.


> mongoexport --db meantest --collection tech

Send to a file

> mongoexport --db meantest --collection tech --out MEAN/api/data/tech.json

Create as array

> mongoexport --db meantest --collection tech --out MEAN/api/data/tech.json --jsonArray

Make output pretty

> mongoexport --db meantest --collection tech --out MEAN/api/data/tech.json --jsonArray --pretty

 

MongoDB : How to export a database using mongodump

Sometimes we need to take a database backup or export a specific database from mongo.

This command exports a specific database into a dump/ folder under the current working directory (here the home folder):

> mongodump --db testDatabase

# to compress the dump files with gzip

> mongodump --db testDatabase --gzip

cd ~/dump

 

 

This command will restore (import) the specific database:

> mongorestore --db testDatabase --gzip ~/dump/testDatabase

 

With authentication you can take a dump as:

mongodump -h 10.10.10.10 --port 27017 -d dbName -u "user" -p "password" --excludeCollectionsWithPrefix=system

and restore as

mongorestore -d dbName --dir=dbDir

Note :

These commands should be run from the command prompt / shell, not from the mongo shell.

mongorestore performs inserts only; it does not update existing documents.

MongoDB : Normalize Database reference (DBRefs)

The joy of a document database is that it eliminates lots of joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure, there is no immediate need to normalize data as you would in SQL. In particular, any data that is not useful apart from its parent document should be part of the same document.

This is not so much a “storage space” issue as it is a “data consistency” issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places.

DBRef documents resemble the following document:

{ "$ref" : <value>, "$id" : <value>, "$db" : <value> }

Consider a document from a collection that stored a DBRef in a creator field:

{
  "_id" : ObjectId("5126bbf64aed4daf9e2ab771"),
  // .. application fields
  "creator" : {
                  "$ref" : "creators",
                  "$id" : ObjectId("5126bc054aed4daf9e2ab772"),
                  "$db" : "users"
               }
}

The DBRef in this example points to a document in the creators collection of the users database that has ObjectId("5126bc054aed4daf9e2ab772") in its _id field.

Consider the following operation to insert two documents, using the _id field of the first document as a reference in the second document:

original_id = ObjectId()

db.places.insert({
    "_id": original_id,
    "name": "Broadway Center",
    "url": "bc.example.net"
})

db.people.insert({
    "name": "Erin",
    "places_id": original_id,
    "url":  "bc.example.net/Erin"
})

Then, when a query returns the document from the people collection you can, if needed, make a second query for the document referenced by the places_id field in the places collection.
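The second query can be sketched in Python; here plain dictionaries stand in for the places and people collections (a hypothetical in-memory stand-in so the snippet runs without a server; with pymongo you would call find_one() on the real collections instead):

```python
# In-memory stand-ins for the two collections; "oid-1" plays the
# role of the shared ObjectId stored in original_id above.
places = {
    "oid-1": {"_id": "oid-1", "name": "Broadway Center", "url": "bc.example.net"},
}
people = [
    {"name": "Erin", "places_id": "oid-1", "url": "bc.example.net/Erin"},
]

def resolve_place(person):
    # Second query: follow the manual reference stored in places_id.
    return places[person["places_id"]]

erin = people[0]
print(resolve_place(erin)["name"])  # Broadway Center
```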

Reference link for details : https://docs.mongodb.com/manual/reference/database-references/

Certification : MongoDB for Database Administrators – M102

Hi Friends,

I am pleased to share that I have taken one more step towards MongoDB expertise by successfully finishing yet another certification course for MongoDB, this time as a database administrator 😉 The previous certificate, M101P, focused on the developer's side of MongoDB; this certificate, M102, focuses on the database administrator's side.
[Image: MongoDB M102 DBA certificate, Aseem Jain, 2016]
It was a 2-month-long course and I finished it with 90%.
This certificate enables me to set up production-level replica sets, shards, failovers, indexes, performance tune-ups, backup and recovery, data migration and several other complex tasks.

MongoDB : MongoVUE, a Windows desktop GUI client for easy NoSQL data visualization

http://www.mongovue.com/

MongoVUE is an innovative MongoDB desktop application for Windows OS that gives you an elegant and highly usable GUI interface to work with MongoDB. Now there is one less worry in managing your web-scale data.

MongoVUE makes it very simple to see and visualize your data. It gives you 3 different views of it: TreeView, TableView and TextView. If you are from an RDBMS (SQL) background, you will feel at home with the Table View of MongoDB (NoSQL).

 

Python : How to convert a Python dictionary into JSON and why we need that

Problem

You want to read or write data encoded as JSON (JavaScript Object Notation).

Solution

The json module provides an easy way to encode and decode data in JSON. The two main functions are json.dumps() and json.loads(), mirroring the interface used in other serialization libraries, such as pickle. Here is how you turn a Python data structure into JSON:

import json

data = {
   'name' : 'ACME',
   'shares' : 100,
   'price' : 542.23
}

json_str = json.dumps(data)

Here is how you turn a JSON-encoded string back into a Python data structure:

data = json.loads(json_str)

If you are working with files instead of strings, you can alternatively use json.dump() and json.load() to encode and decode JSON data. For example:

# Writing JSON data
with open('data.json', 'w') as f:
     json.dump(data, f)

# Reading data back
with open('data.json', 'r') as f:
     data = json.load(f)

A dictionary and JSON are not the same thing, so when dealing with web applications in Python we need to convert the Python dictionary into JSON and vice versa. json.dumps and json.loads are used for exactly that.

# json dumps takes dict as input and return the json object as string.

# json loads takes a json object / string and returns the dict.
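A short round trip makes the difference concrete: json.dumps() returns a plain str, and Python's True and None are spelled true and null in the JSON text:

```python
import json

data = {"available": True, "price": None, "shares": 100}
text = json.dumps(data)

print(type(text).__name__)  # str
print(text)                 # {"available": true, "price": null, "shares": 100}

# loads() turns the string back into an equivalent dictionary
assert json.loads(text) == data
```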

 

A sample console session, trying to index into a dumped dict:

>>> import json
>>> a = {'foo': 3}
>>> json.dumps(a)
'{"foo": 3}'
>>> obj = json.dumps(a)
>>> print obj
{"foo": 3}

>>> isinstance(obj, str)
True
>>> a['foo']
3
>>> obj['foo']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers, not str
>>> d = json.loads(obj)
>>> d['foo']
3

MongoDB : How to simulate joins or subqueries in NoSQL MongoDB

NoSQL MongoDB by its very nature does not support joins and promotes embedding documents; however, version 3.2 of MongoDB introduced an alternative to joins that can be used in aggregation.

For a better-performing design you can keep all related data in one big (embedded) document, but if for security or other reasons you have to keep it separate, then normalization is a good idea.

MongoDB does not allow subqueries or joins, but they can be simulated. For example, suppose that instead of embedding salary details in the employee document you kept them in a separate salary collection, and you want the equivalent of:

SQL query : select salary from salary where employee_id = (select employee_id from employee where employee_name like 'premaseem')

# Inserted 2 records in 2 different collections for the join
> db.employee.insert({eid:1, name:"premaseem"})
WriteResult({ "nInserted" : 1 })
> db.salary.insert({eid:1, salary:6000})
WriteResult({ "nInserted" : 1 })

# Validated data in the 2 collections
> db.salary.find({eid:1})
{ "_id" : ObjectId("56da1a5b2253b2199c53025b"), "eid" : 1, "salary" : 6000 }
# Note: nesting find() like this does NOT act as a subquery; it matches nothing
> db.salary.find({ eid: db.employee.find({eid:1}) })
> db.employee.find({name : "premaseem"})
{ "_id" : ObjectId("56da19d42253b2199c53025a"), "eid" : 1, "name" : "premaseem" }

# Simulated join to get the salary for employee premaseem
> db.employee.find({name : "premaseem"}).map(function(d){
    var obj = db.salary.findOne({eid : d.eid});
    print(obj.salary);
    return obj.salary;
})
Output : 6000

 

 

Here is a Python script to try out the same:

__author__ = 'premaseem'

from pymongo import MongoClient  # Connection was removed in PyMongo 3.x

c = MongoClient()
db = c.test

db.employee.drop()
db.employeeSalary.drop()

obj1 = {"eid": 1, "name": "premaseem"}
obj2 = {"eid": 2, "name": "sony"}
obj3 = {"eid": 3, "name": "meera"}
bulk_employee_insert = [obj1, obj2, obj3]

# insert employees
db.employee.insert_many(bulk_employee_insert)

objs1 = {"eid": 1, "salary": 1000}
objs2 = {"eid": 2, "salary": 8000}
objs3 = {"eid": 3, "salary": 25}
bulk_salary_insert = [objs1, objs2, objs3]

# insert salaries
db.employeeSalary.insert_many(bulk_salary_insert)

print(str(db.employee.count_documents({})) + " total employees")
print(str(db.employeeSalary.count_documents({})) + " total salaries")

def find_employee():
    emp_obj = db.employee.find_one({"eid": 1})
    print(emp_obj)

def find_employee_with_joined_salary(eid):
    # two queries, joined manually in application code
    emp_obj = db.employee.find_one({"eid": eid})
    emp_sal_obj = db.employeeSalary.find_one({"eid": eid})
    emp_obj["salary"] = emp_sal_obj["salary"]
    print(emp_obj)

find_employee_with_joined_salary(2)

MongoDB 3.2 introduced $lookup, which performs a left outer join in the aggregation pipeline. For reference, follow the links below.
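As a sketch of the 3.2 approach, the employee/salary join above could be expressed as a $lookup stage. It is shown here as a Python list, the way you would pass it to pymongo's aggregate(); the field names follow the example above, and actually running it requires a MongoDB 3.2+ server:

```python
# Hypothetical pipeline for the employee/salary collections used above:
# $lookup pulls the matching salary documents into each employee document.
pipeline = [
    {"$match": {"name": "premaseem"}},
    {"$lookup": {
        "from": "salary",        # collection to join with
        "localField": "eid",     # field in the employee documents
        "foreignField": "eid",   # field in the salary documents
        "as": "salary_docs",     # joined documents end up in this array
    }},
]

# With a live server you would run: db.employee.aggregate(pipeline)
```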

 

 

Reference : MongoDB and the Shocking Case of the Missing JOIN ($lookup)

MongoDB doc : https://docs.mongodb.org/manual/reference/operator/aggregation/lookup/#example

 

 

What are the ACID properties of a database

When talking about databases that handle mission-critical business transactions and information, you are talking about ACID features. ACID (Atomicity, Consistency, Isolation, Durability) is the set of properties that guarantees that database transactions are processed reliably.

[Image: ACID properties of a database]

Atomicity requires that each transaction be “all or nothing”: if one part of the transaction fails, then the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible (“atomic”), and an aborted transaction does not happen.

Consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors cannot result in the violation of any defined rules.

Isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e., one after the other. Providing isolation is the main goal of concurrency control. Depending on the concurrency control method (i.e., if it uses strict – as opposed to relaxed – serializability), the effects of an incomplete transaction might not even be visible to another transaction.

Durability property ensures that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.
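Atomicity is easy to observe in any ACID-compliant database. Here is a small illustrative sketch using Python's built-in sqlite3 module: a transaction containing a failing statement is rolled back as a whole, leaving the earlier UPDATE undone as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 60 "
                     "WHERE name = 'alice'")
        # Duplicate primary key -> this statement fails...
        conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
except sqlite3.IntegrityError:
    pass

# ...so the whole transaction is rolled back: alice still has 100.
balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)  # 100
```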

MongoDB : How to Scale MongoDB Datasets

As time goes by and data grows, it becomes necessary to scale any database to maintain quick response times and performance. Initially we run the query explain plan, change the document structure to make queries efficient, and add or modify the required indexes. However, even after performing cleanup and maintenance tasks, the time comes to scale and ensure availability, durability and fault tolerance.

[Image: scaling MongoDB]

First, we scale vertically

MongoDB, like most databases, craves RAM and IO capacity. It sometimes likes CPU. The simplest conceptual way of scaling performance for a MongoDB dataset is to give it more system resources without worrying about spreading the load across servers. Normally, this is painful for cost or operational reasons. Doubling the capacity of a production MongoDB replica set means swapping larger servers in, figuring out what to do with the old ones, and hoping the new ones are just the right size to keep things healthy for a long period of time.

Then, we scale out

At this point, scaling out MongoDB is easy. A well-built, sharded MongoDB dataset is easy to reason about and will scale linearly across additional servers. Sharding needs a shard key to divide the data across several small commodity servers grouped in a cluster.

[Image: sharding example]

Flexibility is a requirement of an evolving data set

MongoDB offers numerous features that make developers' lives easier. It also offers features for scale. Using the scaling features at the wrong time means compromising on developer-friendly features (unique constraints, oplog usefulness, capped collections). There is a great deal of pressure on developers to use the MongoDB sharding features even when they are not necessary, which makes their lives worse in aggregate. The healthiest MongoDB setups started with developers using features that helped them move faster, and evolved as understanding of the problem scope and appropriate scale increased.

Developers who use MongoDB should make smart decisions and not force themselves down a path before they even have a map. We say "inspect and adapt" in Agile 😉

 

What is JSON Schema

JSON is very flexible, so when you expect a JSON payload you have to ensure that the data is passed with the correct data types. For instance:

JSON Payload 1 – amount as a number

{
  "item" : "premaseem",
  "sold" : true,
  "amount" : 786
}

JSON Payload 2 – amount as a string

{
  "item" : "premaseem",
  "sold" : "True",
  "amount" : "786"
}

Both payloads look fine, but if you have to do some calculation on amount and it is passed as a string, the math operation will fail. So you need to publish a schema from which the user or consumer can understand which data types are expected, in order to create a valid and useful payload that can be consumed without causing issues.

This website can be used to generate the Json schema

http://jsonschema.net

This website can be used to validate the json object against Json schema

http://json-schema-validator.herokuapp.com

What is JSON Schema?

JSON Schema specifies a JSON-based format to define the structure of JSON data for validation, documentation, and interaction control. A JSON Schema provides a contract for the JSON data required by a given application, and how that data can be modified.

JSON Schema is based on the concepts from XML Schema (XSD), but is JSON-based. The JSON data schema can be used to validate JSON data. As in XSD, the same serialization/deserialization tools can be used both for the schema and data. The schema is self-describing.

Example JSON Schema (draft 4):

{
  "$schema": "http://json-schema.org/schema#",
  "title": "Product",
  "type": "object",
  "required": ["id", "name", "price"],
  "properties": {
    "id": {
      "type": "number",
      "description": "Product identifier"
    },
    "name": {
      "type": "string",
      "description": "Name of the product"
    },
    "price": {
      "type": "number",
      "minimum": 0
    },
    "tags": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "stock": {
      "type": "object",
      "properties": {
        "warehouse": {
          "type": "number"
        },
        "retail": {
          "type": "number"
        }
      }
    }
  }
}

The JSON Schema above can be used to test the validity of the JSON code below:

{
  "id": 1,
  "name": "Foo",
  "price": 123,
  "tags": [
    "Bar",
    "Eek"
  ],
  "stock": {
    "warehouse": 300,
    "retail": 20
  }
}
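To see how such a schema drives validation, here is a deliberately tiny, hand-rolled checker. It is an illustrative sketch only, covering just the type, required and properties keywords used above; real applications should use a proper JSON Schema library:

```python
# Map JSON Schema type names to Python types. bool is rejected explicitly
# below because bool is a subclass of int in Python.
TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}

def check(instance, schema):
    """Tiny illustrative checker for a subset of JSON Schema."""
    expected = TYPES[schema.get("type", "object")]
    if isinstance(instance, bool) or not isinstance(instance, expected):
        return False
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                return False
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not check(instance[key], sub):
                return False
    return True

# A trimmed-down version of the Product schema above
schema = {
    "type": "object",
    "required": ["id", "name", "price"],
    "properties": {
        "id": {"type": "number"},
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
}

print(check({"id": 1, "name": "Foo", "price": 123}, schema))    # True
print(check({"id": "1", "name": "Foo", "price": 123}, schema))  # False
```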