Sunday, 27 December 2015

NoSQL - Types of NoSQL Database - part 2

In the previous article, Introduction to NoSQL Database – Part 1, I introduced NoSQL databases and explained how they evolved to fill the gaps and resolve the issues that relational databases could not handle. I recommend reading that article before this one. In this article, I describe the main flavors of NoSQL database, with examples. Several non-relational databases are available today, and they fall into the following four categories:
  • Key-Value Stores Database
  • Document Oriented Database
  • Column Store Database
  • Graph Store Database
Key-Value Store Database
A key-value database stores data as pairs of key and value. It is the simplest and most basic type of NoSQL database, yet it provides an efficient and powerful model. In this model, the attribute is the KEY and the content is the VALUE. You can think of it as a hash table in Java or other programming languages. It lets developers and data modelers design a schema-less, flexible model. Data can be queried and retrieved only by key, and the keys are indexed. Unlike an RDBMS, these databases allow the format of every row to be different. Key-value stores are highly scalable, which means they can grow to many times their size without being redesigned. They are extremely fast and are designed to handle massive loads and process huge amounts of data.

Key-value store databases suit applications that need to process huge amounts of low-complexity data with scaling in mind. They can also be used for storing and managing a user's session information or shopping cart.
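To make the key-value model concrete, here is a minimal sketch in Python. It is only an illustration: the `KeyValueStore` class, the key names and the sample values are all made up for this example, and a plain dictionary stands in for the database. It shows the two properties described above: values are reachable only through their key, and each value can have a different format.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store treats the value as opaque, so it is serialized to a string.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        # Lookup is by key only; there is no way to search on the value.
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
# The two values have different shapes: one is a session, the other a cart.
store.put("session:42", {"user": "jbrown", "logged_in": True})
store.put("cart:42", ["book", "laptop"])

session = store.get("session:42")
cart = store.get("cart:42")
missing = store.get("session:99")  # unknown key falls back to the default
```

A real key-value database adds persistence, replication and partitioning on top of this idea, but the access pattern stays the same: put by key, get by key.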

Example key-value databases include Cassandra, Amazon DynamoDB, Azure Table Storage (ATS), Riak and BerkeleyDB.

Strengths: Schema-less model, extremely fast processing, handles massive data loads
Weakness: Low consistency, no joins or aggregations

For details on use cases, see the excellent article “Big Data Architectures – NoSQL Use Cases for Key Value Databases”.

Document Oriented Database
Document store databases store data in the form of documents. They expand on the basic idea of key-value stores: a “document” contains a more complex grouping of key-value pairs. Each document is assigned a unique key, which is used to retrieve the document. The data type of a key is always string, while values can be strings, numbers, Booleans, arrays or other nested key-value pairs.

Documents are stored in standard formats such as JSON and XML. Documents in a document-oriented database are similar to records in a relational database. However, being schema-less makes them more flexible than relational records.

In a relational database, every record in a table has the same columns, and a column with no data for a record is stored as empty/null. In a document store database, each record can have a different schema, and there is no need to store columns/keys/attributes that are blank.

An example of two documents, each a grouping of key-value pairs, is given below:

    {
        "EMP_ID": "101",
        "EMP_NAME": "John Brown",
        "SKILLS": [ "JAVA", "ORACLE" ]
    }
    {
        "EMP_ID": "102",
        "EMP_NAME": "Richard Castle",
        "SKILLS": [ "TALEND" ]
    }

These databases are primarily designed for storing, retrieving and managing document-oriented information, for example:
  • Blog applications with comments functionality.
  • Storing massive amounts of semi-structured or unstructured logs.
  • Storing machine/sensor generated data.
  • Managing and storing data for survey applications.
  • Applications that deal with diverse product portfolios.
Example document databases include MongoDB and CouchDB.
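The flexible schema can be sketched in a few lines of Python. This is an assumption-laden illustration, not how MongoDB or CouchDB work internally: the `employees` dictionary stands in for a collection, the helper `find_by_skill` is hypothetical, and employee 103 is invented to show a document with a different set of keys.

```python
# Hypothetical in-memory "collection": document key -> document.
employees = {
    "101": {"EMP_ID": "101", "EMP_NAME": "John Brown", "SKILLS": ["JAVA", "ORACLE"]},
    "102": {"EMP_ID": "102", "EMP_NAME": "Richard Castle", "SKILLS": ["TALEND"]},
    # Invented document with a different schema: no SKILLS, a nested ADDRESS.
    "103": {"EMP_ID": "103", "EMP_NAME": "Jane Doe", "ADDRESS": {"CITY": "Pune"}},
}


def find_by_skill(collection, skill):
    """Return the names of employees whose SKILLS array contains the skill."""
    return [
        doc["EMP_NAME"]
        for doc in collection.values()
        # Documents without a SKILLS key are simply treated as having none.
        if skill in doc.get("SKILLS", [])
    ]


by_key = employees["102"]                      # retrieval by document key
java_people = find_by_skill(employees, "JAVA")  # lookup inside the documents
```

Note that document 103 does not carry an empty SKILLS attribute; the key is just absent, which is the difference from a relational row with a null column.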

Strengths: Flexible schema, fast key-value access, massive write performance, scalable
Weakness: Less consistency, no joins, lack of complex queries

Graph Store Database
Based on graph theory, these databases are designed to explore relations or connections among different nodes. They use nodes and edges to represent and store data. Nodes represent objects, and edges act as the relations among these nodes. Every node and edge can store additional properties in the form of key-value pairs. An RDBMS is not optimal for connected data, because it struggles when traversing all the connections. The graph data model is efficient for traversing massive amounts of connected data quickly. A graph database provides schema-less and efficient storage of semi-structured data. Some graph databases are ACID compliant and offer rollback support.
Graph databases are useful for applications where objects are interconnected. The best examples are Facebook and other social media apps. We can model a social networking website in a graph database where each account is a node and each edge is a relation between two accounts. Let’s take the sample graph data model below. Assume it represents LinkedIn connections in the simplest form, where each node is a LinkedIn user and each edge is a connection. We can say that users B, C and F are 1st connections of A, and users D, E and G are 2nd connections of A. We can also say that, apart from A, user F is connected only to users E and G. This is just a simplistic representation of a graph database. Each node and edge can have its own properties.
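The connection example can be sketched in Python with a breadth-first traversal. Treat this as a rough illustration only: the edge list is an assumption chosen to match the description (in particular, attaching D to B is a guess), and real graph databases use their own storage and query languages rather than code like this.

```python
from collections import deque

# Assumed edges: A's direct connections are B, C and F; F also knows E and G.
# Connecting D through B is an assumption made for this sketch.
edges = [("A", "B"), ("A", "C"), ("A", "F"), ("B", "D"), ("F", "E"), ("F", "G")]


def build_adjacency(edge_list):
    """Turn undirected edges into a node -> set-of-neighbours mapping."""
    adjacency = {}
    for left, right in edge_list:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def connections_by_degree(adjacency, start):
    """Breadth-first search returning {degree: sorted list of users}."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for neighbour in adjacency[user]:
            if neighbour not in distance:
                distance[neighbour] = distance[user] + 1
                queue.append(neighbour)
    by_degree = {}
    for user, degree in distance.items():
        if degree > 0:  # skip the starting user
            by_degree.setdefault(degree, []).append(user)
    return {degree: sorted(users) for degree, users in by_degree.items()}


degrees = connections_by_degree(build_adjacency(edges), "A")
```

With these assumed edges, `degrees[1]` holds A's 1st connections and `degrees[2]` the 2nd connections. The traversal only follows edges from node to node, which is the kind of query a graph model is built for.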

Graph databases can be used in the following applications:
  • Social Media and Networking applications
  • Network and Cloud Management
  • Security and Access Control
  • Recommendation engines
Example graph databases include Neo4j and Titan.

Column Store Database
These databases store data tables as sections of columns rather than as rows. All columns are treated individually, and the values of a single column are stored together. Storing data in wide-column stores offers very high performance and a highly scalable architecture. A column store can greatly improve the performance of queries that need only a small number of columns, because it has to fetch data for just those columns and combine them to produce the result set. Column store databases were created to store and process data distributed over many machines, so the data lives in a massively distributed architecture rather than in conventional tables. Keys in column store databases point to multiple columns. Because the data is distributed, it can be aggregated quickly and efficiently.

Column stores are largely based on Google's Bigtable design. Let’s understand this with the example of a simple Employee table:


Let’s compare how the data above is stored in a relational database and in a column store database.
Relational DBMS:  In most relational databases, the above data set may be serialized and stored like below:


Every record is assigned an internal ID (rowid), which the system uses internally to refer to the data.

Column Store Database:  A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on. Data may be stored like below:


Some column store databases may also store column data in column-specific files, e.g. one data file per column.
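The difference between the two layouts can be sketched in Python. The Employee rows, column names and helper functions below are hypothetical, invented for this illustration, and in-memory lists stand in for what a real database writes to disk.

```python
# Hypothetical Employee table used only for this illustration.
columns = ["EMP_ID", "EMP_NAME", "DEPT"]
rows = [
    (101, "John Brown", "IT"),
    (102, "Richard Castle", "HR"),
    (103, "Jane Doe", "IT"),
]


def row_layout(table_rows):
    """Row store: all values of one record sit next to each other."""
    return [value for row in table_rows for value in row]


def column_layout(table_rows, column_names):
    """Column store: all values of one column sit next to each other."""
    return {
        name: [row[index] for row in table_rows]
        for index, name in enumerate(column_names)
    }


row_store = row_layout(rows)
column_store = column_layout(rows, columns)

# A query that needs one column reads only that column's values.
departments = column_store["DEPT"]
```

In the row layout, reading every department means walking past every ID and name as well. In the column layout the departments are already stored together, which is why queries over a few columns benefit.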

Examples include HBase, Bigtable and Hypertable.

Strengths: High throughput for big data, highly scalable, strong partitioning support
Weakness: Inability to perform complex queries, high query response latency

This completes the overview of the types of NoSQL databases. Let me know if you have any questions.

In the next article, I will try to cover the differences between relational and NoSQL databases and when we should prefer NoSQL over a relational database.
