Advantages of Hadoop:
Cost : Hadoop being open-sourced and using commodity hardware provides a cost effective model, unlike traditional RDBMS, which requires expensive hardware and high-end processors. The problem with traditional relational database management systems is that it is extremely costly if you want to scale in order to process massive volumes of data. In order to reduce costs, many companies try to down-sample data and classify data which may not give them correct picture of their business. The raw data would be archived or deleted, as it would be too costly to save it in database.
Scalability : Hadoop provides a highly scalable model in which the data can be processed by spreading the data over multiple inexpensive machines, which can be increased or decreased as per the enterprise requirement. Unlike traditional relational database systems (RDBMS) that can’t scale to process large amounts of data, Hadoop enables businesses to run applications on thousands of nodes involving many thousands of terabytes of data.
Flexibility : Hadoop is suitable for processing all types of data sets - structured, semi-structured and unstructured. This means all sort of data whether text, images, videos, clicks,etc. Can be processed by hadoop making it highly flexible.Hadoop enables businesses to easily access new and different types of data sources which can be both structured and unstructured. This means businesses can use Hadoop to derive valuable business insights from data sources such as social media, email conversations. Hadoop can be used for a wide variety of purposes, such as log processing, recommendation systems, data warehousing, market campaign analysis and fraud detection.
Speed : Hadoop follows a distributed file system model in which the file is broken into pieces and is processed parallel which provides a better performance and a relatively faster speed (as numerous tasks are running side by side) when compared to traditional DBMS. If you’re dealing with large volumes of unstructured data, Hadoop is able to efficiently process terabytes of data in just minutes, and petabytes in hours.
Fault-tolerance : Hadoop uses commodity hardware to store files, what enables this to compete high-end machines is the fault tolerance provided by Hadoop, in which the data is replicated on various machines and is read from one machine. If this machine goes down, the data can be read from the other machine, where the data is replicated.
Next Article: 3. Hadoop Ecosystem
Previous Article: 1. Introduction to Hadoop