Apache Hadoop

Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project built and used by a global community of contributors and users.
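The MapReduce programming model at the heart of Hadoop can be illustrated with a minimal sketch in plain Python. This simulates the map, shuffle, and reduce phases in-process; it is an illustration of the model only, not Hadoop's actual Java API, and the function names (`map_fn`, `reduce_fn`, `map_reduce`) are invented for the example:

```python
from collections import defaultdict

def map_fn(_, line):
    # Map phase: emit a (word, 1) pair for every word in an input line.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Reduce phase: sum all counts collected for one word.
    yield word, sum(counts)

def map_reduce(records, map_fn, reduce_fn):
    # Apply map_fn to every (key, value) input record, then shuffle:
    # group the emitted values by their output key.
    shuffled = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            shuffled[out_key].append(out_value)
    # Run reduce_fn once per key over its grouped values.
    results = {}
    for key, values in shuffled.items():
        for out_key, out_value in reduce_fn(key, values):
            results[out_key] = out_value
    return results

# Word count, the canonical MapReduce example: records are
# (line offset, line text) pairs, as in Hadoop's text input format.
lines = [(0, "Hadoop stores data"), (1, "Hadoop processes data")]
print(map_reduce(lines, map_fn, reduce_fn))
# → {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop, the map and reduce tasks run on different cluster nodes and the shuffle moves data between them over the network; the dictionary grouping above stands in for that distributed step.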

Harnessing the Power of Big Data Analysis on AWS


Like the pieces of a jigsaw puzzle, the AWS big data ecosystem has many components. Read this article to see how they fit together to form a beautiful whole. If you are a data engineer, wouldn’t it […]

HDInsight – Azure’s Hadoop Big Data Service


How can Azure HDInsight solve your big data challenges? Big data refers to large volumes of fast-moving data in any format that your traditional data processing systems haven’t yet been able to handle. In other words, […]

Google Cloud Certification: Preparation and prerequisites

Google Cloud Platform (GCP) offers training, and there are smart ways to prepare for the Google Cloud Certification exams. You might have read the recent news about Spotify building their new event delivery system on Google […]

Microsoft Azure Data Lake Store: an introduction


The Azure Data Lake Store service provides a platform for organizations to park, process, and analyse vast volumes of data in any format. Find out how. With increasing volumes of data to […]

AWS Elastic MapReduce: a hands-on guide to Big Data on Amazon


AWS Elastic MapReduce: a guided lab. Amazon’s Elastic MapReduce (EMR) is a managed Hadoop framework that lets enterprise and academic users quickly and easily process huge data sets. Use cases include log analysis, […]

DynamoDB: An Inside Look Into NoSQL – Part 1


This is a guest post from 47Line Technologies. In our earlier posts (here and here), we introduced the Hadoop ecosystem and explained its various components using a real-world example from the retail industry. We […]