Introduction to Apache Spark and Zeppelin on Google Cloud Dataproc

Want to learn more about using Apache Spark and Zeppelin on Dataproc via the Google Cloud Platform? You’ve come to the right place. Cloud Dataproc is Google’s answer to Amazon EMR (Elastic MapReduce). Like EMR, Cloud Dataproc provisions and manages Compute Engine-based Apache Hadoop and Spark data processing clusters. If you are not familiar with Amazon EMR,..

Harnessing the Power of Big Data Analysis on AWS

Like a jigsaw puzzle, there are many components in the AWS big data ecosystem. Read this article and see how the components fit together to form a beautiful whole. If you are a data engineer, wouldn’t it be great if you could easily scale your existing infrastructure on-demand to support your real-time data pipelines? If you are..

Google Cloud Certification: Preparation and prerequisites

Google Cloud Platform (GCP) offers training, and there are smart ways of preparing for the Google Cloud Certification exams. You might have read the recent news about Spotify building their new event delivery system on Google Cloud Platform (GCP). To scale with their huge volume of content, they have made numerous software architecture design changes to..

Big Data: Amazon EMR, Apache Spark and Apache Zeppelin – Part 2 of 2

In the first article of our two-part series about Amazon EMR, we learned to install Apache Spark and Apache Zeppelin on Amazon EMR. We also learned ways of using different interactive shells for Scala, Python, and R to program for Spark. Let’s continue with the final part of this series. We’ll learn to perform simple..

Big Data: Amazon EMR, Apache Spark, and Apache Zeppelin – Part 1 of 2

Amazon EMR (Elastic MapReduce) provides a platform to provision and manage Amazon EC2-based data processing clusters. Amazon EMR clusters are installed with different supported projects in the Apache Hadoop and Apache Spark ecosystems. You can either choose to install from a predefined list of software, or pick and choose the ones that make the most..

SELinux: improve the security of your EC2 servers

SELinux provides tools to more finely control the activities allowed to users, processes, and daemons to limit the potential damage from vulnerabilities. In the third and final part of our server security series, we will look at how we can enhance the security of Linux-based AWS EC2 instances with SELinux. We will learn how to..

Firewalld: improving security for your AWS EC2 server

While AWS EC2 instances should be well protected by VPC security tools, you may still need to implement protection at the OS level, and that means firewalld. This is the second part of our server security series. In this article, we will look at configuring firewall rules via firewalld on Red Hat Enterprise Linux. While Amazon..

Server security: applying security updates to your EC2 instance

Enhance the server security of a Red Hat Enterprise Linux EC2 instance by monitoring and applying system updates. This is the first part of our Server Security on AWS series. In this series, we will explore some ways to enhance the security of a Red Hat Enterprise Linux EC2 instance. We may also touch on..

SystemTap: working with system monitoring scripts

This is the third and final part of our SystemTap series. This article assumes that you are familiar with SystemTap basics and that you have installed Docker on your AWS EC2 instance with a minimal Red Hat Enterprise Linux 7 platform container. Now we’ll explore working with actual SystemTap scripts to monitor processes and events…

SystemTap: Provisioning an AWS EC2-based Docker Instance

In the first article in our SystemTap series, we learned how to install the powerful diagnostic tool, SystemTap, on an AWS EC2 instance and then wrote our very first “Hello World” script. We now need to explore some of the interesting (and more useful) scripts that come with SystemTap. Building a SystemTap target environment: To..