Perhaps surprisingly, Amazon Elasticsearch is hardly overwhelming, coming with a very basic tool kit and an outdated release version. And it’s expensive.
Just a month ago, AWS launched its Amazon Elasticsearch Service. Elasticsearch itself is an open source, scalable, distributed, real-time search and analytics engine from Elastic, the creators of Logstash, Beats, and Kibana. Elasticsearch makes an excellent alternative to Splunk.
According to AWS Elasticsearch documentation:
“Amazon Elasticsearch Service (Amazon ES) is a managed service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS cloud…You can set up and configure your Amazon Elasticsearch cluster in minutes from the AWS Management Console. Amazon ES provisions all the resources for your cluster and launches it…Amazon ES allows you to easily scale your cluster via a single API call or a few clicks in the AWS Management Console.”
Amazon Elasticsearch features
According to their documentation, the Amazon Elasticsearch Service provides the following features:
- A full range of instance types from which to build your clusters.
- Magnetic, General Purpose, and Provisioned IOPS EBS volumes.
- Clusters spanning multiple regions and Availability Zones.
- Security through IAM-based access control.
- Dedicated master nodes to improve cluster stability.
- Domain snapshots to back up and restore Elasticsearch domains and replicate domains across Availability Zones.
- Kibana for data visualization.
- Integration with Amazon CloudWatch for monitoring Elasticsearch domain metrics.
- Integration with AWS CloudTrail for auditing configuration API calls to Elasticsearch domains.
Amazon Elasticsearch currently uses the following package versions:
- Elasticsearch 1.5.2
- Kibana 4 (also Kibana 3 as a plugin).
- Plugins: jetty, cloud-aws, kuromoji, and icu.
- The following APIs:
/_alias, /_aliases, /_all, /_analyze, /_bulk, /_cat, /_cluster/health, /_cluster/settings, /_cluster/stats, /_count, /_flush, /_mapping, /_mget, /_msearch, /_nodes, /_plugin/kibana, /_plugin/kibana3, /_percolate, /_refresh, /_search, /_snapshot, /_stats, /_status, /_template
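Because only the endpoints listed above are exposed, it can help to check a request path against the list before sending it. The following is a minimal sketch, not an AWS tool: the helper names `api_part` and `is_supported` are hypothetical, the tuple holds only a subset of the list, and treating each entry as a path prefix is an assumption about how the service matches requests.

```python
# Subset of the endpoints listed above; extend with the remaining entries as needed.
SUPPORTED_APIS = (
    "/_alias", "/_aliases", "/_analyze", "/_bulk", "/_cat",
    "/_cluster/health", "/_cluster/settings", "/_cluster/stats",
    "/_count", "/_mapping", "/_search", "/_snapshot", "/_template",
)


def api_part(path):
    """Return the path from its first '_'-prefixed segment onward, or '' if none."""
    segments = [s for s in path.split("?")[0].split("/") if s]
    for i, segment in enumerate(segments):
        if segment.startswith("_"):
            return "/" + "/".join(segments[i:])
    return ""


def is_supported(path):
    """True/False for '_' endpoints; None for plain index or document paths,
    which the list above does not cover."""
    api = api_part(path)
    if not api:
        return None
    # Assumption: an entry also covers its sub-paths, e.g. /_cat/indices.
    return any(api == entry or api.startswith(entry + "/")
               for entry in SUPPORTED_APIS)


print(is_supported("/logs/_search?q=error"))  # True
print(is_supported("/_cluster/health"))       # True
print(is_supported("/_cluster/reroute"))      # False
```

A check like this is only a convenience for catching unsupported calls early; the service itself remains the authority on what it accepts.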
Amazon Elasticsearch: limits
Amazon Elasticsearch has a few built-in limitations which you need to be aware of before you start:
Older version of Elasticsearch
Elasticsearch 1.5.2, the version used by Amazon, is actually quite old compared with the current stable version (1.7.2). And Elasticsearch 2.0.0 beta, which is just around the corner, will address many more bugs. Since Amazon Elasticsearch is a managed service, there is no way for you to upgrade your clusters on your own. If you were to host Elasticsearch yourself, upgrades would be as simple as updating the jar files in the ES_HOME/lib folder.
Elasticsearch 1.5.x and other versions have critical bugs
In the release notes to Elasticsearch 1.7.1, more than a dozen bugs are identified as fixed. Users are advised to upgrade their clusters as soon as possible. Here are just a couple of examples:
IP range aggregation issue:
ip_range aggregation with mask of 0.0.0.0/0 gets treated as 0.0.0.0/32. This was resolved with the 1.7.x release.
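To see why this matters, compare the two masks directly. This small sketch uses only Python's standard `ipaddress` module and does not talk to Elasticsearch; the sample address is an arbitrary choice for illustration.

```python
import ipaddress

intended = ipaddress.ip_network("0.0.0.0/0")  # the mask as written in the query
buggy = ipaddress.ip_network("0.0.0.0/32")    # the mask as the bug interprets it

sample = ipaddress.ip_address("192.168.1.10")  # arbitrary example address

print(intended.num_addresses)  # 4294967296 (every IPv4 address)
print(buggy.num_addresses)     # 1 (only 0.0.0.0)
print(sample in intended)      # True
print(sample in buggy)         # False
```

In other words, a bucket meant to catch every IPv4 address would match only 0.0.0.0.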
Elasticsearch 1.7.x has addressed many problems from 1.5.2, including one which could result in the loss of an entire index if you suffer a multiple node failure while having idle shards. This might be a particularly serious concern with a cloud setup, where node failures due to Availability Zone outages are not uncommon. Although these are rare cases, Elasticsearch Support did send an email alert about the issue to their customers.
EBS volume size
You can attach a maximum of 512 GB of storage to a single I or R series node (i2.2xlarge, r3.8xlarge, etc.). For M series nodes, however, you are limited to a maximum of 100 GB. Besides the fact that I and R series nodes are expensive, they only come with large instance-store volumes. This is an obvious problem if you intend to shut down your Elasticsearch cluster and then reuse it at some future time.
There are two major limitations with the instance types available for Amazon Elasticsearch. The first is that you can only run a maximum of ten instances per cluster. If you want more, you’ll have to submit a service request for an increase. The second problem concerns node memory. Here’s what Elasticsearch’s documentation says:
“A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive (you end up needing many, many small machines)”.
AWS offers r3.2xlarge instances (and larger) and i2.2xlarge instances, which fit nicely with Elastic’s ideal for cluster nodes, but they are very expensive. An EC2 r3.8xlarge on-demand RHEL instance costs $2.903 per hour, while the r3.8xlarge.elasticsearch will cost you $4.704 per hour!
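The size of that premium is easy to work out from the two hourly prices quoted above. The sketch below is plain arithmetic; the 730-hour month is an assumption used for a rough monthly figure, not an AWS billing rule.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

ec2_hourly = 2.903  # r3.8xlarge on-demand RHEL, as quoted above
es_hourly = 4.704   # r3.8xlarge.elasticsearch, as quoted above

premium = es_hourly - ec2_hourly
premium_pct = premium / ec2_hourly * 100
monthly_extra = premium * HOURS_PER_MONTH

print(f"Extra per hour:  ${premium:.3f}")        # $1.801
print(f"Premium:         {premium_pct:.1f}%")    # 62.0%
print(f"Extra per month: ${monthly_extra:.2f}")  # $1314.73 per node
```

So the managed service costs roughly 62% more per node-hour than the equivalent EC2 instance at these prices.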
No Shield, Watcher, and Marvel support
Elastic has released several commercial products: Shield for security, Watcher for alerts and notifications, and Marvel for cluster monitoring. They are really useful and work out of the box with Elasticsearch. There are also many other plugins, like Sense, kopf, and river, that were developed for Elasticsearch administrators and developers. You can certainly use AWS’s IAM and CloudWatch in place of Shield and Marvel, but choosing those will sometimes add extra costs and often require new skills. If you already have Shield, Watcher, and Marvel licenses, and you’re just moving your existing Elasticsearch cluster to Amazon, then those licenses will be of no use.
No River Plugin support:
River plugins are helpful for supporting data migration from a source to an Elasticsearch cluster (like MongoDB River and jdbc River). Again, not all of those are available for Amazon Elasticsearch installations.
Perhaps surprisingly, Amazon Elasticsearch is hardly overwhelming. It certainly looks nice, but it comes with a very basic tool kit and, as we’ve seen, lacks access to some fairly critical features. In my opinion, Amazon Elasticsearch does deliver an agile offering with faster cluster setup and an automated snapshot and restore process, but it is not yet cost-effective.
Setting up Elasticsearch on your own VM (including EC2 instances) is not at all difficult. You can decompress the zip or tar files and, with a minimum of administration knowledge, make the light modifications to the elasticsearch.yml file. You’ll have your cluster up and running in minutes. With your own setup, you have more control over your cluster. You can change the parameters and reconcile your cluster with releases from Elasticsearch.
However, this is Amazon, and this is just a 1.0 release. We can certainly expect to see something significantly more robust in the coming months.