Perhaps surprisingly, Amazon Elasticsearch is hardly overwhelming, coming with a very basic tool kit and an outdated release version. And it’s expensive.
Just a month ago, AWS launched their Amazon Elasticsearch Service. Elasticsearch itself is an open-source, scalable, distributed, real-time search and analytics engine from Elastic, the creators of Logstash, Beats, and Kibana. Elasticsearch makes an excellent alternative to Splunk.
According to the AWS Elasticsearch documentation:
“Amazon Elasticsearch Service (Amazon ES) is a managed service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS cloud…You can set up and configure your Amazon Elasticsearch cluster in minutes from the AWS Management Console. Amazon ES provisions all the resources for your cluster and launches it…Amazon ES allows you to easily scale your cluster via a single API call or a few clicks in the AWS Management Console.”
Amazon Elasticsearch features
According to their documentation, the Amazon Elasticsearch Service provides the following features:
- A full range of instance types from which to build your clusters.
- Magnetic, General Purpose, and Provisioned IOPS EBS volumes.
- Clusters spanning multiple regions and Availability Zones.
- Security through IAM-based access control.
- Dedicated master nodes to improve cluster stability.
- Domain snapshots to back up and restore Elasticsearch domains and replicate domains across Availability Zones.
- Kibana for data visualization.
- Integration with Amazon CloudWatch for monitoring Elasticsearch domain metrics.
- Integration with AWS CloudTrail for auditing configuration API calls to Elasticsearch domains.
Amazon Elasticsearch currently uses the following versions and components:
- Elasticsearch 1.5.2.
- Kibana 4 (also Kibana 3 as a plugin).
- Plugins: jetty, cloud-aws, kuromoji, and icu.
- The following APIs:
/_alias, /_aliases, /_all, /_analyze, /_bulk, /_cat, /_cluster/health, /_cluster/settings, /_cluster/stats, /_count, /_flush, /_mapping, /_mget, /_msearch, /_nodes, /_plugin/kibana, /_plugin/kibana3, /_percolate, /_refresh, /_search, /_snapshot, /_stats, /status, /_template
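The practical effect of this list is that any request outside it is rejected by the service. As a rough illustration (the `is_supported` helper is hypothetical, not part of any AWS SDK; it only checks top-level API paths), a check against the documented list might look like:

```python
# Sketch: Amazon ES exposes only a whitelist of API paths. A front-end
# check mimicking that restriction could validate request paths like so.
# SUPPORTED_PATHS is copied from the AWS documentation quoted above.

SUPPORTED_PATHS = [
    "/_alias", "/_aliases", "/_all", "/_analyze", "/_bulk", "/_cat",
    "/_cluster/health", "/_cluster/settings", "/_cluster/stats", "/_count",
    "/_flush", "/_mapping", "/_mget", "/_msearch", "/_nodes",
    "/_plugin/kibana", "/_plugin/kibana3", "/_percolate", "/_refresh",
    "/_search", "/_snapshot", "/_stats", "/status", "/_template",
]

def is_supported(path: str) -> bool:
    """Return True if the request path matches a whitelisted API path."""
    return any(path == p or path.startswith(p + "/") for p in SUPPORTED_PATHS)

print(is_supported("/_cluster/health"))  # True
print(is_supported("/_shutdown"))        # False: not exposed by Amazon ES
```

Notably absent are administrative endpoints such as `/_shutdown`, which is consistent with a managed service keeping node lifecycle to itself.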
Amazon Elasticsearch: limits
Amazon Elasticsearch has a few built-in limitations that you need to be aware of before you start:
Older version of Elasticsearch
Elasticsearch 1.5.2 – the version used by Amazon – is actually quite old compared with the current stable version (1.7.2). And Elasticsearch 2.0.0 beta, which is just around the corner, will address many more bugs. Since Amazon Elasticsearch is a managed service, there is no way for you to upgrade your clusters on your own. If you were to host Elasticsearch yourself, upgrading would be as simple as updating the jar files in the ES_HOME/lib folder.
Elasticsearch 1.5.x and other versions have critical bugs
In the release notes for Elasticsearch 1.7.1, more than a dozen bugs are identified as fixed, and users are advised to upgrade their clusters as soon as possible. Here are just a couple of examples:
IP range aggregation issue:
An ip_range aggregation with a mask of 0.0.0.0/0 was treated as 0.0.0.0/32. This was resolved in the 1.7.x releases.
Elasticsearch 1.7.x has addressed many problems from 1.5.2, including one that could result in the loss of an entire index if you suffer a multiple-node failure while having idle shards. This is a particularly serious concern in a cloud setup, where node failures due to Availability Zone outages are not uncommon. Although these are rare cases, Elasticsearch Support did send an email alert to their customers warning them of the issue.
EBS volume size
You can attach a maximum of 512 GB of EBS storage to a single I- or R-series node (i2.2xlarge, r3.8xlarge, etc.). For M-series nodes, however, you are limited to a maximum of 100 GB. Besides the fact that I- and R-series nodes are expensive, their large local storage is instance-store backed and therefore ephemeral. This is an obvious problem if you intend to shut down and then reuse your Elasticsearch cluster at some future time.
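These per-node caps put a hard ceiling on total EBS-backed cluster storage. A back-of-the-envelope check (the helper below is illustrative, not an AWS API; figures are the caps quoted above, current at the time of writing):

```python
# Rough capacity ceiling under Amazon ES's per-node EBS caps:
# 512 GB for I/R-series data nodes, 100 GB for M-series.

EBS_CAP_GB = {"i": 512, "r": 512, "m": 100}

def max_cluster_storage_gb(instance_type: str, node_count: int) -> int:
    """Max total EBS storage for a cluster of identical data nodes."""
    series = instance_type[0].lower()   # e.g. "m3.large" -> "m"
    return EBS_CAP_GB[series] * node_count

# At the default limit of ten instances per cluster:
print(max_cluster_storage_gb("m3.large", 10))    # 1000 GB total
print(max_cluster_storage_gb("r3.8xlarge", 10))  # 5120 GB total
```

In other words, an M-series cluster at the default instance limit tops out around a terabyte of EBS storage, which is modest for a log-analytics workload.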
Instance types
There are two major limitations with the instance types available for Amazon Elasticsearch. The first is that you can only run a maximum of ten instances per cluster; if you want more, you'll have to submit a service request for an increase. The second concerns node memory. Here's what Elasticsearch's documentation says:
“A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive (you end up needing many, many small machines)”.
AWS does offer r3.2xlarge (and larger) and i2.2xlarge instances, which fit nicely with Elastic's ideal for cluster nodes, but they are very expensive. An EC2 r3.8xlarge on-demand RHEL instance costs $2.903 per hour, while the r3.8xlarge.elasticsearch equivalent will cost you $4.704 per hour!
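To put the managed-service premium in concrete terms, here is the arithmetic on the two hourly prices quoted above (a 30-day month is assumed; prices are on-demand at the time of writing):

```python
# Back-of-the-envelope comparison of the hourly prices quoted above.

ec2_hourly = 2.903          # EC2 r3.8xlarge (RHEL), $/hour
aes_hourly = 4.704          # r3.8xlarge.elasticsearch, $/hour
hours_per_month = 24 * 30   # 30-day month assumed

premium_hourly = aes_hourly - ec2_hourly
premium_monthly = premium_hourly * hours_per_month
premium_pct = (aes_hourly / ec2_hourly - 1) * 100

print(f"Managed premium: ${premium_hourly:.3f}/hour")
print(f"Per node, per month: ${premium_monthly:.2f}")
print(f"Roughly {premium_pct:.0f}% more per hour")
```

That works out to roughly a 62% premium, or close to $1,300 extra per node per month, before you account for EBS or data-transfer charges.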
No Shield, Watcher, or Marvel support
Elastic has released several commercial products: Shield for security, Watcher for alerts and notifications, and Marvel for cluster monitoring. They are really useful and work out of the box with Elasticsearch. There are also many community plugins, like Sense, kopf, and the various rivers, developed for Elasticsearch administrators and developers. You can certainly use AWS's IAM and CloudWatch in place of Shield and Marvel, but choosing those will sometimes add extra costs and often require new skills. And if you already have Shield, Watcher, and Marvel licenses and you're just moving your existing Elasticsearch cluster to Amazon, those licenses will be of no use.
No River plugin support
River plugins support data migration from an external source into an Elasticsearch cluster (like the MongoDB River and JDBC River). Again, not all of these are available for Amazon Elasticsearch installations.
Perhaps surprisingly, Amazon Elasticsearch is hardly overwhelming. It certainly looks nice, but it comes with a very basic tool kit and, as we've seen, lacks access to some fairly critical features. In my opinion, Amazon Elasticsearch does deliver an agile offering, with faster cluster setup and an automated snapshot-and-restore process, but it is not yet cost-effective.
Setting up Elasticsearch on your own VM (including EC2 instances) is not at all difficult. You can decompress the zip or tar file and, with a minimum of administration knowledge, make a few light modifications to the elasticsearch.yml file. You'll have your cluster up and running in minutes. With your own setup, you also have more control: you can tune the parameters and keep your cluster current with new Elasticsearch releases.
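For reference, the modifications really are light. A minimal elasticsearch.yml sketch for a self-hosted 1.x cluster on EC2 might look like the following; the cluster name, paths, and security group are placeholders, and the discovery settings assume the cloud-aws plugin is installed:

```yaml
# Minimal elasticsearch.yml sketch (names are placeholders).
cluster.name: my-es-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
network.host: 0.0.0.0

# With the cloud-aws plugin, EC2 discovery replaces multicast so that
# nodes in the same security group can find each other:
discovery.type: ec2
discovery.ec2.groups: my-es-security-group
```

Nodes sharing the same cluster.name and security group will discover each other and form the cluster on startup.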
However, this is Amazon, and this is just a 1.0 release. We can certainly expect to see something significantly more robust in the coming months.