Elasticsearch vs CloudSearch: AWS Cloud Search Choices

Let’s compare two AWS-based cloud search tools: Elasticsearch and CloudSearch. Both use proven technologies, but Elasticsearch is more popular (and open source), while CloudSearch is fully managed.

In part one of this series, we described what search engines are, how they solve the problem of accessing content spread across large websites, and how Amazon CloudSearch provides a solution for a cloud environment. AWS CloudSearch is certainly a powerful and appealing service from Amazon. However, there are more popular players in the search engine market: Elasticsearch ranks second only to Solr among search and analytics engines. Here we’ll explore the battle of the Amazon search providers: Elasticsearch vs CloudSearch.

Elasticsearch vs CloudSearch: Provisioning

Both Elasticsearch and CloudSearch are offered by Amazon as AWS services. However, Elasticsearch is an independent product developed by Elastic (elastic.co), which means you can also set it up yourself by downloading and extracting the tarball, or by installing it through yum or apt-get.

Amazon CloudSearch, on the other hand, is fully managed by AWS: once you choose your instance type, AWS handles the complete provisioning. Users can select high availability (Multi-AZ), replication, and partitioning options through the AWS Management Console or the AWS CLI.
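
The console/CLI provisioning steps can be sketched in Python. This is a minimal sketch assuming the boto3 CloudSearch client and its create_domain, update_scaling_parameters, and update_availability_options calls; to stay runnable without AWS credentials, it only builds each call's keyword arguments. The domain name movies-demo, the default instance type, and the naming-rule regex are illustrative assumptions.

```python
import re

# Assumption: CloudSearch domain names are 3-28 characters, start with a
# lowercase letter, and contain only lowercase letters, digits, and hyphens.
DOMAIN_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{2,27}$")


def provisioning_requests(domain_name, instance_type="search.m3.medium",
                          replication_count=1, multi_az=False):
    """Return (boto3 method name, kwargs) pairs for provisioning a domain.

    Nothing is sent to AWS; each pair is meant for a boto3 "cloudsearch" client.
    """
    if not DOMAIN_NAME_RE.match(domain_name):
        raise ValueError(f"invalid CloudSearch domain name: {domain_name!r}")
    return [
        ("create_domain", {"DomainName": domain_name}),
        ("update_scaling_parameters", {
            "DomainName": domain_name,
            "ScalingParameters": {
                "DesiredInstanceType": instance_type,
                "DesiredReplicationCount": replication_count,
            },
        }),
        ("update_availability_options",
         {"DomainName": domain_name, "MultiAZ": multi_az}),
    ]


if __name__ == "__main__":
    for method, kwargs in provisioning_requests("movies-demo", multi_az=True):
        print(method, kwargs)
```

With credentials configured, calling `getattr(boto3.client("cloudsearch"), method)(**kwargs)` for each pair would issue the requests in order.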

Elasticsearch vs CloudSearch: Upgrading

Elasticsearch is easy to upgrade. For minor releases, the process can be as simple as replacing the lib folder of the older version with that of the new version.

Updates to Amazon CloudSearch are pushed by AWS, relieving users of that responsibility. However, this means new releases may reach users later than they would with a self-managed installation.

Elasticsearch vs CloudSearch: Data import/export

When existing data needs to be searchable, it must first be imported into the search engine. In Elasticsearch, plugins called “rivers” were used to push data into a cluster. Popular river plugins include elasticsearch-river-mongodb, elasticsearch-river-couchdb, and elasticsearch-jdbc. However, river plugins have since been deprecated.

Logstash Forwarder is normally used to ship logs from application or database servers to Logstash, which indexes them into Elasticsearch so they can be searched or plotted as graphs in Kibana. More recently, Logstash and its input plugins have taken center stage as the replacement for rivers when pushing data to Elasticsearch. Recently developed input plugins include couchdb_changes, twitter, and rabbitmq.
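
Whichever tool does the pushing, bulk loading into Elasticsearch goes through its REST API, typically the `_bulk` endpoint, whose body is newline-delimited JSON: an action line followed by the document source. A minimal sketch of building such a body; the index name, document IDs, and fields are hypothetical, and `_type` is included only when the (hypothetical) `doc_type` argument is passed, for older clusters that require it.

```python
import json


def to_bulk_body(index, docs, doc_type=None):
    """Serialize (id, source) pairs into an Elasticsearch _bulk request body.

    Each document becomes two lines: an "index" action and the source JSON.
    """
    lines = []
    for doc_id, source in docs:
        action = {"_index": index, "_id": str(doc_id)}
        if doc_type is not None:
            action["_type"] = doc_type  # only for older clusters that need types
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = to_bulk_body("articles", [(1, {"title": "Elasticsearch vs CloudSearch"})])
    print(body, end="")
```

The resulting string would be sent as the body of `POST /_bulk`; recent Elasticsearch versions expect the `application/x-ndjson` content type.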

In Amazon CloudSearch, data and documents (in either XML or JSON format) are uploaded in batches. Alternatively, documents can be uploaded to Amazon S3, and CloudSearch can index them from their S3 path.
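
A JSON batch for CloudSearch is an array of operations, each either an "add" with an id and fields or a "delete" with an id. A minimal sketch that builds operations and packs them into batches that stay under the 5 MB per-batch limit quoted in the cost section; the field names and IDs are hypothetical, and reading 5 MB as 5,000,000 bytes is a conservative assumption.

```python
import json

MAX_BATCH_BYTES = 5_000_000  # conservative reading of the 5 MB batch limit
_SEPARATORS = (",", ":")     # compact JSON so size estimates match the payload


def sdf_add(doc_id, fields):
    return {"type": "add", "id": doc_id, "fields": fields}


def sdf_delete(doc_id):
    return {"type": "delete", "id": doc_id}


def _encode(obj):
    return json.dumps(obj, separators=_SEPARATORS)


def split_into_batches(operations, max_bytes=MAX_BATCH_BYTES):
    """Pack operations into JSON array strings no larger than max_bytes."""
    batches, current, size = [], [], 2  # 2 bytes for "[" and "]"
    for op in operations:
        op_len = len(_encode(op).encode("utf-8"))
        if 2 + op_len > max_bytes:
            raise ValueError(f"operation for id {op['id']!r} exceeds the batch limit")
        extra = op_len + (1 if current else 0)  # +1 for the separating comma
        if size + extra > max_bytes:
            batches.append(_encode(current))  # flush the full batch
            current, size, extra = [], 2, op_len
        current.append(op)
        size += extra
    if current:
        batches.append(_encode(current))
    return batches


if __name__ == "__main__":
    ops = [sdf_add("tt0076759", {"title": "Star Wars"}), sdf_delete("tt0000001")]
    for batch in split_into_batches(ops):
        print(batch)
```

Because the size is tracked incrementally with the same compact encoding used for the final payload, each emitted batch string is exactly the size the loop accounted for.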

Elasticsearch vs CloudSearch: Data and index backup

In Elasticsearch, data is backed up and restored with the Snapshot and Restore module. Typically, users must define a shared file system mount as the snapshot repository; in the cloud, they can instead use Amazon S3, HDFS, or Azure storage. Curator, an index management tool, can be scheduled as a cron job to automate the backup process.
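
The snapshot workflow maps onto a handful of REST calls. A minimal sketch that only builds (method, path, body) tuples rather than sending them; the repository names, mount path, bucket, and index names are hypothetical, and the S3 repository assumes the S3 repository plugin is installed. A scheduler such as Curator or cron would issue calls like these on a timetable.

```python
from datetime import date


def register_repository(name, repo_type, settings):
    """PUT request that registers a snapshot repository ("fs", "s3", ...)."""
    return ("PUT", f"/_snapshot/{name}", {"type": repo_type, "settings": settings})


def create_snapshot(repo, day, indices=None):
    """PUT request for a dated snapshot such as snapshot-2019.08.19."""
    snapshot = f"snapshot-{day:%Y.%m.%d}"
    body = {"indices": ",".join(indices)} if indices else {}
    return ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", body)


def restore_snapshot(repo, snapshot):
    """POST request restoring a snapshot; target indices must be closed or absent."""
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {})


if __name__ == "__main__":
    # Shared file system: the location must be a mount visible to every node.
    print(register_repository("nightly_fs", "fs", {"location": "/mnt/es-backups"}))
    # S3 repository (requires the S3 plugin); the bucket name is hypothetical.
    print(register_repository("nightly_s3", "s3", {"bucket": "my-es-snapshots"}))
    print(create_snapshot("nightly_s3", date.today(), ["logs", "metrics"]))
```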

In Amazon CloudSearch, the service itself takes care of the whole backup process, once again sparing users the bother. And whereas Elasticsearch users must manually restore from backed-up indexes, CloudSearch handles recovery automatically.

Elasticsearch vs CloudSearch: Security and User Management

Elasticsearch provides a plugin called Shield to handle authentication and authorization. Shield also provides encryption, role-based access control, IP filtering, and auditing. However, Shield is a commercial product that must be purchased separately.

Shield can also integrate with your Active Directory (AD) server to control user access locally.

Amazon CloudSearch provides IAM-based access control.
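
With IAM, access is granted through policies scoped to a domain's ARN. A minimal sketch, assuming the cloudsearch:search, cloudsearch:suggest, and cloudsearch:document actions and the arn:aws:cloudsearch:REGION:ACCOUNT:domain/NAME resource format; the region, account ID, and domain name are placeholders.

```python
import json


def cloudsearch_policy(region, account_id, domain, allow_uploads=False):
    """Build an IAM policy document limited to one CloudSearch domain."""
    actions = ["cloudsearch:search", "cloudsearch:suggest"]
    if allow_uploads:
        actions.append("cloudsearch:document")  # permits document batch uploads
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": actions,
            "Resource": f"arn:aws:cloudsearch:{region}:{account_id}:domain/{domain}",
        }],
    }


if __name__ == "__main__":
    policy = cloudsearch_policy("us-east-1", "123456789012", "movies-demo")
    print(json.dumps(policy, indent=2))
```

Keeping uploads behind a flag makes it easy to hand read-only search access to one role and write access to another.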

Elasticsearch vs CloudSearch: Cluster management

In Elasticsearch, adding or removing nodes within a cluster must be done manually. If cluster instances are upgraded to larger sizes (vertical scaling), you’ll need to run through the setup process from scratch, backing up the old data and restoring it to the new cluster. In the case of horizontal scaling, where servers are added to or removed from the cluster, shards must be rebalanced across the nodes, and changing the number of primary shards requires reindexing the data. Users need to plan and monitor these operations very carefully.
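
Why resharding is disruptive: Elasticsearch routes each document to a primary shard with, in principle, shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. This sketch uses zlib.crc32 as a stand-in hash (Elasticsearch uses its own routing hash), so exact placements differ, but it illustrates that changing the shard count sends most documents to a different shard. The document IDs are made up.

```python
import zlib


def shard_for(doc_id, primary_shards):
    """hash(routing) % primary_shards, with crc32 standing in for ES's hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % primary_shards


def moved_fraction(doc_ids, old_shards, new_shards):
    """Fraction of documents whose shard changes when the primary count changes."""
    moved = sum(shard_for(d, old_shards) != shard_for(d, new_shards) for d in doc_ids)
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]
    print(f"{moved_fraction(ids, 5, 6):.0%} of documents would change shard")
```

Since a document's location is a pure function of the primary count, Elasticsearch fixes that count at index creation, and changing it means building a new index.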

Amazon CloudSearch, on the other hand, has built-in scaling and upgrade tools. When a search instance reaches its capacity, CloudSearch automatically moves the domain to the next larger instance type. Once the data outgrows the largest available instance type, the index is partitioned across multiple instances.
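
The scale-up-then-partition behavior can be modeled as a toy function. The instance type names follow CloudSearch's m3 family, but the per-instance capacities are hypothetical placeholders, not AWS figures.

```python
import math

# Hypothetical index capacities in GB, smallest to largest (placeholders only).
HYPOTHETICAL_CAPACITY_GB = {
    "search.m3.medium": 8,
    "search.m3.large": 16,
    "search.m3.xlarge": 32,
    "search.m3.2xlarge": 64,
}


def plan_capacity(index_gb, capacities=HYPOTHETICAL_CAPACITY_GB):
    """Scale up to the smallest type that fits; past the largest, partition."""
    for instance_type, capacity in capacities.items():  # insertion order
        if index_gb <= capacity:
            return instance_type, 1
    largest, capacity = list(capacities.items())[-1]
    return largest, math.ceil(index_gb / capacity)


if __name__ == "__main__":
    for size in (5, 20, 150):
        print(size, "GB ->", plan_capacity(size))
```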

Elasticsearch vs CloudSearch: Monitoring

For Elasticsearch, there are cluster monitoring tools like Marvel, which lets users send RESTful queries to check cluster health, and Watcher, which provides an alerting mechanism. Both are provided by Elastic, the company behind Elasticsearch. Users can, of course, also bring their own monitoring tools, such as SPM or the New Relic plugin for Elasticsearch, to keep an eye on their clusters.
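
A simple health check can be built on the `GET /_cluster/health` endpoint, whose JSON response includes `status`, `number_of_nodes`, and `unassigned_shards`. A minimal sketch that turns a parsed response into alert messages; the expected node count and the sample response are hypothetical.

```python
def health_alerts(health, expected_nodes):
    """Turn a parsed GET /_cluster/health response into alert messages."""
    alerts = []
    if health["status"] == "red":
        alerts.append("at least one primary shard is unassigned")
    elif health["status"] == "yellow":
        alerts.append("all primaries assigned, but some replicas are not")
    if health["number_of_nodes"] < expected_nodes:
        alerts.append(f"only {health['number_of_nodes']} of {expected_nodes} nodes joined")
    if health.get("unassigned_shards", 0):
        alerts.append(f"{health['unassigned_shards']} unassigned shards")
    return alerts


if __name__ == "__main__":
    sample = {"cluster_name": "demo", "status": "yellow",
              "number_of_nodes": 2, "unassigned_shards": 5}
    for alert in health_alerts(sample, expected_nodes=3):
        print("ALERT:", alert)
```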

Amazon CloudSearch is fully integrated with Amazon CloudWatch, which monitors metrics such as SuccessfulRequests, SearchableDocuments, IndexUtilization, and Partitions. Similar to Watcher in Elasticsearch, CloudWatch alarms can send notifications through Amazon Simple Notification Service (SNS) for alerting.
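
Alerting on a CloudSearch metric means creating a CloudWatch alarm whose action is an SNS topic. A minimal sketch that builds keyword arguments for boto3's `cloudwatch.put_metric_alarm`; the 80% threshold, five-minute period, topic ARN, and the DomainName/ClientId dimension names are assumptions to check against the metrics in your own account.

```python
def index_utilization_alarm(domain, account_id, sns_topic_arn, threshold_pct=80.0):
    """Keyword arguments for boto3's cloudwatch.put_metric_alarm()."""
    return {
        "AlarmName": f"{domain}-index-utilization",
        "Namespace": "AWS/CloudSearch",
        "MetricName": "IndexUtilization",
        # Assumption: CloudSearch metrics are keyed by DomainName and ClientId
        # (the AWS account ID).
        "Dimensions": [
            {"Name": "DomainName", "Value": domain},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Maximum",
        "Period": 300,            # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that receives the alert
    }


if __name__ == "__main__":
    topic = "arn:aws:sns:us-east-1:123456789012:search-alerts"  # placeholder
    print(index_utilization_alarm("movies-demo", "123456789012", topic))
```

With credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**kwargs)` would create the alarm.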

Elasticsearch vs CloudSearch: High Availability

Because both are built to run search engines in the cloud, Elasticsearch and CloudSearch are designed for high availability.

Elasticsearch is built for distributed computing, with the cluster growing horizontally. Indexes are split into shards, and replica shards provide redundancy. Whenever a node fails, replicas of its shards on the remaining nodes take over for the lost copies.
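
The relationship between shards, replicas, and node count can be worked out in a few lines. This sketch relies on Elasticsearch's rule that copies of the same shard are never allocated to the same node, and it ignores other allocation constraints such as disk watermarks; the shard counts in the example are arbitrary.

```python
def shard_layout(primaries, replicas, nodes):
    """Return (total shard copies, reachable health, node failures tolerated)."""
    if nodes < 1:
        raise ValueError("need at least one node")
    total_copies = primaries * (1 + replicas)
    status = "green" if nodes >= replicas + 1 else "yellow"
    # Copies of one shard sit on distinct nodes, so each shard has at most
    # min(nodes, replicas + 1) assigned copies and survives losing all but one.
    tolerated = min(nodes, replicas + 1) - 1
    return total_copies, status, tolerated


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "node(s):", shard_layout(primaries=5, replicas=1, nodes=n))
```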

Elasticsearch uses a mechanism called Zen Discovery to discover nodes and elect a master node, which manages the cluster state. If the master node fails, the remaining master-eligible nodes elect a new master.
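
Zen Discovery guards master election against split brain with a quorum of master-eligible nodes, configured through `discovery.zen.minimum_master_nodes` in versions before 7.0 (Elasticsearch 7 manages the quorum automatically). A minimal sketch of the usual (n / 2) + 1 rule:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes needed to elect a master."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible -> discovery.zen.minimum_master_nodes: "
              f"{minimum_master_nodes(n)}")
```

With two master-eligible nodes the quorum is two, so losing either one blocks elections; this is why three master-eligible nodes is the common minimum.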

CloudSearch follows a similar architecture to handle failures and provide high availability. It also has an optional Multi-AZ feature that replicates a domain across Availability Zones within a single region for AZ failover.

Elasticsearch vs CloudSearch: Search and Indexing

In Elasticsearch, searches run against indices and types through the search API, which also supports faceting and filtering.
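
A request combining full-text search, filtering, and faceting might be built as follows. This sketch assumes Elasticsearch 2.x or later, where filters go in a bool query's filter clause and terms aggregations replace facets; the title and category fields are hypothetical.

```python
import json


def search_body(text, category=None, size=10):
    """Query DSL: full-text match, optional exact filter, category counts."""
    bool_query = {"must": [{"match": {"title": text}}]}
    if category is not None:
        # Filter clauses restrict matches without affecting relevance scores.
        bool_query["filter"] = [{"term": {"category": category}}]
    return {
        "size": size,
        "query": {"bool": bool_query},
        # Terms aggregation: the successor to facets for per-value counts.
        "aggs": {"by_category": {"terms": {"field": "category"}}},
    }


if __name__ == "__main__":
    print(json.dumps(search_body("cloud search", category="books"), indent=2))
```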

In CloudSearch, users create a search domain, which exposes a document service for uploading documents and a search service for querying the indexed data.
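
Each CloudSearch domain has its own endpoints: uploads go to the document endpoint (`POST /2013-01-01/documents/batch`) and queries to the search endpoint (`GET /2013-01-01/search`). A minimal sketch that builds a search URL with the q, q.parser, size, and facet.FIELD parameters; the endpoint hostname and the genres field are hypothetical, and real requests must also satisfy the domain's access policy.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_VERSION = "2013-01-01"


def cloudsearch_search_url(search_endpoint, query, size=10, facet_field=None):
    """Build a search request URL for a domain's search endpoint."""
    params = {"q": query, "q.parser": "simple", "size": size}
    if facet_field:
        params[f"facet.{facet_field}"] = "{}"  # facet with default options
    return f"https://{search_endpoint}/{API_VERSION}/search?{urlencode(params)}"


if __name__ == "__main__":
    endpoint = "search-movies-demo-abc123.us-east-1.cloudsearch.amazonaws.com"
    url = cloudsearch_search_url(endpoint, "star wars", facet_field="genres")
    print(url)
    print(parse_qs(urlsplit(url).query))
```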

Elasticsearch provides many built-in analyzers, tokenizers, and token filters for indexing.
Amazon CloudSearch, on the other hand, provides a much simpler configuration service for all indexing operations and relevance ranking.
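
On the Elasticsearch side, a custom analyzer is defined in index settings by combining a tokenizer with token filters. A minimal sketch of an index-creation body using the built-in standard tokenizer and the lowercase and asciifolding filters; the analyzer name folded_text is hypothetical, and applying it to a field also requires a mapping, whose exact shape varies by Elasticsearch version.

```python
import json


def index_settings_with_analyzer(name="folded_text"):
    """Index-creation body defining one custom analyzer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    name: {
                        "type": "custom",
                        "tokenizer": "standard",  # split text into words
                        # Lowercase tokens, then fold accents ("Café" -> "cafe").
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(index_settings_with_analyzer(), indent=2))
```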

Elasticsearch vs CloudSearch: Client Libraries

There are many clients available for Elasticsearch. Official clients exist for Java, .NET, Ruby, Groovy, PHP, Perl, Python, and JavaScript, and Elasticsearch also exposes a RESTful API.

Amazon CloudSearch supports many SDKs along with RESTful API calls. The most popular SDKs are for Java, Ruby, Python, .NET, PHP, and Node.js.

Elasticsearch vs CloudSearch: Cost

Because Elasticsearch requires manual setup, the true cost of a deployment must include infrastructure costs and licensing for the operating system and any non-open-source tools (such as Shield). It may also require a large operational expenditure to cover skilled Elasticsearch administrators and a monitoring team.

Amazon CloudSearch is priced according to the search instance size. Here’s an example:
[Image: Elasticsearch vs CloudSearch - instance size]
With Multi-AZ enabled, the cost of the redundant search instances is added. If an index is partitioned, the cost of each additional search instance in each AZ is added as well.
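
The instance component of the bill is simple arithmetic. A minimal sketch, assuming each partition runs a fixed number of instance copies, that Multi-AZ doubles the instance count, and 730 hours per month; the hourly price is a parameter, and the 0.10 used in the example is a placeholder, not a published AWS rate.

```python
def monthly_instance_cost(hourly_price, partitions=1, copies=1, multi_az=False,
                          hours_per_month=730):
    """Estimate one month of CloudSearch search-instance charges (USD)."""
    az_count = 2 if multi_az else 1  # assumption: Multi-AZ doubles instances
    instances = partitions * copies * az_count
    return hourly_price * instances * hours_per_month


if __name__ == "__main__":
    placeholder_rate = 0.10  # hypothetical hourly price, not an AWS rate
    cost = monthly_instance_cost(placeholder_rate, partitions=2, multi_az=True)
    print(f"${cost:.2f} per month")
```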

CloudSearch document batch uploads

Document batch upload costs are $0.10 per 1,000 Batch Upload Requests (the maximum size for each batch is 5 MB).

CloudSearch IndexDocuments requests

Re-indexing is required for indexes when a new field is added to the index. The charge for a re-indexing request is $0.98 per GB of data stored in your search domain.
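
The two request fees reduce to arithmetic on the quoted rates ($0.10 per 1,000 batch upload requests, 5 MB maximum per batch, $0.98 per GB for a re-indexing request). A minimal sketch; it assumes every batch is filled to the full 5 MB, so the upload request count is a lower bound.

```python
import math

UPLOAD_FEE_PER_1000_REQUESTS = 0.10  # USD per 1,000 batch upload requests
MAX_BATCH_MB = 5                     # maximum size of one batch
REINDEX_FEE_PER_GB = 0.98            # USD per GB stored, per re-indexing request


def upload_requests_needed(total_mb):
    """Lower bound: assumes every batch is filled to the 5 MB maximum."""
    return math.ceil(total_mb / MAX_BATCH_MB)


def upload_fee(total_mb):
    return upload_requests_needed(total_mb) * UPLOAD_FEE_PER_1000_REQUESTS / 1000


def reindex_fee(stored_gb):
    return stored_gb * REINDEX_FEE_PER_GB


if __name__ == "__main__":
    print(f"2,000 MB upload: {upload_requests_needed(2000)} requests, "
          f"${upload_fee(2000):.2f}")
    print(f"Re-indexing a 10 GB domain: ${reindex_fee(10):.2f}")
```

The comparison makes the point of the pricing model clear: uploads are nearly free, while adding a field to a large domain triggers a re-indexing charge proportional to its size.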

CloudSearch data transfer

Inbound data transfers are free between Amazon CloudSearch and other AWS Services. There are charges for outbound data transfers:

  • Data transferred between Amazon CloudSearch and AWS services in different regions will be charged as Internet Data Transfers on both sides of the transfer.
  • Traffic sent between Amazon CloudSearch and Amazon EC2 instances in the same region is only billed for the Data Transfer in and out of the Amazon EC2 instances. Standard Amazon EC2 Regional Data Transfer charges apply.
[Image: Elasticsearch vs CloudSearch - cost]

Elasticsearch vs CloudSearch: Conclusion

Both Elasticsearch and Amazon CloudSearch are built on proven technologies and are the choice of many demanding organizations. Because of its flexibility and active developer community, Elasticsearch is more popular. But Amazon CloudSearch scores when it comes to operational efficiency.

Because of Elasticsearch's popularity, AWS also offers it as a managed service (Amazon Elasticsearch Service), which in many ways provides the best of both worlds. Elastic itself also offers a hosted Elasticsearch service, Found.

What do you think? Was this helpful for determining the finer points of each service? Comments welcome and appreciated.


Written by

Chandan Patra

Cloud Computing and Big Data professional with 10 years of experience in pre-sales, architecture, design, build, and troubleshooting with best engineering practices. Specialities: Cloud Computing - AWS, DevOps (Chef), Hadoop Ecosystem, Storm & Kafka, ELK Stack, NoSQL, Java, Spring, Hibernate, Web Service
