Amazon CloudSearch: Search-as-a-Service in AWS part 1

Amazon CloudSearch offers an industrial-strength search engine for your website that’s fast, reliable, and fully integrated with other AWS services.

As different as large web sites can be from each other, there is one thing you will find just about everywhere: the search box. Without search, many of the resources you’ve worked so hard to create will be virtually inaccessible. In other words, your search box – and the search engine that powers it – is among your web site’s most important tools…which makes selecting a search engine a most critical design consideration. Amazon CloudSearch is definitely an option to consider.

Several popular search engines and frameworks are available, but the best known is probably Apache Lucene. Lucene, an open source library for information retrieval (IR), powers search on many websites and underlies other well-known technologies like Solr, Elasticsearch, and Hibernate Search. Amazon CloudSearch uses Solr, which is itself built on Lucene, as its underlying search engine.

Amazon CloudSearch

Amazon CloudSearch is a fully managed service in the AWS Cloud. It’s simple to set up, manage, and scale, and is a cost-effective search solution for a website or application. Like other managed AWS services, CloudSearch offers easy configuration, automatic scaling for data and traffic, self-healing clusters and, when Multi-AZ is enabled, high availability.

Amazon CloudSearch aims to provide high throughput and low latency in cloud environments. It also supports a rich set of features such as free text search, faceted search, geospatial search, customizable relevance ranking, highlighting, autocomplete, and support for 34 languages. You can set your own scaling and availability options, and CloudSearch can index documents in various formats and return results in JSON or XML.

It’s no surprise that Amazon’s own products and sites are powered by CloudSearch for its reliability and performance.

Before going for a deep dive into CloudSearch, we should review some IR terminology:

Indexing

An IR library or search engine pre-processes documents and texts to make them searchable at a very fast pace. Instead of searching through the whole data store in response to each new request, information is retrieved from an index where the search terms are stored. Indexes lie at the heart of any search engine.
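To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration of the general technique, not how CloudSearch is implemented internally; the function names and sample documents are made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data
docs = {
    "doc1": "managed search service in the cloud",
    "doc2": "open source search library",
    "doc3": "managed database service",
}
index = build_index(docs)
print(sorted(search(index, "managed service")))  # ['doc1', 'doc3']
```

The query never touches the original texts: it only looks up each term in the index and intersects the resulting sets, which is why lookups stay fast as the data store grows.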

Document

Documents are indexed – stored and made searchable. In search engine terms, a document is the basic unit of information and contains data describing something. A document about an employee, for example, might contain information like the employee ID, name, department, job role, manager information, and city. A document about a book could contain the title, author, year of publication, and number of pages.

Documents are composed of fields, which contain more specific pieces of information, like employee ID or address. When you add a document, the information in the document’s fields is added to an index. When you make a query, the index is queried and the matching documents returned.
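As a rough sketch, the employee and book documents described above could be expressed as a JSON document batch, which is a list of add and delete operations. The field names and IDs below are hypothetical, and the index fields would need to be configured in your own domain first.

```python
import json


def add_operation(doc_id, fields):
    """Build an 'add' operation: the document ID plus its fields."""
    return {"type": "add", "id": doc_id, "fields": fields}


def delete_operation(doc_id):
    """Build a 'delete' operation, which only needs the document ID."""
    return {"type": "delete", "id": doc_id}


# Hypothetical documents and field names, for illustration only
batch = [
    add_operation("emp-1001", {
        "employee_id": 1001,
        "name": "Jane Doe",
        "department": "Engineering",
        "city": "Seattle",
    }),
    add_operation("book-42", {
        "title": "An Example Book",
        "author": "A. Writer",
        "year": 2014,
        "pages": 320,
    }),
    delete_operation("emp-0999"),
]

batch_json = json.dumps(batch, indent=2)
print(batch_json)
```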

Domain

A domain has one or more search instances, each with its own resources (such as RAM, CPU, and storage) for indexing data and processing requests. The number of search instances depends upon the volume and complexity of the documents you want to process.

Amazon CloudSearch features

Let’s explore some key CloudSearch features.

Managed

A managed AWS service takes care of all the intricacies of low level provisioning, error handling, fault tolerance, monitoring, and various management activities. Amazon CloudSearch does all that, leaving you with nothing more to do than create and configure a search domain and upload the data you want indexed. Once that’s done, you and your users will be free to search the data from your website.

But that’s not the whole story. As a managed service, CloudSearch scales up or down automatically according to the amount of data or index size. If your configured search instance becomes inadequate, CloudSearch automatically upgrades to the next larger instance type. And when the data outgrows the largest available instance type, the index is partitioned across multiple instances. Moreover, you don’t need to worry about indexing, query parsing, query processing, and results handling: that’s all taken care of by CloudSearch.

Integrated with AWS Services

Documents for CloudSearch can be inexpensively stored on S3. CloudSearch also works with records in RDS or DynamoDB databases, integrates with Amazon CloudWatch for monitoring, supports index field statistics, and is fully integrated with IAM.

Amazon CloudSearch publishes the following four metrics to Amazon CloudWatch:

  • SuccessfulRequests. Number of search requests successfully processed by the search instance.
  • SearchableDocuments. Number of documents available in the search index.
  • IndexUtilization. Index storage utilization rate of the search instance.
  • Partitions. Number of partitions available in the search index.
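One way to read these metrics programmatically is through the CloudWatch API. The sketch below only builds the request parameters, so it runs without AWS credentials. It assumes the `AWS/CloudSearch` namespace and the `DomainName` and `ClientId` dimensions; verify these against the current CloudWatch documentation. The domain name and account ID are placeholders.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(domain_name, client_id, metric_name,
                         hours=1, period_seconds=300):
    """Build keyword arguments for CloudWatch's get_metric_statistics call."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/CloudSearch",  # assumed namespace
        "MetricName": metric_name,
        "Dimensions": [                  # assumed dimension names
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": client_id},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Maximum"],
    }


# Placeholder domain name and account ID
request = build_metric_request("movies", "123456789012", "SearchableDocuments")
print(request["Namespace"], request["MetricName"])

# With credentials configured, the request could be sent like this:
#   import boto3
#   stats = boto3.client("cloudwatch").get_metric_statistics(**request)
```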

Highly Available

CloudSearch is highly available and, if Multi-AZ is configured, a search domain will span two Availability Zones in the same region. Updates are automatically pushed to the instances in both AZs.

Security Features

Integration with IAM means fine-grained control over the creation and deletion of domains, indexing and re-indexing, and user access. Users can use both HTTP and HTTPS connections to send and search data.
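As an illustration of that fine-grained control, the following sketch builds an IAM policy document that allows searching and suggestions on a single domain but not document uploads or configuration changes. The region, account ID, and domain name are placeholders, and the action names (`cloudsearch:search`, `cloudsearch:suggest`) should be checked against the current IAM documentation for CloudSearch.

```python
import json


def read_only_search_policy(region, account_id, domain_name):
    """Build an IAM policy that permits search and suggest on one domain."""
    resource = f"arn:aws:cloudsearch:{region}:{account_id}:domain/{domain_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Uploads would additionally need "cloudsearch:document"
                "Action": ["cloudsearch:search", "cloudsearch:suggest"],
                "Resource": resource,
            }
        ],
    }


# Placeholder values, for illustration only
policy = read_only_search_policy("us-east-1", "123456789012", "movies")
print(json.dumps(policy, indent=2))
```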

CloudSearch Architecture

Users can interact with CloudSearch through any one of three different services:

  • Configuration service, to create and configure search domains.
  • Document service, to upload documents.
  • Search service, to submit search requests.
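The three services map onto two boto3 clients: `cloudsearch` for configuration and `cloudsearchdomain` for documents and search. The sketch below is one possible way to wrap them, not the only one. It uses a small recording stand-in instead of real clients so that it runs without AWS credentials; the domain name, sample document, and endpoint placeholders are assumptions for the example.

```python
import json


class RecordingClient:
    """Stand-in for a boto3 client: records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
            return {}
        return method


def create_search_domain(config_client, domain_name):
    """Configuration service: create a search domain."""
    return config_client.create_domain(DomainName=domain_name)


def upload_documents(doc_client, operations):
    """Document service: upload a JSON batch of add/delete operations."""
    body = json.dumps(operations).encode("utf-8")
    return doc_client.upload_documents(documents=body,
                                       contentType="application/json")


def search_documents(search_client, text, size=10):
    """Search service: submit a search request."""
    return search_client.search(query=text, size=size)


# With real AWS access, the clients would be created roughly like this:
#   import boto3
#   config_client = boto3.client("cloudsearch")
#   doc_client = boto3.client("cloudsearchdomain",
#                             endpoint_url="https://<document-endpoint>")
#   search_client = boto3.client("cloudsearchdomain",
#                                endpoint_url="https://<search-endpoint>")
config_client = RecordingClient()
doc_client = RecordingClient()
search_client = RecordingClient()

create_search_domain(config_client, "movies")
upload_documents(doc_client, [{"type": "add", "id": "m1",
                               "fields": {"title": "Alien"}}])
search_documents(search_client, "alien")

for client in (config_client, doc_client, search_client):
    print(client.calls[0][0])
```

Passing the client in as a parameter keeps each function tied to exactly one of the three services and makes the code easy to exercise without network access.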

CloudSearch Supported Instance Types

Currently CloudSearch supports these five search instance types: search.m1.small, search.m3.medium, search.m3.large, search.m3.xlarge, search.m3.2xlarge. If your needs outgrow the capacity of a single search.m3.2xlarge instance, CloudSearch will automatically partition your index across multiple search instances. A search index can be split across as many as ten partitions.
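The scale-up-then-partition behavior can be modeled with a few lines of Python. This is only a toy model: the per-instance capacities below are invented numbers chosen to make the example readable, not published AWS figures, and the real service also takes traffic and document complexity into account.

```python
import math

INSTANCE_TYPES = [
    "search.m1.small",
    "search.m3.medium",
    "search.m3.large",
    "search.m3.xlarge",
    "search.m3.2xlarge",
]

# HYPOTHETICAL index capacity per instance type, in GB (not AWS figures)
HYPOTHETICAL_CAPACITY_GB = {
    "search.m1.small": 2,
    "search.m3.medium": 4,
    "search.m3.large": 8,
    "search.m3.xlarge": 16,
    "search.m3.2xlarge": 32,
}

MAX_PARTITIONS = 10  # limit stated in the article


def plan_capacity(index_size_gb):
    """Return (instance_type, partitions) for a given index size."""
    # Scale up: pick the smallest instance type that holds the whole index.
    for instance_type in INSTANCE_TYPES:
        if index_size_gb <= HYPOTHETICAL_CAPACITY_GB[instance_type]:
            return instance_type, 1
    # Scale out: partition the index across instances of the largest type.
    largest = INSTANCE_TYPES[-1]
    partitions = math.ceil(index_size_gb / HYPOTHETICAL_CAPACITY_GB[largest])
    if partitions > MAX_PARTITIONS:
        raise ValueError("index exceeds the maximum number of partitions")
    return largest, partitions


print(plan_capacity(3))    # ('search.m3.medium', 1)
print(plan_capacity(100))  # ('search.m3.2xlarge', 4)
```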

Amazon CloudSearch pricing

Pricing for the various Amazon CloudSearch configurations isn’t all that complicated:

Search instances

The cost of single search instances will vary by region.

[Image: Amazon CloudSearch instance types]

When Multi-AZ is enabled, the cost of the redundant search instances is also added. When partitioning occurs, the cost of each new search instance in each AZ is added to the total.

Document batch uploads

$0.10 per 1,000 Batch Upload Requests (the maximum size for each batch is 5 MB).
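Because uploads are billed per batch and each batch is capped at 5 MB, it pays to pack operations into as few batches as possible. The sketch below greedily groups operations under a size limit and estimates the upload charge. It assumes that 5 MB means 5 × 1024 × 1024 bytes and that the charge is proportional to the number of batches; both are assumptions to verify against the current pricing page.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumes 5 MB = 5 * 1024 * 1024 bytes
PRICE_PER_1000_BATCHES = 0.10      # USD, figure quoted in the article


def split_into_batches(operations, max_bytes=MAX_BATCH_BYTES):
    """Greedily group operations so each JSON-encoded batch fits the limit."""
    batches, current, current_size = [], [], 2  # 2 bytes for "[" and "]"
    for op in operations:
        op_size = len(json.dumps(op).encode("utf-8"))
        if op_size + 2 > max_bytes:
            raise ValueError("single operation exceeds the batch size limit")
        added = op_size + (2 if current else 0)  # ", " between items
        if current and current_size + added > max_bytes:
            batches.append(current)
            current, current_size = [], 2
            added = op_size
        current.append(op)
        current_size += added
    if current:
        batches.append(current)
    return batches


def upload_cost(batch_count):
    """Estimated upload charge in USD, assuming proportional billing."""
    return batch_count / 1000 * PRICE_PER_1000_BATCHES


# Hypothetical operations; a tiny limit is used so the split is visible
operations = [
    {"type": "add", "id": str(i), "fields": {"title": "x" * 100}}
    for i in range(10)
]
batches = split_into_batches(operations, max_bytes=500)
print(len(batches), "batches")
print(f"${upload_cost(25000):.2f} for 25,000 batch uploads")
```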

IndexDocuments requests

Adding a new field to an index requires re-indexing. The charge for a re-indexing request is $0.98 per GB of data stored in your search domain.
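A quick back-of-the-envelope calculation shows how re-indexing charges add up. The domain size and the number of re-indexing requests below are hypothetical; only the $0.98 per GB rate comes from the text above.

```python
REINDEX_PRICE_PER_GB = 0.98  # USD per GB stored, figure quoted in the article


def reindex_cost(stored_gb, requests=1):
    """Estimated charge in USD for a number of re-indexing requests."""
    return stored_gb * REINDEX_PRICE_PER_GB * requests


# Hypothetical scenario: a 20 GB domain re-indexed three times
cost = reindex_cost(stored_gb=20, requests=3)
print(f"${cost:.2f}")  # $58.80
```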

Data transfer

Data transfer in is free between Amazon CloudSearch and other AWS Services. Here’s the cost for data transfer out:

[Image: Amazon CloudSearch data transfer out costs]

Data transferred between Amazon CloudSearch and AWS services in different regions will be charged as Internet Data Transfers at both ends.

For traffic sent between Amazon CloudSearch and Amazon EC2 instances in the same region, you are only charged for the Data Transfer in and out of the Amazon EC2 instances. Standard Amazon EC2 Regional Data Transfer charges apply.

Conclusion

With Amazon CloudSearch, users can add low-cost search capabilities to their website without having to provision and manage infrastructure or handle indexing, data partitioning, and monitoring themselves. In a coming blog post, we will provide hands-on exercises to illustrate how to work with CloudSearch for real-world applications. Cloud Academy offers quizzes on Amazon CloudSearch that you can take in study mode or test mode depending on your goals. The quizzes are a terrific tool both for assessing your knowledge level and for raising it. The feedback from users is that our quizzes work. Cloud Academy has a 7-day free trial so you can try out our courses, labs, quizzes and learning paths.

Written by

Chandan Patra

Cloud Computing and Big Data professional with 10 years of experience in pre-sales, architecture, design, build and troubleshooting with best engineering practices. Specialities: Cloud Computing - AWS, DevOps(Chef), Hadoop Ecosystem, Storm & Kafka, ELK Stack, NoSQL, Java, Spring, Hibernate, Web Service
