Amazon CloudSearch offers an industrial-strength search engine for your website that's fast, reliable, and fully integrated with other AWS services.
As different as large websites can be from each other, there is one thing you will find on just about all of them: the search box. Without search, many of the resources you've worked so hard to create will be virtually inaccessible. In other words, your search box, and the search engine that powers it, is among your website's most important tools, which makes choosing a search engine a critical design decision. Amazon CloudSearch is definitely an option to consider.
There are many popular search engines and frameworks available, but probably the best known is Apache Lucene. Lucene, an open-source library for information retrieval (IR), powers many websites and underpins other well-known technologies such as Solr, Elasticsearch, and Hibernate Search. Amazon CloudSearch uses Apache Solr, and therefore Lucene, as its underlying search engine.
Amazon CloudSearch is a fully managed service in the AWS Cloud. It's simple to set up, manage, and scale, and it is a cost-effective search solution for a website or application. Like other managed AWS services, CloudSearch offers easy configuration, automatic scaling for data and traffic, self-healing clusters and, when Multi-AZ is enabled, high availability.
Amazon CloudSearch aims to provide high throughput and low latency in cloud environments. It also supports a rich set of features, including free-text search, faceted search, geospatial search, customizable relevance ranking, highlighting, autocomplete, and support for 34 languages. You can set your own scaling and availability options, index documents in various formats, and get results back in JSON or XML.
It’s no surprise that Amazon’s own products and sites are powered by CloudSearch for its reliability and performance.
Before diving into CloudSearch, let's review some IR terminology:
An IR library or search engine preprocesses documents and text so that they can be searched very quickly. Instead of scanning the whole data store in response to each new request, it retrieves information from an index, a data structure that maps search terms to the documents containing them. Indexes lie at the heart of any search engine.
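To make the idea of an index more concrete, here is a toy inverted index in Python. It is a generic illustration of the concept only, not a description of how Lucene or CloudSearch store their indexes internally; the naive tokenizer and the AND-only matching are simplifying assumptions.

```python
from collections import defaultdict
import re


def tokenize(text: str) -> list:
    # Naive analyzer: lowercase, then split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict) -> dict:
    # Inverted index: map each term to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict, query: str) -> set:
    # AND semantics: a document matches only if it contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "1": "Amazon CloudSearch is a managed search service",
    "2": "Apache Solr is built on Lucene",
    "3": "CloudSearch uses Solr under the hood",
}
index = build_index(docs)
print(sorted(search(index, "cloudsearch solr")))  # ['3']
```

The point is that a query only touches the posting sets for its own terms; it never rescans the documents themselves.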
Documents are indexed – stored and made searchable. In search engine terms, a document is the basic unit of information and contains data describing something. A document about an employee, for example, might contain information like the employee ID, name, department, job role, manager information, and city. A document about a book could contain the title, author, year of publication, and number of pages.
Documents are composed of fields, which hold more specific pieces of information, such as an employee ID or an address. When you add a document, the contents of its fields are added to an index. When you run a query, the index is searched and the matching documents are returned.
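CloudSearch receives documents as batches of operations, and in its JSON batch format each operation carries a type ("add" or "delete"), a document ID, and, for adds, a map of fields. The sketch below builds such a batch for the employee example above. The document IDs and field names are hypothetical; in a real domain, each field name generally has to match an index field configured on that domain.

```python
import json


def add_op(doc_id: str, fields: dict) -> dict:
    # "add" inserts a document, or replaces an existing one with the same ID.
    return {"type": "add", "id": doc_id, "fields": fields}


def delete_op(doc_id: str) -> dict:
    # "delete" removes the document with this ID from the index.
    return {"type": "delete", "id": doc_id}


# Hypothetical employee documents: every field name here is an assumption.
batch = [
    add_op("emp-1001", {
        "employee_id": "1001",
        "name": "Jane Doe",
        "department": "Engineering",
        "job_role": "Developer",
        "manager": "John Roe",
        "city": "Seattle",
    }),
    add_op("emp-1002", {
        "employee_id": "1002",
        "name": "Sam Poe",
        "department": "Finance",
        "job_role": "Analyst",
        "manager": "Ann Loe",
        "city": "Dublin",
    }),
    delete_op("emp-0999"),
]

print(json.dumps(batch, indent=2))
```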
A search domain holds your searchable data together with one or more search instances that process search requests. Each search instance has its own allocated resources, such as RAM and CPU, plus storage for index data. The number of search instances a domain needs depends on the volume and complexity of the documents you want to process.
Amazon CloudSearch features
Let’s explore some key CloudSearch features.
A managed AWS service takes care of all the intricacies of low level provisioning, error handling, fault tolerance, monitoring, and various management activities. Amazon CloudSearch does all that, leaving you with nothing more to do than create and configure a search domain and upload the data you want indexed. Once that’s done, you and your users will be free to search the data from your website.
But that's not the whole story. As a managed service, CloudSearch scales automatically as your data and traffic change. If your index outgrows the current search instance, CloudSearch moves the domain to the next larger instance type, and once the index exceeds the capacity of the largest available instance type, it is partitioned across multiple instances. When search traffic increases, CloudSearch can also add replicas of the index. Moreover, you don't need to worry about indexing, query parsing, query processing, or results handling: CloudSearch takes care of all of that.
Integrated with AWS Services
Documents for CloudSearch can be inexpensively stored on S3. CloudSearch also works with records in RDS or DynamoDB databases, is integrated with Amazon CloudWatch and supports index field statistics, and is fully integrated with IAM.
Amazon CloudSearch publishes the following four metrics into Amazon CloudWatch:
- SuccessfulRequests. Number of search requests successfully processed by the search instance.
- SearchableDocuments. Number of documents available in the search index.
- IndexUtilization. Index storage utilization rate of the search instance.
- Partitions. Number of partitions available in the search index.
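Here is a sketch of how you might pull one of these metrics programmatically. It builds keyword arguments for CloudWatch's GetMetricStatistics call in the shape boto3's `get_metric_statistics` accepts. The `AWS/CloudSearch` namespace and the `ClientId` (AWS account ID) and `DomainName` dimensions are our understanding of how CloudSearch publishes its metrics; treat them as assumptions to verify. The domain name and account ID in the usage line are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CLOUDSEARCH_METRICS = (
    "SuccessfulRequests",
    "SearchableDocuments",
    "IndexUtilization",
    "Partitions",
)


def metric_query(metric: str, domain: str, account_id: str, hours: int = 1) -> dict:
    """Build boto3-style kwargs for CloudWatch GetMetricStatistics."""
    if metric not in CLOUDSEARCH_METRICS:
        raise ValueError(f"unknown CloudSearch metric: {metric}")
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/CloudSearch",  # assumption: CloudSearch's metric namespace
        "MetricName": metric,
        # Assumption: metrics are keyed by account ID and domain name.
        "Dimensions": [
            {"Name": "ClientId", "Value": account_id},
            {"Name": "DomainName", "Value": domain},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # 5-minute buckets
        # Request counts add up over a bucket; the other metrics are levels.
        "Statistics": ["Sum"] if metric == "SuccessfulRequests" else ["Maximum"],
    }


def fetch(cloudwatch_client, **kwargs) -> list:
    # Any object with a boto3-compatible get_metric_statistics method works here.
    return cloudwatch_client.get_metric_statistics(**kwargs)["Datapoints"]
```

With boto3 installed and credentials configured, usage would look like `fetch(boto3.client("cloudwatch"), **metric_query("IndexUtilization", "my-domain", "123456789012"))`.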
CloudSearch is highly available and, if Multi-AZ is configured, a search domain will span two Availability Zones in the same region. Updates are automatically pushed to the instances in both AZs.
Integration with IAM means fine-grained control over the creation and deletion of domains, indexing and re-indexing, and user access. Users can use both HTTP and HTTPS connections to send and search data.
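As an illustration of that fine-grained control, the sketch below builds an IAM policy document scoped to a single domain: read-only callers get search and suggest access, while writers additionally get document upload access. The action names `cloudsearch:search`, `cloudsearch:suggest`, and `cloudsearch:document`, and the domain ARN format, reflect our understanding of CloudSearch's IAM integration; the region, account ID, and domain name are hypothetical.

```python
import json


def cloudsearch_policy(region: str, account_id: str, domain: str,
                       read_only: bool = True) -> dict:
    """Build a hypothetical IAM policy scoped to one CloudSearch domain."""
    arn = f"arn:aws:cloudsearch:{region}:{account_id}:domain/{domain}"
    # Query-side permissions for the search service.
    actions = ["cloudsearch:search", "cloudsearch:suggest"]
    if not read_only:
        # Also allow uploading and deleting documents via the document service.
        actions.append("cloudsearch:document")
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": arn}],
    }


print(json.dumps(cloudsearch_policy("us-east-1", "123456789012", "movies"), indent=2))
```

Keeping read and write access in separate policies means a public-facing search front end never holds credentials that can modify the index.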
Users can interact with CloudSearch through any one of three different services:
- Configuration service, to create and configure search domains.
- Document service, to upload documents.
- Search service, to submit search requests.
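The configuration service is usually reached through the AWS console, CLI, or SDKs, while the document and search services are plain HTTP(S) endpoints specific to each domain. The sketch below only builds URLs for the latter two, assuming the 2013-01-01 API version; the endpoint host names are hypothetical (the real ones appear on your domain's dashboard), and it does not handle request signing, which may be required depending on the domain's access policy. The optional facet parameter illustrates the faceted search mentioned earlier.

```python
from urllib.parse import urlencode

API_VERSION = "2013-01-01"


def document_batch_url(doc_endpoint: str) -> str:
    # Document service: POST a JSON batch here with Content-Type: application/json.
    return f"https://{doc_endpoint}/{API_VERSION}/documents/batch"


def search_url(search_endpoint: str, query: str, facet_field: str = "",
               size: int = 10) -> str:
    # Search service: GET with the query carried in the URL's query string.
    params = {"q": query, "size": size}
    if facet_field:
        # Ask for the top 5 facet buckets for this field, ordered by count.
        params[f"facet.{facet_field}"] = "{sort:'count',size:5}"
    return f"https://{search_endpoint}/{API_VERSION}/search?{urlencode(params)}"


# Hypothetical endpoints for a domain named "movies".
DOC = "doc-movies-abc123.us-east-1.cloudsearch.amazonaws.com"
SEARCH = "search-movies-abc123.us-east-1.cloudsearch.amazonaws.com"

print(document_batch_url(DOC))
print(search_url(SEARCH, "star wars", facet_field="genres"))
```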
Supported instance types
At the time of writing, CloudSearch supports five search instance types: search.m1.small, search.m3.medium, search.m3.large, search.m3.xlarge, and search.m3.2xlarge. If your needs outgrow the capacity of a single search.m3.2xlarge instance, CloudSearch will automatically partition your index across multiple search instances. A search index can be split across as many as ten partitions.
Amazon CloudSearch pricing
Pricing for the various Amazon CloudSearch configurations isn't complicated:
Search instances are billed per hour, at a rate that depends on the instance type and the region.
When Multi-AZ is enabled, you also pay for the redundant search instances in the second Availability Zone. When the index is partitioned, you pay for each additional search instance in each AZ.
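The instance arithmetic above can be sketched as follows. The hourly rate is a placeholder parameter, not a real CloudSearch price, and 730 hours is an assumed average month; the model covers only partitions and Multi-AZ, as described above.

```python
def instance_count(partitions: int = 1, multi_az: bool = False) -> int:
    # Each partition runs on its own search instance, and Multi-AZ
    # duplicates every instance in a second Availability Zone.
    return partitions * (2 if multi_az else 1)


def monthly_instance_cost(hourly_rate: float, partitions: int = 1,
                          multi_az: bool = False, hours: float = 730) -> float:
    # hourly_rate is a placeholder: use the published rate for your
    # instance type and region.
    return round(instance_count(partitions, multi_az) * hourly_rate * hours, 2)


# Hypothetical $0.10/hour instance, index split into 2 partitions, Multi-AZ on:
print(instance_count(2, True))               # 4
print(monthly_instance_cost(0.10, 2, True))  # 292.0
```

The takeaway is that partitioning and Multi-AZ multiply rather than add: two partitions with Multi-AZ means four billed instances.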
Document batch uploads
$0.10 per 1,000 Batch Upload Requests (the maximum size for each batch is 5 MB).
Index Documents requests
Re-indexing is required after certain configuration changes, such as adding a new index field. The charge for a re-indexing request is $0.98 per GB of data stored in your search domain.
Data transfer in between Amazon CloudSearch and other AWS services is free. Data transfer out is charged per GB at tiered rates that vary by region.
Data transferred between Amazon CloudSearch and AWS services in different regions will be charged as Internet Data Transfers at both ends.
For traffic sent between Amazon CloudSearch and Amazon EC2 instances in the same region, you are only charged for the Data Transfer in and out of the Amazon EC2 instances. Standard Amazon EC2 Regional Data Transfer charges apply.
With Amazon CloudSearch, users can add low-cost search capabilities to their websites without having to handle provisioning, management, indexing, data partitioning, or monitoring themselves. In an upcoming blog post, we will provide hands-on exercises that illustrate how to work with CloudSearch in real-world applications. Cloud Academy offers quizzes on Amazon CloudSearch that you can take in study mode or test mode, depending on your goals. The quizzes are a terrific tool both for assessing your knowledge level and for raising it, and our users tell us they work. Cloud Academy has a 7-day free trial so you can try out our courses, labs, quizzes, and learning paths.