Choosing the Right Instance Type

In this course, you'll learn what Elastic Compute Service (ECS) is, how to use it, and what makes Alibaba Cloud's new 6th generation instances faster and more powerful than their predecessors. You'll also learn how to choose the right type of instance for your workload, and how to save money by understanding the ECS pricing model. Finally, you'll learn how to launch your own ECS instance from Alibaba Cloud's web console.

Learning Objectives

  • Understand the fundamentals of Alibaba ECS
  • Learn the advantages of the sixth-generation ECS hardware
  • Learn the instance types and families in ECS
  • Learn how to choose the right instance type for your workload
  • Understand ECS's different pricing models
  • Learn how to purchase an ECS instance from the Alibaba Cloud console

Intended Audience

  • Solution architects
  • System operators
  • Developers
  • Anyone who wants to learn about Alibaba ECS


To get the most out of this course, you should have a basic understanding of the Alibaba Cloud platform.



Let's next talk about choosing the right instance type. We'll go into a little more detail here about how to choose instance types. The G, C, and R families are the general purpose instance types. Let me explain what those letters mean: G is general purpose, C is compute optimized, and R is memory optimized. General purpose instances are suitable for almost any workload, and they're the default choice if you aren't exactly sure what type you should use. C is the same as general purpose but with a higher vCPU-to-memory ratio, and R is the same as general purpose but with a higher memory-to-vCPU ratio.

So R machines have more RAM, C machines have more CPU, and G provides a good balance between the two. We also offer high-end CPU options and very high network and storage capabilities in these families, so despite being general purpose, these instances are very powerful and very capable. And if you choose the G6e, C6e, or R6e types, the enhanced generation six instances, you get Enhanced SSDs (ESSDs) as well as Intel Cascade Lake CPUs with clock speeds up to 3.2 gigahertz. So these are suitable for a wide range of applications.
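One way to think about the G/C/R split is as a vCPU-to-memory ratio. As a rough sketch, here's a tiny Python helper that picks the closest family for a workload; the ratios used (1:2 for c, 1:4 for g, 1:8 for r) reflect typical published specs for these families, so verify them against the current Alibaba Cloud instance type documentation before relying on them.

```python
# Rough decision helper for the G/C/R families described above.
# Ratios are typical published vCPU:memory specs (illustrative only):
#   c = compute optimized, g = general purpose, r = memory optimized.
FAMILY_RATIOS = {
    "g": 4,  # general purpose: 1 vCPU : 4 GiB RAM
    "c": 2,  # compute optimized: 1 vCPU : 2 GiB RAM
    "r": 8,  # memory optimized: 1 vCPU : 8 GiB RAM
}

def pick_family(vcpus_needed: int, ram_gib_needed: int) -> str:
    """Return the family whose vCPU:memory ratio best matches the workload."""
    target = ram_gib_needed / vcpus_needed
    return min(FAMILY_RATIOS, key=lambda f: abs(FAMILY_RATIOS[f] - target))

print(pick_family(4, 32))  # memory-heavy workload -> "r"
print(pick_family(8, 16))  # CPU-heavy workload    -> "c"
print(pick_family(4, 16))  # balanced workload     -> "g"
```

If the helper returns "g", you're in "default choice" territory; lean c or r only when the ratio is clearly skewed.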

The most common workloads are things like Nginx, Docker, Kubernetes, Redis, Memcached, various middleware solutions, Java or Python applications, production databases, and so on. As I said, you can run almost anything on these. Then there are the high-frequency instances. These are the same as the G, C, and R type instances, except that the name starts with HF, which indicates a high-frequency instance.

So if you're buying an instance and you see that the instance family name includes HF, that tells you the instance has a higher clock rate, a faster CPU. These are a better fit if your workload is very CPU-heavy. These instances have Intel Cooper Lake CPUs with a burst frequency of up to 3.8 gigahertz, plus bfloat16 (BF16) instruction support for AI workloads, so they're also a good choice for non-GPU AI workloads.

Big data instances: these are the D2S and D2C class instances. These are instances that include local hard disk drives. The advantage here is that these instances have disks physically attached to the host machine. The traditional cloud disk block device that our ECS VMs use is actually network storage: it is very fast and very low latency, but it's not physically attached to the machine; it lives on a network storage device somewhere nearby.

With the D2S and D2C instances, there are cost-effective local HDDs right there, ready for you to use. The D2S is storage optimized, for storage-heavy workloads; the D2C is compute optimized. The D2S features Intel Skylake CPUs and up to 38 terabytes of local hard drive capacity, with up to 35 gigabit per second network bandwidth between instances, so you can cluster them; they do have a very fast network. The instances also provide two hot-swappable, redundant local hard drives, which you can swap in to replace a failed disk, helping to reduce downtime when a disk failure occurs.

The D2C uses Intel Cascade Lake CPUs and can support up to 12 four-terabyte hard drives, again with 35 gigabit per second network bandwidth between instances. One important point here about Cloud Disk storage: one of the major advantages of Cloud Disk, which we discussed when we talked about what ECS is, is that it stores redundant copies of your data. Local disks don't do that. Local disks are just ordinary local hard drives.

There's no storage management or redundancy layer on top of them. These are just disks, so if they fail, you will lose data. Your application should have built-in redundancy to cope with this. If you're running something like Hadoop's HDFS, that's fine, because HDFS is built to survive disk failure: if you cluster D2S or D2C instances together and run HDFS across them, you'll be okay, because when a disk fails, HDFS handles it for you. But if your application doesn't have that built-in redundancy, you've been warned: if a disk fails, you will lose data.
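With HDFS specifically, the thing that provides that redundancy is block replication. As a minimal illustration, this `hdfs-site.xml` fragment sets the standard `dfs.replication` property (3 is the HDFS default; it's shown explicitly here only to make the point):

```xml
<!-- hdfs-site.xml: keep three copies of every block, so losing one
     local disk (or even a whole D2S/D2C node) doesn't lose data.
     3 is the HDFS default; shown explicitly for illustration. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

With replication at 3 or higher, HDFS re-replicates blocks from surviving copies when a local disk dies, which is exactly the application-level redundancy these instances require.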

So the big data workloads you would typically run on a D2S or D2C are things like Hive, HBase, Spark, Elasticsearch, MapReduce jobs, JindoFS, and HDFS. You could also use them for log and data processing applications like Kafka and Flume.

Okay, the local SSD instances: the I series. There are two variants, the I3 and the I3G. The I3 focuses on workloads that need high-performance local SSDs, while the I3G focuses more on compute. The I3 includes, again, Intel Cascade Lake CPUs, which go up to 3.2 gigahertz all-core turbo. These machines have up to 768 gigabytes of RAM and up to 12 x 1.8 terabyte high-performance local SSDs attached to the instance. Network bandwidth, again, is up to 32 gigabits per second.

Compared with the older I2, the I3 supports the NVMe protocol for these local SSDs, which significantly improves both latency and throughput. So this is a good choice if you need really high local solid state drive performance. Caution, again: these local disks are not Cloud Disks, and they do not have the built-in redundancy our Cloud Disk service has, so data will be lost if a disk fails. You need application-level redundancy, again, to deal with this.

So what sort of workloads would you run on an I3? An OLTP database system like MySQL, Microsoft SQL Server, or PostgreSQL; maybe a NoSQL database like MongoDB, Cassandra, HBase, or Redis. You could run Elasticsearch. RabbitMQ or Kafka would also run well on this type of instance. And of course you could easily run Hadoop on it, just as you could with the D2S or D2C.

Then there are the burstable instances. These ones are interesting. These are what you would use if you have a testing workload, or you're just getting started and want to save some money. They're perfect for use cases with low average CPU utilization: if your instance spends 90% of its lifetime using less than 20% of its allocated CPU, these instances are great, because they're much cheaper than equivalent general purpose instance types, sometimes 20 to 50% cheaper.

The T6 and T5: T6 is sixth generation, T5 is fifth generation. Both provide a baseline CPU performance that's between 5 and 40% of the CPU, with the ability to burst CPU usage up to 100%. So how does the bursting work? There's a credit system used to allocate CPU time. T6 and T5 instances accumulate CPU credits while the workload is below the baseline value (that 5 to 40% of CPU use, depending on the instance). Each credit you earn gives the T6 or T5 instance the opportunity to burst to 100% of one CPU core for one minute, or 50% of one CPU core for two minutes.

So the CPU credits you have essentially buy you more CPU time. When you are below the baseline, you'll accumulate a certain number of credits per hour, with the earning rate scaling with the instance size. For instance, a T6 large has two vCPUs and 512 megabytes of RAM. If you use less than 5% of the CPU, then every hour you accumulate six CPU credits, which means that after the instance spends an hour idling, you can burst to 100% CPU usage for six minutes. This instance can bank up to 144 credits.

There is a maximum; you don't just keep earning credits forever. Once this instance has banked 144 credits, it has reached the cap and stops earning new credits. But if you were to look at a 2xlarge instance, which has 8 vCPUs and 32 gigabytes of RAM, that instance can use up to 40% of the CPU and still continue to accumulate new credits. It earns 192 credits an hour and can hold up to 4,608 credits in total. So you do get more ability to use the CPU bursting mode with the larger instance classes.
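The credit arithmetic above can be sketched in a few lines of Python. The rates and caps are the ones quoted in this lesson (6 credits/hour with a 144 cap for the small T6, 192/hour with a 4,608 cap for the 2xlarge); check the current Alibaba Cloud T-series documentation for exact figures.

```python
# Sketch of the T-series CPU credit arithmetic described above.
# One credit = 100% of one vCPU core for one minute
#            (equivalently, 50% of one core for two minutes).

def credits_earned(hours_below_baseline: float,
                   credits_per_hour: float,
                   max_credits: float) -> float:
    """Credits banked while running at or below the baseline, capped."""
    return min(hours_below_baseline * credits_per_hour, max_credits)

def burst_minutes(credits: float, cores_at_full_load: int = 1) -> float:
    """Minutes of 100% burst those credits buy across N cores."""
    return credits / cores_at_full_load

# T6 large (2 vCPUs, 6 credits/hour, 144-credit cap):
print(credits_earned(1, 6, 144))    # one idle hour -> 6 credits
print(burst_minutes(6))             # -> 6 minutes of full single-core burst
print(credits_earned(48, 6, 144))   # two idle days -> capped at 144

# 2xlarge (8 vCPUs, 192 credits/hour, 4,608-credit cap):
print(credits_earned(24, 192, 4608))  # capped at 4608
```

If your workload regularly drains the bank to zero, that's the signal to move to a general purpose family rather than leaning on unlimited mode.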

If you're really worried about your CPU usage being throttled, that you might run out of credits but still need to complete some tasks or some workload, then we do have an unlimited mode: after you use up your credits, the instance can continue to use 100% of the CPU, but we charge you, essentially, for the additional credits you're burning. If you're frequently using up your credits, though, you really should consider moving away from the T6 or T5 type and looking at the general purpose instances, because using unlimited mode is expensive.

In many cases, it will be more expensive than just using regular pay-as-you-go general purpose instances. Then there are the GPU instances. These are designed for all kinds of machine learning and artificial intelligence tasks: speech synthesis, natural language processing, recommendation and prediction, speech and face recognition, and real-time rendering as well. You can do graphics and rendering on these machines; they do have GPU cards built in, after all. Think 3D design, industrial design, and scientific computing.

Essentially, the advantage here is that Alibaba GPU instances provide both high-performance infrastructure and an AI acceleration layer for TensorFlow and other common frameworks. We have an AI accelerator built into our GPU instances, called AIACC, that helps boost performance, and we've actually used it in competition: on Stanford's DAWNBench ImageNet test, we took first place, beating all of our competitors on training time, thanks to this AI accelerator. It works with TensorFlow, Caffe, PyTorch, and MXNet. We also support multiple storage options for our GPU instances.

So in addition to regular cloud disk block storage, we support NAS, OSS, and CPFS (parallel file storage). Then there is the dedicated host, which we often refer to simply as DDH. This is a good fit for traditional finance, manufacturing, and commercial customers; Internet customers, like gaming companies, might also find it useful. It's a great way to start your migration to the public cloud, and the key advantages here are security and performance.

Because you're the only tenant on the physical machine, you can be guaranteed that no other users' virtual machines are going to use up your resources. You also get great security because, again, you're not sharing the physical machine with any other Alibaba Cloud users. Key features include the ability to create custom instance classes: you can create ECS VMs on dedicated hosts with custom CPU-to-memory ratios that we don't normally support. You can also oversubscribe the CPU on purpose.

That is, you can customize the physical-CPU-to-vCPU ratio, which allows you to run more virtual machines on the hardware than you have CPU cores for. This is fine in most cases, because each VM won't be using 100% of its allocated CPU all the time. Bare Metal is very similar to the dedicated host: again, it is physical hardware allocated to you that other customers do not use. The key difference is that you can run ECS VMs on top of a dedicated host, but you can't do that on Bare Metal, because Bare Metal has no virtualization layer. So Bare Metal instances are great for custom virtualization.

If you want to run Hyper-V, VMware, or KVM directly yourself, you can do that on X-Dragon Bare Metal, because the machine doesn't include its own virtualization layer, so you can add one. You can also use Bare Metal for workloads that require direct hardware access; maybe you need Intel's VT-x or SGX instructions, which you can use on a Bare Metal instance. So the key advantages here are high performance and fast deployment of container workloads.

You can run a third-party hypervisor, and unlike a physical server in your own data center, an ECS Bare Metal instance can be allocated and delivered in minutes and can seamlessly connect with other Alibaba Cloud services. So even though it is a bare metal machine, you can put it into an Alibaba Cloud VPC network, and it can use all of the internal Alibaba Cloud endpoints for all of our services; it connects to them just as though it were an ECS VM. We also have a few unique instance types, which provide unique capabilities or hardware for special-purpose workloads.

So the C6t and G6t: these have a Trusted Platform Module, and we export that to the VM level, so you can do trusted boot inside your VM if you're using a C6t or G6t. This is important in banking and finance, or anywhere you have very strict security and compliance requirements.

There's also the G6se. This is a storage-enhanced generation six instance. These instances support up to 1 million I/O operations per second at the instance level, with the lowest block storage latency and the highest throughput that we're currently able to offer. So these are good for high-performance databases, search, and self-built or custom file systems. And then there are the G5ne and G7ne. These instances support up to 16 million concurrent sessions and 24 million packets per second of network throughput with low latency. They're really designed for gateways, self-built load balancers, firewalls, and routers.

And then there's the Re6p. This is really cool: it's an instance with Intel Optane. If you don't know what Intel Optane is, it's super fast, solid-state persistent storage, so fast that you can essentially use it almost like RAM; it's a very cool technology, and you should definitely go read about it if you're not familiar with it. The Re6p would be great if you're running Redis or SAP HANA, or a parameter server, or doing log or video processing, basically anything where you need super high-performance access to persistent storage.

About the Author

Alibaba Cloud, founded in 2009, is a global leader in cloud computing and artificial intelligence, providing services to thousands of enterprises, developers, and government organizations in more than 200 countries and regions. Committed to the success of its customers, Alibaba Cloud provides reliable and secure cloud computing and data processing capabilities as a part of its online solutions.