OSS Concepts
1h 3m

This course explores the Alibaba Object Storage Service (OSS), covering the basics of the service and then looking at its features through guided demonstrations from the Alibaba Cloud Platform.

Learning Objectives

  • Understand basic OSS concepts.
  • Learn how to manage buckets and objects on OSS.
  • Understand how to carry out image processing.
  • Learn how to carry out website hosting and monitoring on top of OSS.
  • Learn about Alibaba custom domains and anti-leeching features.
  • Learn about OSS's security model.

Intended Audience

This course is intended for anyone who wants to learn more about Alibaba OSS, as well as anyone studying for the ACP Cloud Computing certification exam.


To get the most out of this course, you should have a basic understanding of the Alibaba Cloud platform.


We'll now discuss some core OSS concepts. OSS is an important part of the Alibaba Cloud platform, and like the rest of the Alibaba Cloud public cloud platform, it depends on Apsara in order to run. Apsara is our cloud management system. The boxes in orange in the architecture diagram on this slide are the components of the Apsara system. In particular, OSS depends heavily on Pangu, the distributed file system component of Apsara. The way that OSS stores data in triplicate is part of the Pangu architecture.

OSS is supported across all 22 Alibaba Cloud regions and all 66 of our zones worldwide. So wherever you are, you'll be able to find OSS within your closest Alibaba Cloud region. The two most important concepts to understand in OSS are the object and the bucket. An object is essentially a file: the fundamental entity that you store inside your OSS buckets. An object has three parts: the key, the data, and the metadata. The key is essentially the object's name. It's like a file name, and it has to be unique within a given bucket. The data is the file content for your object. And the metadata is a set of key-value pairs that expresses the object's attributes, such as a checksum, the size of the object, the last time it was modified, et cetera.
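The three-part structure of an object can be pictured as a simple data type. This is just an illustrative sketch; the field names below are not the OSS wire format.

```python
from dataclasses import dataclass, field

@dataclass
class OSSObject:
    key: str                                       # unique name within a bucket
    data: bytes                                    # opaque file content
    metadata: dict = field(default_factory=dict)   # key-value attributes

# A hypothetical object, as it might look inside a bucket.
obj = OSSObject(
    key="photos/cat.jpg",
    data=b"\xff\xd8\xff...",                       # truncated JPEG bytes, for illustration
    metadata={"Content-Type": "image/jpeg", "Last-Modified": "2021-06-01T12:00:00Z"},
)

print(obj.key)                      # photos/cat.jpg
print(obj.metadata["Content-Type"]) # image/jpeg
```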

The data portion is opaque to OSS and is not manipulated by the OSS service, except to store it on disk and retrieve it when you request it. The maximum size of an object varies with the upload method. Multipart upload supports up to 48.8 terabytes for a single object. However, if you're using our traditional HTTP upload methods, then the object size is limited to five gigabytes, so do keep that in mind. If your objects are larger than five gigabytes, you need to be using multipart upload. Next, there are buckets. Buckets serve as the containers that hold the objects you store in OSS.
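The size limits above translate into a simple decision rule. Here's a sketch of that check; the function name is made up for illustration, not an OSS SDK call.

```python
GB = 1024 ** 3
TB = 1024 ** 4

SIMPLE_UPLOAD_LIMIT = 5 * GB        # cap for a traditional HTTP upload
MULTIPART_LIMIT = int(48.8 * TB)    # cap for a single multipart-uploaded object

def choose_upload_method(object_size: int) -> str:
    """Pick an upload method based on object size, per the limits above."""
    if object_size > MULTIPART_LIMIT:
        raise ValueError("object exceeds the 48.8 TB multipart limit")
    return "simple" if object_size <= SIMPLE_UPLOAD_LIMIT else "multipart"

print(choose_upload_method(100 * 1024 * 1024))  # small object -> simple
print(choose_upload_method(20 * GB))            # over 5 GB -> multipart
```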

Every object must be kept in a bucket, and each bucket name must be globally unique within the OSS service and cannot be changed. What that means is that if another user is already using the bucket name you want, you'll have to pick another name. Bucket names cannot overlap even between users; they really must be globally unique. We'll talk about why that is later on. There is no limit on the number of objects in a bucket. An application, whether a third-party application that you write or the Alibaba Cloud command line, can talk to more than one bucket, so you can communicate via the OSS API with each of your buckets. Buckets are how we organize the OSS namespace at the highest level, and they're also how we do usage reporting and access control.

Each object within a bucket has a key, which is the object's unique identifier. The combination of the bucket name and the key uniquely identifies any object within the entire OSS service, and if you look at the URL here on the slide, you'll see why. The bucket name, plus the bucket endpoint (the API endpoint for the OSS service), plus the object name: these three components, in orange, blue, and red respectively, form a URL. This is also the reason that bucket names must be unique. They are part of a resolvable DNS name that points to your bucket and its content, so they genuinely have to be unique, because they have to resolve via the worldwide DNS service to a particular location.

So if you combine the protocol (HTTP or HTTPS), plus your bucket name, plus your bucket endpoint, which varies because each Alibaba Cloud region has its own, plus the object name, you get a URL that points at a particular object within a particular bucket, and that URL is globally unique. The URL points to your object and only your object. Object keys can be up to 1,024 bytes in length, and the encoding we use is UTF-8, so the object name can include characters from other languages like Japanese or Chinese. The object key has to be unique within a given bucket, but of course, if you have more than one bucket, each of those buckets can contain objects with the same names.
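The four components described above combine mechanically. Here's a sketch of that composition, with the 1,024-byte UTF-8 key limit checked along the way; the bucket name and endpoint below are made-up examples, not real resources.

```python
def object_url(protocol: str, bucket: str, endpoint: str, key: str) -> str:
    """Combine protocol + bucket name + regional endpoint + object key into a URL."""
    # Object keys are UTF-8 encoded and limited to 1,024 bytes.
    if len(key.encode("utf-8")) > 1024:
        raise ValueError("object key exceeds 1024 bytes (UTF-8)")
    return f"{protocol}://{bucket}.{endpoint}/{key}"

# Hypothetical bucket in a hypothetical region endpoint.
url = object_url("https", "my-demo-bucket", "oss-cn-hongkong.aliyuncs.com", "assets/logo.png")
print(url)  # https://my-demo-bucket.oss-cn-hongkong.aliyuncs.com/assets/logo.png
```

Note that multi-byte characters count against the byte limit, not the character count, so a key written in Chinese or Japanese reaches the limit in fewer characters.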

Object keys can contain a path prefix, because Object Storage Service does not have a concept of a directory. What you can do instead is simulate nested directories by adding path prefixes to your object names. You can see an example at the bottom of the slide here in green: an object that has had some directory path prefixes added to the beginning of its name. The original name of the object was just jtables.js, and they've added four levels of path prefix. So if you were to go into the OSS console and try to find this object, the console would make it appear as though there was a directory called assets, with a directory called js inside, then a jquery directory inside that, then plugins, and inside plugins you'd find jtables.js. But if you were to use the API or the command line to fetch this object, the object's name, or key, would just be the full string assets/js/jquery/plugins/jtables.js. There's no actual hierarchy here; it's just a simulated directory hierarchy created using these path prefixes.
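The console behavior described above can be sketched as a delimiter-based listing over flat keys: group each key by the next "/" after a given prefix. The keys below are illustrative.

```python
def list_level(keys, prefix=""):
    """Present flat object keys as one level of a simulated directory tree."""
    files, dirs = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if "/" in rest:
            # Everything up to the next "/" looks like a subdirectory.
            dirs.add(prefix + rest.split("/", 1)[0] + "/")
        else:
            files.append(key)
    return sorted(dirs), files

keys = ["assets/js/jquery/plugins/jtables.js", "assets/css/site.css", "index.html"]
print(list_level(keys))                # (['assets/'], ['index.html'])
print(list_level(keys, "assets/js/"))  # (['assets/js/jquery/'], [])
```

This is exactly why the hierarchy is only simulated: the "directories" are computed on the fly from the keys, and deleting the last object under a prefix makes the "directory" disappear.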

So this is what the data structure on OSS looks like. At the top level, there are buckets. Within buckets, you have objects and simulated directories, which you create using the path prefixes we just discussed. And of course, each object has its three components: its key, its data, and its metadata. Data stored in OSS is stored in one of several storage classes, ranging from hot to cold; here, I've arranged them from left to right in this table. The hot storage classes are designed for data that is frequently accessed, whereas the cold storage classes are designed for data that is infrequently accessed. You can see there's a column in the middle of the table called retrieval time, which lists how long it typically takes to retrieve data.

With Archive, it can take up to a minute to retrieve data. With Cold Archive, depending on whether or not you pay for expedited retrieval, it can take between an hour and 11 hours to retrieve data. However, the advantage of the colder storage types like Archive, Cold Archive, or Infrequent Access is that the cost per gigabyte is lower. On the other hand, for Infrequent Access, Archive, and Cold Archive, on the right side of my table, you do pay a retrieval fee for requesting data back from the service, so that's an incentive not to query the service more than you need to.

With Standard storage, which is our typical default storage class for OSS, there's no retrieval fee for requesting an object from the service, so this is really the best choice for frequently accessed data. You'll also notice that for Standard and for Infrequent Access, we've actually got two different types: one is called LRS and one is called ZRS. The difference between LRS and ZRS has to do with how the triplicate copies, the three copies of your data, are stored. In Standard and IA LRS, or Locally Redundant Storage, all of your data is stored within a single Alibaba Cloud zone.

The three copies of your data are on separate physical servers attached to separate network switches, but those physical servers live within the same data center. That is already good enough to get you a data durability of 11 nines and 99.99% service availability. However, if that isn't high enough, you can choose Standard or Infrequent Access ZRS, which stands for Zone Redundant Storage. In Zone Redundant Storage, we keep the copies of your data in different zones. This means that your data will still be available even if a single zone within a region fails, which gets you 99.995% service availability and 12 nines of data durability.
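The storage-class properties mentioned above can be collected into a single reference table. The figures below are the ones quoted in this course, not a service-level guarantee; check the OSS pricing and SLA pages for current values.

```python
# Storage-class summary, hot (left) to cold (right), per the figures in this course.
storage_classes = {
    "Standard (LRS)":          {"retrieval_fee": False, "durability_nines": 11, "availability": "99.99%"},
    "Standard (ZRS)":          {"retrieval_fee": False, "durability_nines": 12, "availability": "99.995%"},
    "Infrequent Access (LRS)": {"retrieval_fee": True,  "durability_nines": 11, "availability": "99.99%"},
    "Infrequent Access (ZRS)": {"retrieval_fee": True,  "durability_nines": 12, "availability": "99.995%"},
    "Archive":                 {"retrieval_fee": True,  "retrieval_time": "up to 1 minute"},
    "Cold Archive":            {"retrieval_fee": True,  "retrieval_time": "1 to 11 hours"},
}

# Only Standard is free to retrieve from.
free_to_retrieve = [name for name, p in storage_classes.items() if not p["retrieval_fee"]]
print(free_to_retrieve)  # ['Standard (LRS)', 'Standard (ZRS)']
```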

So if you really need the ultimate availability and durability for your data, then ZRS, Zone Redundant Storage, is the right choice. Not all of the objects within a bucket need to belong to the same storage class. You can set what are called lifecycle rules within a bucket that determine what happens to OSS objects at various points in their lifecycle, such as when objects should transition between storage classes. For example, you can set a rule that takes older objects, say objects that have been in your bucket for 60 days, and transitions them to Infrequent Access storage in order to save money.

Lifecycle rules can also determine when objects should be deleted from the bucket, and what patterns an object must match in order for a given rule to take effect. To achieve better redundancy and fault tolerance, in addition to Zone Redundant Storage, which we just discussed, there's also versioning and Cross-Region Replication. Versioning is fairly simple: when you turn on versioning for a bucket, each time you upload a new copy of an object, instead of overwriting the previous copy, the previous copy is given a version number and hidden in the background. It's kept in the bucket, and you can retrieve or restore old versions of an object at any time by checking the version history. This way, you can preserve the changes that have occurred to objects over time.
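The lifecycle logic just described, matching objects by pattern and acting on their age, can be sketched like this. The rule shape, prefix, and action names are illustrative, not the real OSS lifecycle configuration format.

```python
from datetime import date, timedelta

def apply_lifecycle(objects, prefix="logs/", days=60, today=None):
    """Return the action a 60-day transition rule would take on each matching object.

    `objects` maps key -> last-modified date; only keys under `prefix`
    that are older than `days` are transitioned to Infrequent Access.
    """
    today = today or date.today()
    actions = {}
    for key, last_modified in objects.items():
        if key.startswith(prefix) and (today - last_modified) > timedelta(days=days):
            actions[key] = "transition-to-IA"
    return actions

objects = {
    "logs/2020-01-01.log": date(2020, 1, 1),   # old enough, matches the prefix
    "logs/today.log": date.today(),            # matches the prefix, too new
    "site/index.html": date(2019, 1, 1),       # old enough, wrong prefix
}
print(apply_lifecycle(objects))  # {'logs/2020-01-01.log': 'transition-to-IA'}
```

A deletion rule works the same way, just with a different action once the age threshold is crossed.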

Cross-Region Replication allows you to copy all of the content of a bucket to another bucket in a different region. So for instance, if I had an OSS bucket in Hong Kong and another OSS bucket in Singapore, I could set up CRR on the Hong Kong bucket, so that each time an object was uploaded to the Hong Kong bucket, it would also be copied over to the Singapore bucket automatically. This is a great way to achieve multi-region redundancy.

OSS also has some built-in features designed for web hosting scenarios. One of those is what we call back-to-origin mirroring. When a client, which could be a web browser, a command-line tool, or a third-party application, requests an object that OSS doesn't have, OSS will send a request back to some source website or source service. If the object is found at that source, OSS makes a copy into the bucket and then returns that copy to the client. This can transparently mirror data that's stored outside of OSS.
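The back-to-origin flow above is essentially a cache-miss fallback. Here's a sketch of that flow, with in-memory dicts standing in for the OSS bucket and the external origin site.

```python
bucket_store = {}                                   # simulated OSS bucket
origin_site = {"img/banner.png": b"<png bytes>"}    # simulated external origin

def get_with_back_to_origin(key):
    """Serve from the bucket; on a miss, fetch from the origin and mirror it."""
    if key in bucket_store:                     # OSS already has the object
        return bucket_store[key]
    if key in origin_site:                      # miss: request it from the origin...
        bucket_store[key] = origin_site[key]    # ...and keep a copy in the bucket
        return bucket_store[key]
    raise KeyError(f"{key} not found in bucket or origin")

print(get_with_back_to_origin("img/banner.png"))  # first request: fetched from origin
print("img/banner.png" in bucket_store)           # True: subsequent requests hit OSS
```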

Similarly, OSS supports redirection. If a client requests an object that's not stored in OSS, OSS can send back a redirect directive, and the client can then fetch the object from some other source. Again, this is a web hosting-related feature that can help you deal with some common web hosting scenarios. OSS also has built-in logging and log management. You can turn on logging for a given bucket and have those logs stored in another bucket, which we call the destination bucket, under a particular path, which we call a log prefix. You can then retrieve those logs any time you want to perform analysis, security checks, or audits. You can see an example of such a log entry just under the access logging section on the left side of this slide: it includes the source IP address, the timestamp, the type of HTTP request received, the flags associated with that request, and the operating system of the requester.
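Once logs land in the destination bucket, analysis usually starts by pulling out fields like the ones just mentioned. The sample line and regex below assume a generic Apache-style layout for illustration; consult the OSS logging documentation for the exact field order of real OSS access logs.

```python
import re

# A made-up access-log line in a common web-server style.
line = ('203.0.113.7 - - [01/Jun/2021:12:00:00 +0000] '
        '"GET /assets/logo.png HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0)"')

# Capture source IP, timestamp, request line, status, and user agent.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]+)" (?P<status>\d+) "(?P<agent>[^"]*)"'
)

m = pattern.match(line)
print(m.group("ip"))       # 203.0.113.7
print(m.group("request"))  # GET /assets/logo.png HTTP/1.1
print(m.group("agent"))    # the user agent, from which the client OS can be inferred
```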

If you want the ability to do real-time, full-text log search, you can also turn on the real-time logging feature. In addition to recording access logs, this actually indexes the logs and makes them searchable, so you can easily search through them using our real-time log GUI. And that's all for this section.

About the Author

Alibaba Cloud, founded in 2009, is a global leader in cloud computing and artificial intelligence, providing services to thousands of enterprises, developers, and government organizations in more than 200 countries and regions. Committed to the success of its customers, Alibaba Cloud provides reliable and secure cloud computing and data processing capabilities as a part of its online solutions.