How Cloud Academy Is Using Cube to Win the Data Challenge

At Cloud Academy, we manage a lot of data every day. It comes from a variety of sources, such as feedback, events, and platform usage, and we need to collect it, apply transformations, and finally present it to our internal stakeholders and our customers.
Because of the variety of data we provide, we recently implemented Cube, a Headless BI solution. It allows us to handle, model, and present data through our BI tools smoothly.
A Headless BI tool is a set of components that acts as middleware between your data warehouse and your business intelligence applications. It provides four main data-related components without the need to design and implement custom solutions, and it lets us work with data without hitting the data warehouse directly, leveraging instead the abstraction layer the tool provides.
The name Headless comes from the fact that the tool lets us work with the data but deliberately delegates showing and visualizing it; that remains the responsibility of the BI tool.
A Headless BI tool offers four main components: data modeling, data access security, caching, and APIs.
We have a lot of data coming from multiple sources, internal and external to Cloud Academy, and we work with structured, semi-structured, and unstructured data. So, before querying the data from our BI tools, we need an approach to prepare and model it effectively.
Cube allows us to create final entities composed of dimensions (attributes) and measures (aggregations of a particular numeric column), exposed through the API.
This way, all the collected data stays in our data warehouse, which we can query anytime for analysis purposes, while the BI tools consume only the modeled data through the APIs.
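As an illustration, here is a minimal sketch of what such a data model can look like in Cube (the cube, table, and column names are hypothetical, not our actual schema):

```typescript
// schema/PlatformUsage.js: a minimal, hypothetical Cube data model.
cube(`PlatformUsage`, {
  sql: `SELECT * FROM analytics.platform_usage`,

  measures: {
    // Number of usage events.
    count: {
      type: `count`,
    },
    // Total watch time aggregated across events.
    totalWatchTime: {
      sql: `watch_time_seconds`,
      type: `sum`,
    },
  },

  dimensions: {
    customerId: {
      sql: `customer_id`,
      type: `string`,
    },
    userId: {
      sql: `user_id`,
      type: `string`,
    },
    courseId: {
      sql: `course_id`,
      type: `string`,
    },
    startedAt: {
      sql: `started_at`,
      type: `time`,
    },
  },
});
```

The BI tools only ever see PlatformUsage with its measures and dimensions; the underlying table and SQL stay hidden behind the API.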
We handle data that can be publicly accessible, data related to specific customers, and data containing PII (Personally Identifiable Information). Because of this scenario, data access security is one of the most important components that Cube offers us.
By using Cube, we have been able to implement two security patterns: data masking and row-level security.
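Row-level security, for example, can be enforced centrally through Cube's queryRewrite hook, which appends a tenant filter to every incoming query before it reaches the warehouse; data masking can be applied in the model's SQL based on the same security context. Below is a sketch assuming a hypothetical customerId claim in the JWT and the PlatformUsage cube from the earlier example:

```typescript
// cube.js (Cube configuration): a row-level security sketch.
module.exports = {
  queryRewrite: (query, { securityContext }) => {
    // Reject requests whose token carries no tenant information.
    if (!securityContext.customerId) {
      throw new Error('No customerId found in security context');
    }
    // Force every query to be filtered down to the caller's own data.
    query.filters.push({
      member: `PlatformUsage.customerId`,
      operator: `equals`,
      values: [securityContext.customerId],
    });
    return query;
  },
};
```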
Through our data, we provide a lot of insights and answers to our internal stakeholders and to Cloud Academy customers.
Every hour, a lot of queries run against our data, and most of them require the data warehouse to process millions of rows. Because of that, a caching layer is crucial for us to avoid overloading the warehouse with common queries.
Cube provides us with a caching layer that temporarily stores the results of previously executed queries, so the same query executed again within a short time won't hit the data warehouse.
Leveraging the caching layer lets us get query results faster than hitting the data warehouse, which translates into faster loading of the charts our users view. When hit, the caching layer gave us a performance boost of about 70%.
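How long those cached results stay valid can be tuned per cube with a refresh key. A sketch, with an illustrative interval:

```typescript
// A sketch of tuning the in-memory cache for the hypothetical cube above.
cube(`PlatformUsage`, {
  sql: `SELECT * FROM analytics.platform_usage`,

  // Cached results for this cube are considered fresh for 10 minutes;
  // identical queries arriving within that window never reach the warehouse.
  refreshKey: {
    every: `10 minute`,
  },

  // ...measures and dimensions as in the earlier sketch...
});
```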
Last but not least, we need to access our data quickly and through standard interfaces. Cube provides APIs to achieve this goal.
Depending on the tool you use, you could have multiple choices, such as a RESTful API or a SQL API.
In our scenario, we have two access points to the data: the SQL API, which our internal BI tool (Superset) connects to, and the REST API, which is consumed by the React application we expose to enterprise customers.
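As an illustration, this is roughly what a request against Cube's REST API looks like from a client application (the host, token handling, and member names are placeholders):

```typescript
// A sketch of querying Cube's REST API, e.g. from a customer-facing app.
const CUBE_API_TOKEN = process.env.CUBE_API_TOKEN ?? ''; // placeholder token source

async function loadDailyUsage(): Promise<unknown[]> {
  const response = await fetch('https://cube.example.com/cubejs-api/v1/load', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Cube expects the JWT carrying the security context in this header.
      Authorization: CUBE_API_TOKEN,
    },
    body: JSON.stringify({
      query: {
        measures: ['PlatformUsage.count'],
        timeDimensions: [
          {
            dimension: 'PlatformUsage.startedAt',
            granularity: 'day',
            dateRange: 'last 7 days',
          },
        ],
      },
    }),
  });
  const { data } = await response.json();
  return data; // one row per day with the event count
}
```

The SQL API, by contrast, lets Superset connect to Cube like a regular database and query the modeled cubes with plain SQL.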
Cloud Academy is a data-driven company, and Cube has really helped us work with and manage data that is crucial for our business.
It allowed us to model data, defining the dimensions and measures exposed to the BI tools. This way, we've been able to hide all the underlying tables and logic.
By using Cube, we have been able to implement a strong security layer composed of data masking and row-level security, both for our internal usage (Superset) and for enterprise customer usage (React application).
Cube allowed us to significantly speed up our queries, and therefore reduce the time needed to load a dashboard. It also provides pre-aggregations, a useful caching layer that we can manage to keep our queries fast (unlike the native caching layer, which evicts cached data after a certain period).
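A sketch of what such a pre-aggregation can look like in the data model (names and the refresh interval are illustrative):

```typescript
// A pre-aggregation sketch: Cube builds and maintains a daily rollup so that
// dashboard queries read a small aggregate table instead of raw events.
cube(`PlatformUsage`, {
  sql: `SELECT * FROM analytics.platform_usage`,

  preAggregations: {
    usagePerDay: {
      measures: [CUBE.count, CUBE.totalWatchTime],
      dimensions: [CUBE.courseId],
      timeDimension: CUBE.startedAt,
      granularity: `day`,
      // Unlike the default cache, the rollup is rebuilt on a schedule
      // rather than evicted, so it is always there to serve queries.
      refreshKey: {
        every: `1 hour`,
      },
    },
  },

  // ...measures and dimensions as in the earlier sketch...
});
```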
We always try to stay up to date with new approaches and tools that let us provide a better experience to our stakeholders on the available data. The ideal scenario is one where stakeholders can easily access the data they need, in the way they need it, while we remain compliant with security and privacy constraints.