At Cloud Academy, we manage a lot of data every day. We collect it from different sources, such as feedback, events, and platform usage. We then need to ingest it, apply transformations, and finally present it to our internal stakeholders and our customers.
Because of the variety of the data that we provide, we recently implemented Cube, a Headless BI solution. It allowed us to handle, model, and present data through our BI tools smoothly.
What is a Headless BI Tool?
A Headless BI tool is a set of components that acts as middleware between your data warehouse and your business intelligence applications. It provides four main data-related components without the need to design and implement custom solutions. It lets us work with data through the tool's abstraction layer instead of hitting the data warehouse directly.
The name Headless comes from the fact that the tool lets us work with the data but deliberately delegates the task of showing and visualizing it. That is the responsibility of the BI tool.
A Headless BI tool offers the following four components:
- Modeling – It allows us to leverage data on the data warehouse and model it by defining dimensions and measures, usable by the BI tool
- Security – It allows us to declare who can access the data, and restrict shown data if needed
- Caching – It provides us with a caching layer to store the results of recent queries and speed up the next ones
- APIs – It provides us with one or multiple APIs (such as RESTful and SQL) to query our data
Why does Cloud Academy use Cube?
We have a lot of data coming from multiple sources, both internal and external to Cloud Academy. We work with structured, semi-structured, and unstructured data. So, before querying the data from our BI tool, we need an approach to prepare and model it effectively.
Cube allows us to create final entities composed of dimensions (attributes) and measures (aggregations of a particular numeric column), exposed through the API.
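To make the idea of dimensions and measures concrete, here is a minimal sketch in plain Python. It is not Cube's data model syntax; the `Measure` class, the `query` helper, and the sample rows are hypothetical names invented for illustration. It only shows how a measure (an aggregation over a numeric column) is computed per combination of dimension values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical raw rows, standing in for a warehouse table.
ROWS = [
    {"company": "A", "course": "AWS Basics", "minutes": 30},
    {"company": "A", "course": "AWS Basics", "minutes": 45},
    {"company": "B", "course": "Kubernetes 101", "minutes": 20},
]


@dataclass(frozen=True)
class Measure:
    """An aggregation over one numeric column (hypothetical helper)."""
    name: str
    column: str
    aggregate: Callable[[Iterable[float]], float]


def query(rows, dimensions, measure):
    """Group rows by the given dimensions and aggregate the measure column."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[measure.column])
    return {key: measure.aggregate(values) for key, values in groups.items()}


total_minutes = Measure(name="total_minutes", column="minutes", aggregate=sum)
result = query(ROWS, dimensions=["company"], measure=total_minutes)
print(result)  # {('A',): 75, ('B',): 20}
```

In a Headless BI tool, the BI application asks for a measure and a set of dimensions by name, and the tool generates the equivalent grouped query against the warehouse.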
This way, all the collected data stays in our data warehouse, where we can query it anytime for analysis purposes, while specifically modeled data is available to the BI tool through the APIs.
We handle data that can be publicly accessible, data related to specific customers, and data containing PII (Personally Identifiable Information). Because of this, data access security is one of the most important components that Cube offers us.
By using Cube, we have been able to implement the following security patterns:
- Row Level Security – Depending on the user or entity that is accessing the data, some rows are filtered out. Suppose you are company A and want to get data about the platform usage of your users. You should not be able to access the usage data of company B, so rows related to company B are not returned when you explore the data.
- Data Masking – Depending on the user or entity that is accessing the data, some attributes could be masked because of permissions assigned to the user or entity. This mainly happens when the attribute contains personal information such as a name, an email, or a phone number.
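The two patterns above can be sketched in a few lines of plain Python. This is an illustration of the concepts only, not how Cube implements them; `SecurityContext`, `PII_COLUMNS`, `apply_security`, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityContext:
    """Who is asking for the data (hypothetical shape)."""
    company: str
    can_see_pii: bool


# Columns treated as personal information in this sketch.
PII_COLUMNS = {"email"}

ROWS = [
    {"company": "A", "email": "ann@a.example", "minutes": 75},
    {"company": "B", "email": "bob@b.example", "minutes": 20},
]


def apply_security(rows, ctx):
    """Drop rows of other companies, then mask PII columns if not permitted."""
    visible = []
    for row in rows:
        # Row-level security: only rows belonging to the caller's company.
        if row["company"] != ctx.company:
            continue
        # Data masking: replace PII values unless the caller may see them.
        visible.append({
            col: ("***" if col in PII_COLUMNS and not ctx.can_see_pii else value)
            for col, value in row.items()
        })
    return visible


ctx = SecurityContext(company="A", can_see_pii=False)
print(apply_security(ROWS, ctx))
# [{'company': 'A', 'email': '***', 'minutes': 75}]
```

The key design point is that both rules are driven by the caller's identity rather than by the query itself, so a BI user cannot bypass them by changing filters.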
We provide a lot of insights and answers through the data to our internal stakeholders and Cloud Academy customers.
Every hour, a lot of queries are performed on our data, and most of them require the data warehouse to process millions of rows to succeed. Because of that, having a caching layer is crucial for us to avoid overloading the warehouse with common queries.
Cube provides us with a caching layer that temporarily stores the results of previously executed queries. So, the same queries won’t hit the data warehouse again if they are executed shortly afterwards.
Reading from the caching layer returns the result of a query faster than hitting the data warehouse, and this translates into faster loading of the charts that our users visit. On cache hits, the caching layer gave us a performance boost of about 70%.
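The behavior described above is essentially a time-limited result cache. The sketch below shows the general technique in plain Python; it is not Cube's implementation, and `QueryCache`, `fake_warehouse`, and the 60-second lifetime are assumptions chosen for illustration.

```python
import time


class QueryCache:
    """Cache query results for a limited time (hypothetical helper)."""

    def __init__(self, run_query, ttl_seconds, clock=time.monotonic):
        self._run_query = run_query  # function that actually hits the warehouse
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query text -> (stored_at, result)

    def get(self, sql):
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: the warehouse is not touched
        result = self._run_query(sql)  # cache miss or expired entry
        self._entries[sql] = (now, result)
        return result


# Stand-in for the data warehouse that records every query it receives.
calls = []


def fake_warehouse(sql):
    calls.append(sql)
    return [("A", 75)]


cache = QueryCache(fake_warehouse, ttl_seconds=60)
cache.get("SELECT company, SUM(minutes) FROM usage GROUP BY company")
cache.get("SELECT company, SUM(minutes) FROM usage GROUP BY company")
print(len(calls))  # 1: the second call was served from the cache
```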
Data Access through APIs
Last but not least, we need to access our data quickly and through standard interfaces. Cube provides APIs to achieve this goal.
Depending on the tool you use, you may have multiple options, such as RESTful or SQL APIs.
In our scenario, we have two access points to the data:
- Internal BI – It’s represented by all the data that we show to our internal stakeholders by using our internal BI tool: Superset.
- Customer Facing BI – It’s represented by the dashboards we provide to Cloud Academy enterprise customers. They are built by using the Recharts library and served through a React.js front-end application. Through these dashboards, customers get insights about the Cloud Academy platform usage of their employees.
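As an illustration of the RESTful option, the sketch below builds a request for Cube's REST `load` endpoint using only the Python standard library. The host name, the token placeholder, and the `PlatformUsage` cube with its members are hypothetical; the exact base path and authentication setup depend on the deployment, so treat this as a sketch to check against the Cube REST API documentation rather than a description of our setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical deployment URL; /cubejs-api/v1/load is Cube's default load path.
CUBE_API_URL = "https://cube.example.com/cubejs-api/v1/load"
# Placeholder: a JWT signed with the deployment's API secret.
API_TOKEN = "<jwt-token>"

# Hypothetical cube and member names. Note there is no company filter here:
# with row-level security, that restriction comes from the caller's token.
query = {
    "measures": ["PlatformUsage.totalMinutes"],
    "dimensions": ["PlatformUsage.course"],
}


def build_load_request(query, token):
    """Build (but do not send) a GET request carrying the JSON query."""
    url = f"{CUBE_API_URL}?{urlencode({'query': json.dumps(query)})}"
    return Request(url, headers={"Authorization": token})


request = build_load_request(query, API_TOKEN)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

A front-end application would issue the same kind of request and pass the returned rows to its charting library.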
Cloud Academy is a data-driven company, and Cube really helped us work with and manage the data that is crucial for our business.
It allowed us to model data by defining the dimensions and measures to be exposed to the BI tools. This way we’ve been able to hide all the underlying tables and logic from those tools.
By using Cube, we have been able to implement a strong security layer composed of data masking and row-level security, both for our internal usage (Superset) and for enterprise customer usage (React application).
Cube allowed us to significantly increase the speed of our queries and, in turn, reduce the time needed to load a dashboard. It also provides pre-aggregations, a caching layer that we manage ourselves so that queries are always fast (unlike the native caching layer, which drops cached data after a certain period).
We always try to keep up to date with new approaches and tools that let us provide a better experience to our stakeholders on the available data. The ideal scenario is one in which stakeholders can easily access the data they need, in the manner they need it, while we remain compliant with security and privacy constraints.
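The difference between a pre-aggregation and a result cache can be sketched as follows: instead of remembering the answer to one specific query, we build a small rollup table up front and answer many queries from it. This is a plain Python illustration of the idea, not Cube's pre-aggregation engine; the event rows, dates, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events; in practice this would be millions of warehouse rows.
RAW_EVENTS = [
    {"company": "A", "day": "2023-01-01", "minutes": 30},
    {"company": "A", "day": "2023-01-01", "minutes": 45},
    {"company": "A", "day": "2023-01-02", "minutes": 10},
    {"company": "B", "day": "2023-01-01", "minutes": 20},
]


def build_rollup(events):
    """Pre-aggregate minutes per (company, day); run on a schedule, not per query."""
    rollup = defaultdict(int)
    for event in events:
        rollup[(event["company"], event["day"])] += event["minutes"]
    return dict(rollup)


def minutes_for_company(rollup, company):
    """Answer a dashboard query from the rollup without scanning raw events."""
    return sum(minutes for (c, _day), minutes in rollup.items() if c == company)


rollup = build_rollup(RAW_EVENTS)
print(len(rollup))                       # 3 rollup rows instead of 4 raw rows
print(minutes_for_company(rollup, "A"))  # 85
```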