This course explains some additional topics you should make sure you understand before taking the DP-201 exam, including:
- Synapse Analytics
- Table Storage
Table Storage isn't a major focus of the DP-201 exam, but there is one Table Storage topic that's important for you to know: how to choose a partition key. Like Synapse Analytics, Table Storage achieves high scalability by distributing data across multiple partitions. It decides where to put each entity based on the partition key you choose, so if you don't select an appropriate partition key, you'll run into performance problems.
There are many factors to consider when choosing a partition key, but one of the most important is the type of operations that will be performed most frequently. Will most of the operations be reads or writes? If they'll mostly be reads, what types of queries will be the most common?
For example, suppose you have a table with employee data, and most of the queries will be requests for a single employee's data. In this case, you'd want to have many partitions with a small number of employees in each. That way, the table could serve a large number of simultaneous queries because the requests would be distributed across different servers. If you select the employee ID as the partition key, then each employee entity will be in a separate partition. That doesn't mean they'll each be on a different server, though, because each partition server can hold multiple partitions.
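As a rough sketch of the employee ID choice, the following Python builds entities as plain dictionaries rather than calling any SDK. PartitionKey and RowKey are the key properties every Table Storage entity carries. The helper names, the sample employees, and the decision to reuse the employee ID as the RowKey are assumptions made only for illustration.

```python
def make_employee_entity(employee_id, name, department):
    """Build an entity whose partition key is the employee ID (hypothetical helper)."""
    return {
        "PartitionKey": employee_id,  # one partition per employee
        "RowKey": employee_id,        # assumption: reuse the ID as the row key
        "Name": name,
        "Department": department,
    }


def point_query_filter(employee_id):
    """Return an OData-style filter string that targets exactly one entity."""
    return f"PartitionKey eq '{employee_id}' and RowKey eq '{employee_id}'"


# Hypothetical sample data.
employees = [
    make_employee_entity("E1001", "Ana", "Sales"),
    make_employee_entity("E1002", "Ben", "Sales"),
    make_employee_entity("E1003", "Chi", "Engineering"),
]

# Every employee ends up in a separate partition.
partitions = {entity["PartitionKey"] for entity in employees}
print(len(partitions))

# A single-employee lookup names both keys, so it touches one partition only.
print(point_query_filter("E1002"))
```

Because the filter names both keys, a lookup like this never has to scan other partitions, which is what makes this layout a good fit for many simultaneous single-employee queries.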
It gets more complicated when some of your most common queries retrieve multiple entities in each request. For example, suppose that you frequently need to run a query on all of the employees who are in the same department. If the employee ID is the partition key, the query would have to retrieve the data from many different partitions, possibly on many different partition servers, which would take a long time. It would be much faster if all of the employees in a particular department were stored in the same partition, and therefore on the same partition server. The way to make this happen is to use the department as the partition key.
However, if you only have a few departments, then using the department as the partition key would distribute the employees across a small number of partitions, which could slow down performance when making many single-employee queries at the same time. This is why it can be quite difficult to choose a partition key that works well in all cases.
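To make the trade-off concrete, here is a small sketch that counts how many entities land in each partition under the two key choices discussed above. The sample employees and the partition_sizes helper are hypothetical, and the sketch only models how entities are grouped, not how the service places partitions on servers.

```python
from collections import Counter

# Hypothetical sample data: (employee ID, department).
EMPLOYEES = [
    ("E1001", "Sales"),
    ("E1002", "Sales"),
    ("E1003", "Engineering"),
    ("E1004", "Engineering"),
    ("E1005", "Engineering"),
    ("E1006", "Finance"),
]


def partition_sizes(employees, key_of):
    """Count how many entities fall into each partition for a given key choice."""
    return Counter(key_of(employee) for employee in employees)


by_employee = partition_sizes(EMPLOYEES, lambda employee: employee[0])
by_department = partition_sizes(EMPLOYEES, lambda employee: employee[1])

# Employee ID as the key: many partitions, one entity in each.
print(len(by_employee), max(by_employee.values()))

# Department as the key: few partitions, several entities in each.
print(len(by_department), max(by_department.values()))
```

With the employee ID as the key, a department query has to visit one partition per employee. With the department as the key, the same query reads a single partition, but all single-employee lookups for that department now compete for it.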
Now consider applications that mostly send writes to the table. For example, suppose your table contains time-series data. That is, it contains records with a timestamp, such as stock market trades. These records will be written to the table in order by date and time. If you use the timestamp as the partition key, then consecutive records will have neighboring key values, and because neighboring partition key values tend to be served by the same partition server, the writes will all go to that one server. This would create a hotspot, resulting in poor performance.
To avoid this problem, you could choose another property, such as the stock symbol, as the partition key. That would distribute the records across many partition servers.
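The hotspot can be illustrated with a toy model in which each server owns a contiguous range of partition key values. This is only a sketch: the server_for helper, the range boundaries, and the sample trades are all assumptions chosen for the demonstration, and the real service decides its own ranges.

```python
from bisect import bisect_right
from collections import Counter


def server_for(key, boundaries):
    """Toy range partitioning: return the index of the range that contains key."""
    return bisect_right(boundaries, key)


# Hypothetical burst of trades arriving in time order: (timestamp, symbol).
TRADES = [
    ("2024-01-02T09:30:00", "MSFT"),
    ("2024-01-02T09:30:01", "AAPL"),
    ("2024-01-02T09:30:02", "TSLA"),
    ("2024-01-02T09:30:03", "GOOG"),
    ("2024-01-02T09:30:04", "AMZN"),
    ("2024-01-02T09:30:05", "NFLX"),
]

# Assumed range boundaries, picked only for this example.
TIME_BOUNDARIES = ["2024-01-01", "2024-01-02", "2024-01-03"]
SYMBOL_BOUNDARIES = ["B", "H", "N"]

# Timestamp as the partition key: consecutive writes share one range.
by_time = Counter(server_for(timestamp, TIME_BOUNDARIES) for timestamp, _ in TRADES)

# Stock symbol as the partition key: the same writes spread across ranges.
by_symbol = Counter(server_for(symbol, SYMBOL_BOUNDARIES) for _, symbol in TRADES)

print(len(by_time), max(by_time.values()))
print(len(by_symbol), max(by_symbol.values()))
```

In this model, every trade in the burst lands on a single server when the timestamp is the key, while the same trades are split across four servers when the stock symbol is the key.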
This is a complicated topic, so you might want to read the page at this URL. That's all for additional topics for the DP-201 exam. If you have any questions or comments, please let us know.
Thanks and good luck on the exam!
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).