Speech Demo



Artificial Intelligence is not a future or distant concept; it is here and now, and being used by many companies of various sizes and industries. The foundational theory for AI was actually developed several decades ago, but recent advancements in big data, computing power, cloud, and algorithms have made it affordable and widespread today. With AI and Machine Learning, computers are now able to start reasoning, understanding, and interacting in ways that were never possible before.

Microsoft has created a predefined set of AI models called Cognitive Services, which companies of all sizes can use as a starting point and which, best of all, require little to no knowledge of data science. In this course, you will learn how to infuse your apps, on an architectural level, with the intelligence that Cognitive Services provide. We will cover what Cognitive Services are and how to use the various solutions they provide, including Vision, Speech, Language, Decision, and Web Search.

Learning Objectives

  • Understand the functionality provided by Azure Cognitive Services
  • Learn how to incorporate these services into your apps

Intended Audience

  • People who want to learn more about Azure Cognitive Services


Prerequisites

  • Knowledge of Azure
  • Knowledge of at least one programming language
  • Experience using REST APIs

Okay, so for the STT demo, I'm here on the home page for Speech-to-Text, and by scrolling down to the sample sentences, I get to this portion of the page where I can either play some audio or record my own. Let me play Sample 6 here:

Man: Where is Panthera tigris virgata native to?

As you can see, the page also advertises the advantages of using a custom model, which can better understand species names in Latin. Now let me switch to the TTS part. For that, I created a simple PowerShell script, because I would like to show you how simple it is to call the service from code. Here at the top, I'm just creating a new Speech Synthesizer object. In the SSML, I have two voices: the first is an unaltered version with a female voice, saying: "This is a test of the Text-to-Speech API." The next phrase is the same, but I have exaggerated many of the values for pause, pitch, speed, and volume quite a bit, so that you can hear the difference. Let's play:

Woman: This is a test of the Text-to-Speech API.

Man: This is a test of the Text-to-Speech API.
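The script itself is not shown in this transcript, so the following is only a rough sketch of what an SSML document with two voices, one unaltered and one with exaggerated pause, pitch, speed, and volume, might look like. It is written in Python rather than PowerShell, it only builds and prints the SSML (sending it to the Speech service is not covered), and the voice names and prosody values are hypothetical placeholders, not the ones used in the demo.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Standard SSML namespace defined by the W3C specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, plain_voice, altered_voice):
    """Build an SSML document that speaks `text` twice.

    The first <voice> reads the text unaltered. The second adds a long
    pause and wraps the text in <prosody> with exaggerated settings.
    All prosody values below are illustrative assumptions.
    """
    safe_text = escape(text)  # escape &, < and > so the XML stays well-formed
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"<voice name={quoteattr(plain_voice)}>{safe_text}</voice>"
        f"<voice name={quoteattr(altered_voice)}>"
        '<break time="1500ms"/>'
        '<prosody pitch="+40%" rate="slow" volume="x-loud">'
        f"{safe_text}"
        "</prosody>"
        "</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Hypothetical voice names; replace them with voices your service offers.
    ssml = build_ssml(
        "This is a test of the Text-to-Speech API.",
        "example-female-voice",
        "example-male-voice",
    )
    ET.fromstring(ssml)  # raises ParseError if the SSML is not well-formed XML
    print(ssml)
```

Parsing the result with `ET.fromstring` is just a quick well-formedness check before handing the SSML to whichever synthesizer you use; it does not validate that a given service accepts those particular voice names or prosody values.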

As you can see, you can play quite a bit with the options in the SSML file to tailor the results to your preference. We will cover the translation part in the next module, but I would like to give you an optional homework assignment, especially if you have a friend who speaks another language. Go to, and start a new chat. Send the generated link and ask your friend to speak in his or her native language; check what happens. Now let's take a look at the Language category of Cognitive Services.

About the Author

Emilio Melo has been involved in IT projects in over 15 countries, with roles ranging across support, consultancy, teaching, project and department management, and sales, mostly focused on Microsoft software. After 15 years of on-premises experience in infrastructure, data, and collaboration, he became fascinated by Cloud technologies and the incredible transformation potential they bring. His passion outside work is to travel and discover the wonderful things this world has to offer.