1. Home
  2. Training Library
  3. Big Data
  4. Courses
  5. Using SQL to Import and Validate Data from CSV Files

Process Overview

The course is part of this learning path

Learn SQL - From Newbie to Ninja
Start course


This is an intermediate level course suited to DBAs, system analysts, programmers, or anyone wanting to learn how to use SQL to solve business scenarios. It will help to have an understanding of Structured Query Language syntax and functions and regular expressions, or some experience with script-based programming languages. Our learning objectives for this course are to recognize and explain use cases where Structured Query Language, SQL, can be applied in business situations, explain and apply SQL procedures to import and parse text files, define and apply automation processes to parse and sort data to specific databases. We welcome all comments and feedback. So please feel free to reach out to us at support@cloudacademy.com with any questions or comments.


- [Lecturer] The process begins with the data files saved to a location that is accessible via the SQL server instance. We then need to determine which customer the files have come from. In this case, our task is made easy, as the file names indicate which customer sent them. Next we need to get the data from the files into a database table. From that point we can manipulate the data into a common structure that will allow us to easily update our production data, and perform any other required functions. As this process is going to be running regularly as a scheduled job, we need it to be robust and error tolerant. There are two error classes we need to keep in mind. Data validation errors, and process errors. Data validation errors are where the data or individual field is not of the correct format, or values are not expected. Our process should be designed to handle this situation without failure and allow the correct data to be processed. Operational staff should be notified with validation errors so they can be manually dealt with. Process errors are where something unforeseen happens. It causes a step in the process to fail completely. If one file fails in an unforeseen way, we don't want the whole process to fail completely, because there may be other files to process. We also don't want it to be in a situation where a file has been partially processed into the production database. If a file fails in this way, we need it to fail in it's entirety, so if possible we can fix and reprocess it. With that in mind, we'll look at transactions and error handling in SQL server. I just want to talk briefly about the data. If you have ever used an ATM or money machine, you will know it will dispense various denominations of currency. The currency is stored in denomination-specific canisters or cassettes. These terms are one and the same. The customer knows the ATM by a coded identifier, and provides a breakdown of the denominations they want to fill or replenish the ATM with. For example, put $100,000 of 50s, and $120,000 of 20s into ATM ABC123. In our case, the ATM is configured with four canisters. We will need to look up the ATM from a table and based on it's canister denomination configuration, divide the currency appropriately among the canisters. Before we get into the code, let's have a quick look at what the finished product will look like while running this completed script. Now and see the files moving from our source folder into our archive folder and then the notification emails turning up. So I just highlight the script and execute, these are our source files, as we see they are being moved into our archive folder. If we go to Outlook, we can see these notification emails turning up from various cache centers. So, if we have a look at this one for World Bank, you can see all the ATMs in their orders for this particular location. The denomination breakdown on how those denominations have been allocated across various cassettes. So, with that in mind, let's get in and look how we're going to do this.

About the Author
Learning paths1

Hallam is a software architect with over 20 years experience across a wide range of industries. He began his software career as a  Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard. 

Covered Topics