What is a Data Engineer, What Skills Do You Need, and is the Data Engineer Role Right For You?

Key Skills of a Data Engineer

Is becoming a data engineer the right move for you? This Introduction to the Data Engineer Role course walks you through the ins and outs of being a data engineer: the tasks and responsibilities of the role, the skills necessary to carry them out, and the personality traits best suited to working as a data engineer. You will also learn how data engineers differ from both data scientists and database administrators. All of this should help you get a clear idea of what a career in data engineering looks like and whether it's the right one for you!

Learning Objectives

  • Understand what a data engineer does
  • Learn the differences between a data engineer, a data scientist, and a database administrator
  • Understand the skills and character traits that make a successful data engineer

Intended Audience

Anyone who is thinking of embarking on a career as a data engineer, whether they're fresh out of school and thinking about their future, or they already have years of experience under their belt but want a career change.


Prerequisites

None! This course is open to everyone.


Let's now look at the skills and character traits that often work well for a data engineer, so you can start to work out whether this role might be right for you. So what are the key skills that need to be mastered for the data engineer role? First, understanding data fundamentals. A key foundation stone is knowing how data is structured and stored by machines.

We need to understand the various data types, such as varchar, char, and int. We need to understand name-value pairs and how they are stored in NoSQL ("not only SQL") structures. I never had this training when I started, and I wish I had. Data and databases are constantly changing, so having a good foundational knowledge of how data is stored and retrieved by machines means you will always feel confident learning new concepts and systems.
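To make the point about data types and name-value pairs concrete, here is a minimal Python sketch. The field widths, the sample values, and the record layout are assumptions chosen for illustration; real database engines store these types in their own ways.

```python
import struct

# A fixed-width 32-bit integer always occupies 4 bytes, whatever its value.
int_bytes = struct.pack("<i", 42)

# A char(5)-style field is padded to a fixed width, while a varchar-style
# field keeps only the characters present (length prefix omitted here).
char_field = "ab".ljust(5).encode("utf-8")
varchar_field = "ab".encode("utf-8")

# A name-value pair record, similar in shape to what a NoSQL store might hold.
# The keys and values are illustrative only.
record = {"user_id": 42, "name": "ab"}

print(len(int_bytes), len(char_field), len(varchar_field))
print(sorted(record.keys()))
```

The byte counts show why the choice of type matters for storage: the same two characters take five bytes in the padded field and two in the variable-width one.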

Second, having a working knowledge of SQL. Knowing SQL generally means that you understand the various relational databases out there, how they work, and the syntax that each uses.
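As a rough illustration of everyday SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, columns, and rows are hypothetical, and other relational databases differ in syntax details.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and sample rows, for illustration only.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO events (customer, amount) VALUES (?, ?)",
    [("ana", 10.0), ("ben", 5.5), ("ana", 2.5)],
)

# Aggregate the amounts per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM events GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(totals)
```

The same CREATE, INSERT, and SELECT ... GROUP BY pattern carries over to most relational databases, with small dialect differences.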

Third, a working knowledge of regular expressions. Understanding how to manipulate data with regular expressions is a really important skill to have, and regular expressions are used across data formats, platforms, and most programming languages.
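Here is a small sketch of regular expressions applied to data, using Python's re module. The log line format and the phone number are made-up examples, not taken from any particular system.

```python
import re

# A hypothetical log line to split into fields.
raw = "2023-01-15  ERROR  disk full on /dev/sda1"

# Named groups capture the date, the level, and the rest of the message.
pattern = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)
match = pattern.match(raw)
parsed = match.groupdict() if match else {}

# Strip every non-digit character to normalize a phone number.
phone = re.sub(r"\D", "", "(555) 010-2345")

print(parsed)
print(phone)
```

Extraction with named groups and cleanup with substitution cover a large share of routine data-wrangling tasks.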

Fourth, a working knowledge of JSON. We have two types of databases: relational and non-relational. A non-relational database, or NoSQL ("not only SQL") database, often uses JSON to request, retrieve, and write key-value pairs. You're going to need to know how data schemas work and how you can write queries using JSON.
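The following sketch shows the basic JSON round trip with Python's json module: parse a document, read a nested value, update it, and serialize it again. The document contents are hypothetical, and each NoSQL database has its own query syntax that is not shown here.

```python
import json

# A hypothetical JSON document, as a document store might return it.
doc_text = (
    '{"user_id": 42, "name": "Ana", '
    '"tags": ["admin", "beta"], "address": {"city": "Lisbon"}}'
)

# Parse the text into Python dictionaries and lists.
doc = json.loads(doc_text)

# Read a nested value by walking the keys.
city = doc["address"]["city"]

# Modify the document, then serialize it back to JSON text.
doc["tags"].append("verified")
serialized = json.dumps(doc, sort_keys=True)

print(city)
print(serialized)
```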

And fifth, machine learning. Having a good ground-level understanding of how machine learning works in principle and in practice provides a really solid foundation for understanding how to get data in and out of systems in ways that can be used by data scientists. While working with machine learning models is unlikely to be part of your day-to-day routine as a data engineer, knowing the languages and the models that data scientists will be working with really helps you understand what form and quality of data they will need from your architecture.

And sixth, programming languages. This is more of a should-have than a must-have. The more hands-on experience you have with programming languages, the more effective and flexible you can be as a data engineer.

Most of the challenges faced by data engineers revolve around finding creative ways to solve data problems. There isn't one set way of doing things. With data, it's often about the context of where the data is coming from and how it's going to be used, how it was collected, and what the likely errors in it are going to be. So being able to apply any of the programming languages to solve simple data problems is a very, very powerful tool for the data engineer.

As for the languages I recommend: firstly, Python as a priority. I think Python is one of the most common languages out there. It's very flexible, and the way that it handles data types is very useful. Go, Scala, or Java fill out that list. Languages that support good manipulation of string data types are the best ones to master for data engineering, in my view.
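As one example of the kind of string manipulation mentioned here, this short Python sketch cleans up messy text fields. The function names clean_name and parse_amount, and the sample inputs, are hypothetical and chosen only for illustration.

```python
from typing import Optional


def clean_name(value: str) -> str:
    """Collapse repeated whitespace and normalize capitalization."""
    return " ".join(value.split()).title()


def parse_amount(value: str) -> Optional[float]:
    """Convert text such as '$1,234.50' to a float, or None if it is not numeric."""
    cleaned = value.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


print(clean_name("  ada   LOVELACE "))
print(parse_amount("$1,234.50"))
print(parse_amount("n/a"))
```

Returning None for unparseable values, rather than raising, is one possible design choice; it lets a pipeline decide later how to handle bad rows.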

So let's not lose sight of the fact that this is a technical role, okay? But it's not all about technology. Having a data-driven approach to problem-solving is a major plus, as is the ability to visualize and communicate complex concepts. That doesn't have to be with technology; it's just being able to explain concepts to people who are technical and non-technical. And, I think most importantly, you need a strong sense of ownership. This is a job where you're given a framework to work with and are expected to come up with the program to run it.



About the Author
Andrew Larkin
Head of Content

Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession", as everything at AWS starts with the customer. His passions outside work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start-ups.