image
Cleaning Data
Start course
Difficulty
Beginner
Duration
35m
Students
3553
Ratings
4.9/5
Description

Power BI has changed the BI landscape forever, enabling BI professionals and regular Excel users alike to work with big data and build insightful dashboards. 

Learn to use this powerful business intelligence solution from the ground up. Navigate the intuitive user interface and explore the ecosystem of data modeling tools. Discover outside-the-box visualizations and broadcast your insights to colleagues in the Power BI Service. This Course gives you a solid foundation to begin your Power BI journey. 

Learning Objectives

On completing this Course, learners will be able to:

  • Identify the primary components of the Power BI interface: reports, data, and model views
  • Import Excel data and build basic visuals
  • Publish a desktop report to the Power BI Service
  • Identify common challenges in Power BI data models, implement smart solutions, and avoid common mistakes

Intended Audience

  • Business professionals whose job requires them to design, build, or deliver business intelligence metrics
  • Anyone preparing to take the Microsoft PL-900 exam

Prerequisites

A desire to learn to use Power BI

Transcript

Data comes to us in various forms and is sometimes not so tidy. And at other times, almost unusable. As you know, cleaning up data manually can be very tedious. And when our databases grow exponentially, this problem only compounds, but the Power Query engine has a seemingly inexhaustible list of tools and tricks to get your data in tip-top shape. Like other Microsoft products, it has that same look and feel, with our ribbon, its tabs and our command groups and our commands.

Now that we are in Power Query, we will clean up this data. Why would we care to clean it? Well, for instance, here, we have some meaningless data that doesn't belong, like these no values, which are on a useless line, that isn't part of the data. It's just a title line. No values can skew our data by adding a blank category to our visuals, but we can avoid that issue by removing these rows, using the super handy tool called, remove rows. And then we'll select, remove from top. A dialog box pops up asking us how many rows we would like to remove.

We will choose two and click okay, easy-peasy. And the other options in here are handy as well. If you have blank rows in the middle, for instance, just choose the option to remove all blank rows. If you want to find and remove duplicates, we have that option as well, but I think our rows look good. Let's see what else we need to do. It looks like we have some unnecessary columns, so you can manually remove a column by highlighting it and then clicking, remove columns, remove column. Or you can select multiple columns at once by holding the control key and selecting one by one.

When you've selected what you need to remove, then we can remove them, which we did last time using the buttons on the ribbon. But this tool is also available if you right-click on the header and in this right-click menu, we see the option to remove columns. You'll see there are lots of tools in the right-click menu. Now, let's pretend for a moment that these few columns aren't the ones I wanna get rid of, they're actually the only ones I wanna keep. I can choose, remove other columns. And now all I have left are the three columns I selected.

Okay, it turns out I actually didn't mean to keep these, I meant to delete these. So, how do I undo things? Well, each of the steps we have taken have actually been recorded on the right-hand side, in the applied steps section. Power Query memorizes each step we take so that you can simply refresh or have Power Query rerun the memorized steps, rather than you performing each one of these steps every single time you wanna refresh your data.

So, that's pretty cool. And it has a nifty side benefit. You can hover over these steps and reverse an action. For instance, I can remove my last action by clicking on the X. Be careful when you remove steps inside the list, because all of the following steps might get messed up. Until you're used to this tool, just play it safe and backpedal step-by-step until you get to the actual issue. But that worked for me. So, phew, so glad we were able to undo that last step.

Now, let's remove the columns I actually wanted to remove. Great. Now, I'm seeing one last problem here, and that is that our data's headers are showing up as their own row in row one. That is going to be problematic later on when we're creating our dashboards. We won't be able to identify which column of data is which, because all the columns will have generic names like column one and column two. But I do have the column headers in here, they're just showing up as the first row of data.

So, let's use the first row as headers. And now my columns are named. Keep in mind, there are so many more neat tools, just like the ones we just looked at. Power Query is definitely worth exploring. All right, that looks great. The data from this data source is all cleaned and ready to be modeled and analyzed. Let's just give this query a helpful name and move on.

About the Author

Chelsea Dohemann is a Senior Technical Trainer and Microsoft Certified Master with almost a decade of experience in technology training. She has taught an array of applications from Microsoft products including Office 365 web apps, Microsoft Office Suite, Power BI, VBA for Excel, and SharePoint to Adobe Acrobat Pro and Creative Cloud. Being a persistent learner herself, Chelsea is acutely in-tune with the challenges of learning. She presents her topics in plain language, with real-world examples, reducing complex concepts down to their simple parts.