Controlling Access to Google BigQuery
In these videos, I’ll show you how to use predefined roles to grant fine-grained access to BigQuery datasets and tables. Next, I’ll show you how to restrict access to particular columns or rows in a table by creating an authorized view. And finally, you’ll learn how to use BigQuery’s column-level access control to restrict access to particular columns in a more manageable and centralized way.
By the end of this course, you'll be able to:
- Implement fine-grained access control in BigQuery using roles.
- Restrict access to particular columns in BigQuery tables using either authorized views or column-level access control.
- Data engineers
- Cloud architects
- Experience with BigQuery (or take Introduction to Google BigQuery)
Data can be one of the most precious resources an organization owns. So it’s important to keep tight control of not only who’s allowed to read your data, but also who’s allowed to modify or delete it. That might seem pretty basic, but doing this with BigQuery is more complicated than it sounds.
BigQuery provides several layers of access control. The top layer is basic roles, which act at the project level. They’re administered through IAM, the Identity and Access Management system.
There are three basic roles: viewer, editor, and owner. When you add a member to a project, you can assign them one of these roles, which will apply to all Google Cloud Platform services, not just BigQuery. A viewer can view all datasets and run jobs (such as queries). Editors have viewer permissions but can also modify or delete all tables. They can’t modify datasets, but they can create new datasets. An owner has editor permissions but can also delete all datasets and see all jobs for all users in the project.
Basic roles are not a good idea in most cases because they’re not fine-grained. Instead, you should use predefined roles.
There are many predefined roles for BigQuery. They’re fine-grained in two ways. First, they can be assigned at any of these levels:
Second, they usually grant more specific permissions than basic roles. That’s true even for the Data Viewer, Data Editor, and Data Owner roles. They’re similar to the basic Viewer, Editor, and Owner roles, but they don’t give users permission to run jobs or queries. Those permissions can be granted through the User and Job User roles. A Job User can only start jobs and cancel jobs. A User, on the other hand, can perform a variety of other tasks, such as creating datasets. The Admin role gives all permissions.
Now I’ll show you how you can use predefined roles to give more fine-grained access than basic roles in a few different situations:
To give a different level of access to BigQuery than to other GCP resources in a project, use BigQuery roles, such as BigQuery Data Editor. Then also use predefined roles for any other GCP resources they need to access, such as App Engine Admin.
To give a user or group a different level of permissions for a dataset, click the menu next to the dataset and select “Share”. Then click “Add Principal”, type the email address of the user or group, and select one of the BigQuery roles.
To give job-running permissions without giving data access permissions, select either BigQuery User or BigQuery Job User. In most cases, you should give BigQuery User because it lets the user list datasets and tables as well as create new datasets. Job Users can only run jobs.
You might be wondering why you’d want to separate data access permissions from job-running permissions because it would seem like most users would need both. That’s true in many cases, so remember that you have to assign both types of roles to users in order for them to look at the data and run queries (unless you’re using basic roles). But there are situations when you might want to give only one or the other. For example, if you have an application that monitors the size of your tables, then you might want to assign only the BigQuery Data Viewer role to its service account. That way, even if the application developers accidentally make a change to the program that would cause a query to run, it will be disallowed. Why would that matter? Because queries incur a cost, and a program could potentially run up some hefty charges if there’s a bug.
So far, I’ve only shown you how to set permissions at the project and dataset levels, but you can also grant roles at the table and view levels. To do it at the table level, open a table in BigQuery, and click the Share button. This opens the same dialog box as we saw at the dataset level.
Giving access at the view level is a much more complicated process. The only way to do it is to use something called an authorized view.
An authorized view is different from a regular view because it allows users to access the results of a query without giving them access to the tables that were queried. So, for example, if you didn’t want a particular group of users to have access to certain columns in a table, you could run a query that didn’t include those columns, and then save the results as an authorized view for that group.
To make this work, you need to perform five steps:
First, assign a role to the group that lets them run queries in the project. The BigQuery User role is usually the best choice.
Second, create a separate dataset to store the view. You need to do this because if you were to put the view in the same dataset as the original table, then the group would be able to access the table, too, and not just the view.
Third, create the view in the new dataset.
Fourth, give the group read access to the dataset containing the view.
Fifth, authorize the view to access the source dataset.
Let’s say you wanted to give a group called “team1” access to only the name, age, and gender fields in the persons_data table. Before I start, I should mention that if you don’t currently have permission to assign roles to other users, then you won’t be able to do all of the steps I’m about to show you.
Okay, first set the project-level permissions in IAM. Click “Grant Access”. Type the email address of the team1 group. For the role, select “BigQuery” on the left, and then select “BigQuery User”. And save it.
Next, go to BigQuery, and create a new dataset to store the view. Call it “shared_views”.
Then, run a query to select the name, age, and gender fields from the table.
SELECT fullName, age, gender
Now click “Save”, and select “Save view”. Change the dataset to “shared_views”, call the table “persons_view”, and click “Save”.
Then open shared_views, click “Sharing”, and select “Permissions”. Click “Add Principal”. Then fill in the email address of the team1 group. For the role, select “BigQuery Data Viewer”. Then click Save and Close.
Now that team1 has read access to the view, the only thing left to do is to give the view read access to the persons_data table. We need to do this because the view takes on the permissions of the person using it, and since team1 doesn’t have access to the persons_data table, they’d get an error if they tried to use this view.
This is finally the point where we’re going to turn this into an authorized view. Open the dataset that contains the persons_data table, click “Sharing”, and select “Authorize Views”. Type “persons_view”. And select it. Click “Add Authorization” and “Close”.
Now users in team1 will be able to run queries on this view, even though they don’t themselves have access to the persons_data table.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).