Baffle Advanced Data Protection
Baffle Advanced Data Protection provides a range of data encryption, tokenization and de-identification methods to protect data in data stores and cloud storage environments. Common methods that Baffle employs include column or field level encryption, tokenization, format preserving encryption (FPE), dynamic data masking, and record level encryption.
Baffle integrates with key management stores through a key virtualization layer. It also provides a local key store, so you can use your own keys to protect data in the cloud.
Baffle Advanced Data Protection for Snowflake
This document provides a high-level overview of how to set up Baffle Advanced Data Protection for decrypting data (re-identification) on Azure, and how to configure Snowflake to utilize it.
Baffle Advanced Data Protection enables column-level encryption of data in Snowflake (de-identification) and policy-based decryption of that data (re-identification). Snowflake calls this feature External Tokenization. De-identification requires that the data be encrypted with Baffle, using Baffle's API or proxy, as it is moved into a stage environment before ingestion into Snowflake.
When a query runs on a Snowflake data warehouse, Snowflake automatically rewrites it, as configured by your masking policies, to include calls to Baffle DPS for decryption. The Baffle DPS decryption service runs in your own Azure subscription, and whether data is decrypted is governed by the policies you create in Snowflake.
The following high-level architecture diagram illustrates the Snowflake interaction with Baffle DPS.
Using Baffle-provided tools, you create a set of external functions, one for each column in your schema that contains Baffle-encrypted data. The tools also help you create masking policies and attach them to your tables, so that Snowflake automatically rewrites queries on those columns to call your external functions. Your data consumers continue to query the tables as before; the calls to Baffle for decryption, made where your masking policies allow, are transparent to them. For more information, see the Snowflake documentation on Calling External Functions via Azure API.
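The following minimal Python sketch shows the kind of Snowflake SQL involved for one encrypted column: an external function, a masking policy that calls it, and the ALTER TABLE statement that attaches the policy. It is illustrative only and is not the output of Baffle's tools; the `EncryptedColumn` type, the function and policy naming scheme, the ANALYST role, and the endpoint URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncryptedColumn:
    """One Baffle-encrypted column (hypothetical; not the BPS format)."""
    table: str
    column: str


def render_column_objects(col: EncryptedColumn, api_integration: str,
                          endpoint_url: str, reader_roles: list[str]) -> str:
    """Render SQL for one external function, one masking policy that calls it,
    and the ALTER TABLE statement that attaches the policy to the column."""
    fn = f"baffle_decrypt_{col.table}_{col.column}".lower()    # hypothetical name
    policy = f"baffle_mask_{col.table}_{col.column}".lower()   # hypothetical name
    roles = ", ".join(f"'{r}'" for r in reader_roles)
    return "\n".join([
        # External function: Snowflake reaches the URL through the API integration.
        f"CREATE OR REPLACE EXTERNAL FUNCTION {fn}(v VARCHAR)",
        "  RETURNS VARCHAR",
        f"  API_INTEGRATION = {api_integration}",
        f"  AS '{endpoint_url}';",
        # Masking policy: decrypt only for approved roles, otherwise return ciphertext.
        f"CREATE OR REPLACE MASKING POLICY {policy} AS (val VARCHAR) RETURNS VARCHAR ->",
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN {fn}(val) ELSE val END;",
        # Attaching the policy makes Snowflake rewrite queries on this column.
        f"ALTER TABLE {col.table} MODIFY COLUMN {col.column} SET MASKING POLICY {policy};",
    ])
```

For example, `render_column_objects(EncryptedColumn("customers", "ssn"), "baffle_api", "https://<your_api_name>.azure-api.net/<api_name>/decrypt", ["ANALYST"])` yields the three statements for a hypothetical `customers.ssn` column. Returning ciphertext in the ELSE branch keeps unauthorized roles from seeing cleartext without breaking their queries.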
The Baffle Snowflake service is hosted in a serverless Azure Function App, so Azure handles its scalability and reliability. The service depends on objects in both Azure Blob Storage and Azure Key Vault. An Azure API Management Service instance handles the interaction between Snowflake and your serverless function.
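Snowflake calls an external function by POSTing batches of rows as JSON: the body has a `data` array in which each row starts with its row number, followed by the argument values, and the response must return the same row numbers with one result each. The minimal Python sketch below shows that contract under stated assumptions: `handle_batch` and the pluggable `decrypt` callable are hypothetical stand-ins, not Baffle's Java implementation, and the real service also handles HTTP, authentication, and key retrieval from Blob Storage and Key Vault.

```python
import json
from typing import Callable, Optional


def handle_batch(body: str, decrypt: Callable[[str], str]) -> str:
    """Process one external-function batch in Snowflake's JSON format.

    Request:  {"data": [[row_number, value], ...]}
    Response: {"data": [[row_number, result], ...]} with the same row numbers.
    `decrypt` is a hypothetical stand-in for the real re-identification call.
    """
    rows = json.loads(body)["data"]
    results = []
    for row in rows:
        row_number, value = row[0], row[1]
        # SQL NULL arrives as JSON null; return it unchanged instead of decrypting.
        result: Optional[str] = None if value is None else decrypt(value)
        results.append([row_number, result])
    return json.dumps({"data": results})
```

For example, `handle_batch('{"data": [[0, "abc"], [1, null]]}', str.upper)` returns `{"data": [[0, "ABC"], [1, null]]}`; `str.upper` only stands in for decryption here.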
Prerequisites
- For cost effectiveness, your Snowflake account and your Azure account should be hosted by the same cloud provider and in the same region. This document assumes that architecture.
- Install the Snowflake command line tools. For more information, see the Snowflake Installing SnowSQL documentation.
- Install the Azure command line tools. For more information, see the Microsoft Install Azure CLI documentation.
- Create an Azure Storage Account with a container called “config”.
- Inside the config container, upload the following files:
Deploying the Baffle Snowflake Service consists of creating Azure objects and Snowflake objects, then integrating the two sets. In the following procedure, you create these objects in your Azure account:
- A Function App, a serverless function that contains the code to re-identify data. This function connects to two other Azure objects:
– Azure Blob Storage container to store configuration
– One or more Azure Key Vault keys to encrypt and decrypt Data Encryption Keys
- An API Management Service (AMS) instance to facilitate the communication between Snowflake and your Function App
- An Azure Active Directory (AAD) Enterprise Application to allow for authentication to your Azure account from Snowflake
- At least one Azure Blob Storage container, in which to store Baffle configurations and Data Encryption Keys
- An Azure Key Vault key, to encrypt your Data Encryption Keys
Step 1: Create an Azure Function App
- Create a directory called ‘baffle-azure-function’.
- Download the Baffle Snowflake Service (baffle-azure-function.zip) into that directory and unpack (unzip) it there.
- Log in to the Azure command line: az login
- Create an Azure Function App from the Azure Portal, in the following way:
a. Select Create Function App
b. Select the Java Runtime Stack (version ??).
c. For Publish, choose ‘Code’ or ‘Docker Container’.
d. Choose a Linux operating system (OS).
e. For Plan type, choose Consumption (Serverless).
f. Click Review + create.
g. Wait for the deployment to complete, then continue with the next step.
Step 2: Add Application Settings
NOTE: You will need a container called “config” in your storage account.
In the function app Configuration settings panel, add two application settings:
- Set BAFFLE_CONFIG_STORAGE to the connection string of the Blob Storage account whose “config” container holds your Baffle configuration files, for example:
- Set JAVA_OPTS to -Xms2048m -Xmx2048m
- Click Save and Continue.
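Before saving, you can sanity-check both values. This minimal Python sketch assumes the standard Azure Storage connection-string format (`Key=Value` pairs separated by semicolons, including `AccountName` and `AccountKey`); the helper names are hypothetical, and the check only inspects the strings, it does not contact Azure.

```python
REQUIRED_KEYS = {"AccountName", "AccountKey"}


def parse_connection_string(conn: str) -> dict[str, str]:
    """Parse an Azure Storage connection string into its Key=Value parts.

    Splits on the first '=' only, because base64 account keys often end in '='.
    Raises ValueError if a segment is malformed or a required key is missing.
    """
    parts: dict[str, str] = {}
    for segment in conn.strip().strip(";").split(";"):
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    missing = REQUIRED_KEYS - set(parts)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return parts


def heap_sizes_match(java_opts: str) -> bool:
    """True if JAVA_OPTS sets -Xms and -Xmx to the same size (e.g. 2048m)."""
    sizes = {flag[:4]: flag[4:] for flag in java_opts.split()
             if flag.startswith(("-Xms", "-Xmx"))}
    return "-Xms" in sizes and sizes.get("-Xms") == sizes.get("-Xmx")
```

Setting -Xms equal to -Xmx, as JAVA_OPTS does above, makes the JVM reserve the full 2048 MB heap at startup instead of growing it on demand.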
Step 3: Upload the decrypt function implementation
You upload the decrypt function implementation to the new Function App in the following way:
- On the command line enter the following: cd baffle-azure-function
- Run the following command, replacing the <placeholder> values with the values for your environment:
az functionapp deployment source config-zip -g <resource_group> -n <azure_function_app_name> --src snowflake-function.zip
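If you script the deployment, a thin Python wrapper keeps the arguments in one place. This sketch uses only the flags shown in the command above; the function names are hypothetical, and it assumes the `az` CLI is on your PATH and that you have already run `az login`.

```python
import subprocess


def zip_deploy_command(resource_group: str, app_name: str,
                       package: str = "snowflake-function.zip") -> list[str]:
    """Build the argv for `az functionapp deployment source config-zip`."""
    return ["az", "functionapp", "deployment", "source", "config-zip",
            "-g", resource_group, "-n", app_name, "--src", package]


def deploy(resource_group: str, app_name: str) -> None:
    """Run the zip deployment; raises CalledProcessError on a non-zero exit."""
    # Passing a list (not a shell string) avoids quoting problems in names.
    subprocess.run(zip_deploy_command(resource_group, app_name), check=True)
```

Calling `deploy("<resource_group>", "<azure_function_app_name>")` from inside the baffle-azure-function directory runs the same command as above and raises `subprocess.CalledProcessError` if it fails.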
Step 4: Configure the Function App for AD authentication
- In the Authentication / Authorization settings panel, toggle App Service Authentication to On.
- For Action to take when request is not authenticated, choose Log in with Azure Active Directory.
- Choose the Azure Active Directory provider.
- Set Management mode to Express.
- Click Create New AD App and follow the instructions.
- Make a note of the client ID (application ID) of the new Azure AD app; you will need it later. To find it, open the Azure Active Directory console and search for the app.
- Click Save.
Step 5: Create and publish the Azure API Management Service
- Create the Azure API Management Service by following Microsoft's instructions.
- Import and publish the Function App to the Azure API Management Service.
Step 6: Configure Baffle Admin
- As ACCOUNTADMIN, create a role called baffleadmin on Snowflake to manage the Baffle integration, including access to the external functions. This role owns and manages all of the integration objects, and further rights and usages can be delegated from it:
CREATE ROLE BAFFLEADMIN;
- Grant the following privileges to the baffleadmin role:
GRANT USAGE ON WAREHOUSE <yourwarehouse> TO BAFFLEADMIN;
GRANT CREATE DATABASE ON ACCOUNT TO BAFFLEADMIN;
GRANT CREATE INTEGRATION ON ACCOUNT TO BAFFLEADMIN WITH GRANT OPTION;
GRANT APPLY MASKING POLICY ON ACCOUNT TO BAFFLEADMIN WITH GRANT OPTION;
- As baffleadmin, create the database, schema, tables, and columns for your data so that they match your BafflePrivacySchema (BPS).
Step 7: Create a Snowflake API integration object
- Grant the baffleadmin role to any user who will administer your Baffle/Snowflake integration.
- On Snowflake, create the API integration object and authorize it in the following way, replacing the variables with the values for your environment:
CREATE OR REPLACE API INTEGRATION baffle_api
API_PROVIDER = azure_api_management
AZURE_TENANT_ID = '<your_tenant_id>'
AZURE_AD_APPLICATION_ID = '<your_azure_ad_app_client_id_from_step_4>'
API_ALLOWED_PREFIXES = ('https://<your_api_name>.azure-api.net/')
ENABLED = true;
- To grant Snowflake access to your Azure tenant, first retrieve the consent URL:
DESCRIBE API INTEGRATION baffle_api;
- Find the AZURE_CONSENT_URL value in the response, paste it into your browser, and click Accept.
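If you automate this step, for example with the Snowflake Connector for Python, you can pull AZURE_CONSENT_URL out of the DESCRIBE result rows instead of copying it by hand. This minimal sketch assumes each row has the column order (property, property_type, property_value, property_default); the helper name is hypothetical.

```python
from typing import Iterable, Sequence


def find_property(rows: Iterable[Sequence[str]], name: str) -> str:
    """Return property_value for `name` from DESCRIBE API INTEGRATION rows.

    Assumes each row is (property, property_type, property_value, property_default).
    Raises KeyError if the property is absent.
    """
    for row in rows:
        # Property names are compared case-insensitively.
        if row[0].upper() == name.upper():
            return row[2]
    raise KeyError(name)
```

With the connector, `rows` would be the list returned by `cursor.fetchall()` after executing `DESCRIBE API INTEGRATION baffle_api`; you still open the returned URL in a browser and click Accept yourself.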
Step 8: Configure Azure API and create Snowflake masking policies
- Set your Azure API to be accessible ONLY from your Snowflake integration, by adding a validate-jwt policy on your Azure API Management Service using the instructions in this Microsoft documentation.
- To create your Snowflake external functions and masking policies, in the baffle-azure-function directory, run the setup-snowflake.sh script, passing in the path to your BafflePrivacySchema (BPS), as shown in this example:
./setup-snowflake.sh /path/to/BafflePrivacySchema.toml > create-objects.sql
- Run the newly-created create-objects.sql script in Snowflake, either from the snowsql command line or the Snowflake console.
- To run from snowsql, use a command such as the following:
snowsql -u <username> -a <snowflake_instance> -D BASEURL=https://<your_api>.azure-api.net/<api_name> -f create-objects.sql -d <database> -s <schema> -r baffleadmin
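The -D BASEURL=... option defines a snowsql variable, which the generated script presumably references as &BASEURL (an assumption; check create-objects.sql). snowsql expands such references only when variable substitution is enabled, for example by adding -o variable_substitution=true to the command or running !set variable_substitution=true in the session. To preview what Snowflake would receive before you run the script, this minimal Python sketch mimics the &NAME and &{NAME} reference forms; it approximates snowsql's behavior (no escaping rules), and the function name is hypothetical.

```python
import re

# Matches &{NAME} or &NAME, the two snowsql variable reference forms.
_VAR_REF = re.compile(r"&\{(\w+)\}|&(\w+)")


def preview_substitution(sql: str, variables: dict[str, str]) -> str:
    """Replace &NAME / &{NAME} with values from `variables`.

    Unknown names are left untouched so they stand out in the preview.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))

    return _VAR_REF.sub(replace, sql)
```

For example, reading create-objects.sql into a string and calling `preview_substitution(text, {"BASEURL": "https://<your_api>.azure-api.net/<api_name>"})` shows each external function URL as it would appear after substitution.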