- Run queries in Databricks to compute exposure and metrics.
- Store assignment data as Parquet files in S3, and then load them into Databricks.
- Administrators who want to set up Confidence for their organization
Before You Begin
- You need to have a Confidence account.
- You need to have an AWS account.
- You need to have permissions to create S3 buckets, IAM users and roles, and manage the Databricks cluster.
Step 1: Create an S3 Bucket
To load assignment data, Confidence first copies Parquet files to an S3 bucket and then triggers load jobs to copy them into Databricks.
- Go to the S3 console and click Create bucket.
- Give the bucket a name and place it in the same AWS region as your Databricks instance.
Step 2: Create the Confidence IAM Role
Now you need to create an IAM role with the correct permissions that Confidence can assume. Two authentication options are available: Confidence can use a regular AWS access key and secret to authenticate as an IAM user and then assume the role, or it can use AssumeRoleWithWebIdentity with a Google service account as the trusted entity, which avoids storing any credentials.

AssumeRoleWithWebIdentity is usually preferable, but it can interfere with other settings such as custom identity providers. In those cases, use the credentials-based approach instead.

Complete either step 2a or 2b, depending on the approach you choose.
Step 2a: Set up the Trust Policy for AssumeRoleWithWebIdentity
- Go to the IAM console, click Roles and Create role
- Select “Custom trust policy” as the trusted entity type.
- In the text field, paste the following JSON snippet, replacing <service_account_id> with the unique service account ID you use to authenticate from the Confidence side. You can find the ID in the Your Service Account ID box in the Configure Flag Applied Connector form for Databricks.
- Click next, and don’t select any of the predefined permissions. Confidence adds its own inline policy that is more restrictive than the built-in policies.
- Enter a name for the role, for example confidence-role, and then click Create role.
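As a sketch of what to paste in this step (the exact policy Confidence provides may differ, so prefer the snippet from the Confidence form), a trust policy that allows AssumeRoleWithWebIdentity from a Google service account typically looks like this, where <service_account_id> is the ID from the Your Service Account ID box:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "accounts.google.com" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": { "accounts.google.com:aud": "<service_account_id>" }
      }
    }
  ]
}
```

The Condition block restricts the role so that only tokens issued for that specific service account can assume it.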
Step 2b: Set up the Trust Policy with an IAM User
- Go to the IAM console, click Users and Create user
- Give the user a name and create it.
- Go to the user details and generate an access key and secret for the user. Keep the access key and secret for later when you configure the warehouse in Confidence.
- Go to the IAM console, click Roles and Create role
- Select “Custom trust policy” as the trusted entity type.
- In the text field, paste the following JSON snippet, replacing <user_arn> with the ARN of the user you created earlier in this step (there is a button to copy the ARN on the user page).
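As a sketch (check the snippet in the Confidence form for the authoritative version), a trust policy that lets the IAM user assume this role typically looks like the following, where <user_arn> is the ARN copied from the user page:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<user_arn>" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```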
Step 2c: Set Up the IAM Role Policy
- Find the role you created earlier and click it, then open the Add permissions dropdown and select Create inline policy.
- Switch the policy editor to JSON, and then paste the following snippet, replacing the <s3_bucket_name> placeholders with the name of the bucket you created.
- Give the policy a name, click Next and Create policy to attach it to the role.
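The exact permission set Confidence needs is defined in the snippet it provides; as a rough sketch, an inline policy granting read/write access to the bucket typically takes this shape, with <s3_bucket_name> replaced by your bucket name:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<s3_bucket_name>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<s3_bucket_name>/*"
    }
  ]
}
```

Note that bucket-level actions (like ListBucket) apply to the bucket ARN itself, while object-level actions apply to the /* resource.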
Step 3: Create Schemas for Confidence Data
Confidence needs a schema in which to write the results of exposure and metric calculations. These could be separate schemas or the same one; for simplicity, create a single schema for everything here.
- Open a SQL notebook and run the following SQL to create the schema:
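The SQL can be as simple as the following sketch. The schema name confidence is just an example; pick any name, and note it for the configuration steps later. If your workspace uses Unity Catalog, qualify the name with your catalog.

```sql
-- "confidence" is a placeholder schema name; choose your own.
-- With Unity Catalog, qualify it, e.g. CREATE SCHEMA IF NOT EXISTS main.confidence;
CREATE SCHEMA IF NOT EXISTS confidence;
```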
Step 4: Create Service Principal
- Go to the Databricks Identity and access settings and then Service principals.
- Add a new service principal and name it whatever you like.
- Generate an OAuth Client ID and secret for the service principal following the instructions from the Databricks docs.
Step 5a: Configure a Metrics Data Warehouse
- Go to the Confidence App.
- On the bottom of the left sidebar, select Admin > Connections > Metrics Data Warehouse.
- Select Databricks and configure the required settings.
- Click Save.
Step 5b: Configure a Flag Applied Connector
For Confidence to store assignment data in Databricks, you need to set up a connector between Confidence and Databricks. Assignment data is information about which users were assigned to which variants in the experiments you run. Assignment data feeds into exposure calculations, and metrics use exposure to calculate results in your tests.
- Go to the Confidence App.
- On the bottom of the left sidebar, select Admin > Connections > Flag Applied.
- Click Create
- Select Databricks as destination.
- Enter the details from the earlier setup steps.
- Click Save.
Step 5c: Configure an Assignment Table
For Confidence to use the stored assignment data, you need to set up an assignment table that reads from the Databricks table. You first need to create an entity, which represents the thing you’re experimenting on, like your users. To do so, follow these steps:
- Navigate to the Databricks connection: on the bottom of the left sidebar, select Admin > Connections > Flag Applied and select the Databricks connection you created.
- Create or select an entity: create a new entity or select an existing one. Enter User and specify the data type of the identifier that identifies the entity. For example, if a UUID identifies your users, the primary key type is String.
- Enter the assignment table name: enter a name for the assignment table, such as flag_applied. This name should typically match the name you used in step 5b, so that Confidence can read assignments from the destination table of your flag assignments.
