This tutorial helps you configure Confidence to:
  1. Run queries in Databricks to compute exposure and metrics.
  2. Store assignment data as Parquet files in S3, and then load them into Databricks.
Step 2 is optional if you already have assignment data in Databricks, for example because you use a feature flagging solution other than Confidence Flags. This document targets the following audiences:
  • Administrators who want to set up Confidence for their organization

Before You Begin

  • You need to have a Confidence account.
  • You need to have an AWS account.
  • You need permissions to create S3 buckets, IAM users, and IAM roles, and to manage the Databricks cluster.

Step 1: Create an S3 Bucket

To load assignment data, Confidence first copies Parquet files to an S3 bucket, and then triggers load jobs to copy these into Databricks.
  • Go to the S3 console and click Create bucket.
  • Give the bucket a name and create it in the same AWS region as your Databricks workspace.
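If you prefer to script the bucket creation, the sketch below shows one way to do it with boto3. The bucket name and region are placeholders; the helper also handles the us-east-1 special case, where the API rejects an explicit LocationConstraint.

```python
def create_bucket_params(name: str, region: str) -> dict:
    """Build the kwargs for s3_client.create_bucket().

    For us-east-1 the CreateBucketConfiguration must be omitted,
    because the API rejects "us-east-1" as a LocationConstraint.
    """
    params = {"Bucket": name}
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


if __name__ == "__main__":
    # Requires `pip install boto3` and AWS credentials configured locally.
    # Use the same region as your Databricks workspace.
    import boto3

    region = "eu-west-1"  # placeholder: your Databricks region
    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_params("confidence-assignments", region))
```

The bucket name must be globally unique across all of AWS, so expect to prefix it with something organization-specific.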

Step 2: Create the Confidence IAM Role

Now you need to create an IAM role that Confidence can assume with the correct permissions. Two authentication options are available: Confidence can use a regular AWS access key and secret to authenticate as an IAM user and then assume the role, or it can use AssumeRoleWithWebIdentity, with a Google service account as the trusted entity, to authenticate without storing any credentials. AssumeRoleWithWebIdentity is usually preferable, but it can interfere with other settings, such as custom identity providers; in those cases, you may need the credentials-based approach. Complete either step 2a or 2b, depending on the approach you choose.

Step 2a: Set up the Trust Policy for AssumeRoleWithWebIdentity

  • Go to the IAM console, click Roles, and then click Create role.
  • Select “Custom trust policy” as the trusted entity type.
  • In the text field, paste the following JSON snippet, replacing <service_account_id> with the unique service account ID that you use to authenticate from the Confidence side. You can find the ID in the Your Service Account ID box in the flag applied connector configuration form for Databricks.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "accounts.google.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "accounts.google.com:sub": "<service_account_id>"
        }
      }
    }
  ]
}
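If you manage IAM with scripts rather than the console, the same trust policy can be rendered from the service account ID. A minimal sketch; the function name is illustrative:

```python
import json


def web_identity_trust_policy(service_account_id: str) -> str:
    """Render the AssumeRoleWithWebIdentity trust policy shown above,
    with the <service_account_id> placeholder filled in."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": "accounts.google.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {"accounts.google.com:sub": service_account_id}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting string can be passed as the trust policy document when you create the role, for example via the AWS CLI or boto3's create_role.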
  • Click Next, and don’t select any of the predefined permissions. Confidence adds its own inline policy that is more restrictive than the built-in policies.
  • Input a name for the role, for example, confidence-role, and then click Create role.

Step 2b: Set up the Trust Policy with an IAM User

  • Go to the IAM console, click Users, and then click Create user.
  • Give the user a name and create it.
  • Go to the user details and generate an access key and secret for the user. Keep the access key and secret for later, when you configure the warehouse in Confidence.
  • Go to the IAM console, click Roles, and then click Create role.
  • Select “Custom trust policy” as the trusted entity type.
  • In the text field, paste the following JSON snippet, replacing <user_arn> with the ARN of the user you just created (there is a button to copy the ARN on the user page).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<user_arn>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
  • Click Next, and don’t select any of the predefined permissions. Input a name for the role, for example, confidence-role, and then click Create role.

Step 2c: Set up the IAM Role Policy

  • Find the role you created earlier and click it, then click the Add permissions dropdown list, and then click Create inline policy.
  • Switch the policy editor to JSON, and then paste the following snippet, replacing the <s3_bucket_name> placeholders with the name of the bucket you created.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "s3:ListBucket",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::<s3_bucket_name>"
    },
    {
      "Action": [
        "s3:PutObjectAcl",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::<s3_bucket_name>/*"
    }
  ]
}
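The same inline policy can be generated and attached programmatically. A hedged sketch using boto3's put_role_policy; the role and policy names below are placeholders:

```python
import json


def s3_inline_policy(bucket: str) -> str:
    """Render the inline role policy shown above for a bucket name:
    ListBucket on the bucket itself, object read/write on its contents."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": "s3:ListBucket",
                "Effect": "Allow",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Action": [
                    "s3:PutObjectAcl",
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                ],
                "Effect": "Allow",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }
    return json.dumps(policy)


if __name__ == "__main__":
    # Requires `pip install boto3` and AWS credentials with IAM access.
    import boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName="confidence-role",         # placeholder: your role name
        PolicyName="confidence-s3-access",  # placeholder: any policy name
        PolicyDocument=s3_inline_policy("my-confidence-bucket"),
    )
```

Note the two statements scope permissions differently on purpose: ListBucket applies to the bucket ARN, while the object actions apply to the `/*` object ARN.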
  • Give the policy a name, click Next and Create policy to attach it to the role.

Step 3: Create Schemas for Confidence Data

Confidence needs a schema to write the results of exposure and metric calculations to. These can be separate schemas or the same one; for simplicity, this tutorial uses a single schema for everything.
  • Open a SQL notebook and run the following SQL to create the schema:
CREATE SCHEMA confidence;
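If you would rather run the DDL from code than from a notebook, the databricks-sql-connector package can execute the same statement. A sketch with placeholder connection details; IF NOT EXISTS is added so the statement is safe to re-run:

```python
def schema_ddl(schema: str = "confidence") -> str:
    """DDL equivalent to the notebook statement above. IF NOT EXISTS
    makes the statement idempotent."""
    return f"CREATE SCHEMA IF NOT EXISTS {schema}"


if __name__ == "__main__":
    # Requires `pip install databricks-sql-connector`; the hostname,
    # HTTP path, and token below are placeholders for your workspace.
    from databricks import sql

    with sql.connect(
        server_hostname="<workspace-host>.cloud.databricks.com",
        http_path="/sql/1.0/warehouses/<warehouse-id>",
        access_token="<personal-access-token>",
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(schema_ddl())
```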

Step 4: Create a Service Principal

  • Go to the Databricks Identity and access settings, and then Service principals.
  • Add a new service principal and give it any name you like.
  • Generate an OAuth client ID and secret for the service principal, following the instructions in the Databricks docs.
Then set up the permissions for the service principal to have write access to the schema you created in the earlier step, and read access to any tables that contain metric data you want to use for experimentation.

Step 5a: Configure a Metrics Data Warehouse

  1. Go to the Confidence App.
  2. On the bottom of the left sidebar, select Admin > Connections > Metrics Data Warehouse.
  3. Select Databricks and configure the required settings.
  4. Click Save.

Step 5b: Configure a Flag Applied Connector

For Confidence to be able to store assignment data in Databricks, you need to set up a connector between Confidence and Databricks.
Assignment data records which users were assigned to which variants in the experiments you run. Assignment data goes into exposure calculations, and metrics use exposure to calculate the results of your tests.
This connector is a “Flag Applied” connector. It is the component responsible for writing assignment data to Databricks that Confidence Metrics can later read.
  1. Go to the Confidence App.
  2. On the bottom of the left sidebar, select Admin > Connections > Flag Applied.
  3. Click Create.
  4. Select Databricks as destination.
  5. Enter the details from the earlier setup steps.
  6. Click Save.
When you click Save, Confidence tries to connect to Databricks and load some sample data. If you have misconfigured anything, you see an error message.

Step 5c: Configure an Assignment Table

For Confidence to use the stored assignment data, you need to set up an assignment table that reads from the Databricks table. You first need to create an entity, which represents the thing you’re experimenting on, such as your users. To do so, follow these steps:
  1. Go to the Confidence App.
  2. On the bottom of the left sidebar, select Admin > Connections > Flag Applied and select the Databricks connection you created.
  3. Click Create in the Assignment table section.
  4. Create a new entity or select an existing entity. Entities are the things you’re experimenting on, like your users. Enter User and specify the data type of the identifier that identifies the entity. For example, if you have a UUID that identifies your users, your primary key type is a String.
  5. Enter a name for the assignment table, such as flag_applied. This name should typically match the name you used in step 5b, so that Confidence reads assignments from the destination table of your flag assignments.
  6. Click Create.

🎉 Well done! You are all set up and ready to go.

What’s Next?

The next step is to create a fact table, and a metric. For an overview, see the metric introduction page, and the metrics quickstart.