- Run queries in Databricks to compute exposure and metrics.
- Store assignment data as Parquet files in S3, and then load them into Databricks.
- Administrators who want to set up Confidence for their organization
Before You Begin
- You need to have a Confidence account.
- You need to have an AWS account.
- You need to have permissions to create S3 buckets, IAM users and roles, and manage the Databricks cluster.
Step 1: Create an S3 Bucket
To load assignment data, Confidence first copies Parquet files to an S3 bucket and then triggers load jobs to copy them into Databricks.
- Go to the S3 console and click Create bucket.
- Give the bucket a name and place it in the same AWS region as your Databricks instance.
Step 2: Create the Confidence IAM Role
Now you need to create an IAM role with the correct permissions that Confidence can assume. Two authentication options are available: Confidence can use a regular AWS access key and secret to authenticate as an IAM user and then assume the role, or it can use AssumeRoleWithWebIdentity with a Google service account as the trusted entity, which avoids storing any credentials.

AssumeRoleWithWebIdentity is usually preferable, but it can interfere with other settings such as custom identity providers. In those cases, use the credentials-based approach instead.

Complete either step 2a or 2b, depending on the approach you choose.
Step 2a: Set up the Trust Policy for AssumeRoleWithWebIdentity
- Go to the IAM console, click Roles and Create role
- Select “Custom trust policy” as the trusted entity type.
- In the text field, paste the following JSON snippet, replacing <service_account_id> with the unique service account ID you use to authenticate from the Confidence side. You can find the ID in the Your Service Account ID box in the Configure Flag Applied Connector form for Databricks.
- Click next, and don’t select any of the predefined permissions. Confidence adds its own inline policy that is more restrictive than the built-in policies.
- Enter a name for the role, for example confidence-role, and then click Create role.
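As a sketch of what to paste in this step (the exact policy Confidence provides may differ, so prefer the snippet from the Confidence form), a trust policy that allows AssumeRoleWithWebIdentity from a Google service account typically looks like this, where <service_account_id> is the ID from the Your Service Account ID box:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "accounts.google.com" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": { "accounts.google.com:aud": "<service_account_id>" }
      }
    }
  ]
}
```

The Condition block restricts the role so that only tokens issued for that specific service account can assume it.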
Step 2b: Set up the Trust Policy with an IAM User
- Go to the IAM console, click Users and Create user
- Give the user a name and create it.
- Go to the user details and generate an access key and secret for the user. Keep the access key and secret for later when you configure the warehouse in Confidence.
- Go to the IAM console, click Roles and Create role
- Select “Custom trust policy” as the trusted entity type.
- In the text field, paste the following JSON snippet, replacing <user_arn> with the ARN of the user you created earlier in this step (there is a button to copy the ARN on the user page).
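As a sketch (check the snippet in the Confidence form for the authoritative version), a trust policy that lets the IAM user assume this role typically looks like the following, where <user_arn> is the ARN copied from the user page:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<user_arn>" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```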
Step 2c: Set Up the IAM Role Policy
- Find the role you created earlier and click it, then open the Add permissions dropdown and select Create inline policy.
- Switch the policy editor to JSON, and then paste the following snippet, replacing the <s3_bucket_name> placeholders with the name of the bucket you created.
- Give the policy a name, click Next and Create policy to attach it to the role.
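The exact permission set Confidence needs is defined in the snippet it provides; as a rough sketch, an inline policy granting read/write access to the bucket typically takes this shape, with <s3_bucket_name> replaced by your bucket name:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<s3_bucket_name>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<s3_bucket_name>/*"
    }
  ]
}
```

Note that bucket-level actions (like ListBucket) apply to the bucket ARN itself, while object-level actions apply to the /* resource.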
Step 3: Create Schemas for Confidence Data
Confidence needs a schema in which to write the results of exposure and metric calculations. These could be separate schemas or the same one; for simplicity, create a single schema for everything here.
- Open a SQL notebook and run the following SQL to create the schema:
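The SQL can be as simple as the following sketch. The schema name confidence is just an example; pick any name, and note it for the configuration steps later. If your workspace uses Unity Catalog, qualify the name with your catalog.

```sql
-- "confidence" is a placeholder schema name; choose your own.
-- With Unity Catalog, qualify it, e.g. CREATE SCHEMA IF NOT EXISTS main.confidence;
CREATE SCHEMA IF NOT EXISTS confidence;
```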
Step 4: Create Service Principal
- Go to the Databricks Identity and access settings and then Service principals.
- Add a new service principal and name it whatever you like.
- Generate an OAuth Client ID and secret for the service principal following the instructions from the Databricks docs.
Step 5a: Configure a Metrics Data Warehouse
- Go to the Confidence App.
- On the bottom of the left sidebar, select Admin > Connections > Metrics Data Warehouse.
- Select Databricks and configure the required settings.
- Click Save.
Step 5b: Configure a Flag Applied Connector
For Confidence to store assignment data in Databricks, you need to set up a connector between Confidence and Databricks. Assignment data is information about which users were assigned to which variants in the experiments you run. Assignment data feeds into exposure calculations, and metrics use exposure to calculate results in your tests.
- Go to the Confidence App.
- On the bottom of the left sidebar, select Admin > Connections > Flag Applied.
- Click Create
- Select Databricks as destination.
- Enter the details from the earlier setup steps.
- Click Save.
Step 5c: Configure an Assignment Table
For Confidence to use the stored assignment data, you need to set up an assignment table that reads from the Databricks table. You first need to create an entity, which represents the thing you’re experimenting on, like your users. To do so, follow these steps:
- Navigate to the Databricks connection: on the bottom of the left sidebar, select Admin > Connections > Flag Applied and select the Databricks connection you created.
- Create or select an entity: create a new entity or select an existing one. Enter User and specify the data type of the identifier that identifies the entity. For example, if a UUID identifies your users, the primary key type is String.
- Enter the assignment table name: enter a name for the assignment table, such as flag_applied. This name should typically match the name you used in step 5b, so that Confidence can read assignments from the destination table of your flag assignments.
