Setting Up a Databricks Connection

Overview

MessageGears' campaign management software integrates directly with Databricks using direct API integrations with the Databricks platform. This integration allows users to access their Databricks data natively with our audience builder, content personalization, and orchestration features without replication, data mapping or synchronization. The Databricks native connection also allows you to specify a Cloud Storage Bucket (S3, GCS) for bulk operations, such as Audience Recording, and FastCache.

To get started, there are four high-level steps to follow to get your Databricks Native Connection up and running.

Step 1: Create a Cloud File Storage

Step 2: Set up the Databricks driver and service principal

Step 3: Create the External Location

Step 4: Create a Database Connection

Step 1: Create a Cloud File Storage

When creating a Cloud File Storage, you'll need an S3 or GCS bucket already created, along with credentials that grant access to that bucket. The name you choose for the file storage must not already be in use.

  1. Go to Admin > System Configuration > File Storage and select your file type.
    (Note: Currently, only Google and Amazon file types are supported.)
  2. In the New File Storage window, fill out the required information, e.g. the AWS Key, AWS Secret, and bucket name.

Once saved, your new Cloud File Storage is ready to use.

Step 2: Set Up the Databricks Driver and Service Principal

The driver must be properly authenticated in order for Accelerator to connect to Databricks. The following steps establish that authentication:

  1. Authenticate the driver to enable Accelerator to connect to Databricks.
  2. Authorize unattended access to Databricks resources with a service principal using OAuth.
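Under the hood, the service principal flow above is standard OAuth machine-to-machine authentication against the workspace's token endpoint. As a sketch of what that exchange looks like (the host, client ID, and secret below are placeholders, not values Accelerator requires), using only the Python standard library:

```python
import base64
import urllib.parse
import urllib.request

def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the OAuth client-credentials request that Databricks expects
    for machine-to-machine (service principal) authentication."""
    url = f"https://{host}/oidc/v1/token"
    # Form-encoded body: client credentials grant, scoped to all APIs.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",
    }).encode()
    # The client ID and secret go in an HTTP Basic Authorization header.
    auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

# Example with placeholder values; sending the request with
# urllib.request.urlopen(req) returns a JSON body containing "access_token".
req = build_token_request("dbc-example.cloud.databricks.com", "my-client-id", "my-secret")
```

The access token returned by this exchange is short-lived, which is why unattended integrations like Accelerator store the client ID and secret rather than a token.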

Step 3: Create the External Location

Databricks must be able to communicate with S3 during extraction; this setup allows Databricks to read and write files in the bucket.

  1. Create appropriate IAM role
  2. Create the External Location in Databricks
    • Documentation
      • Make sure that the user has permissions to read, write, and create external tables
    • Note: if you’re targeting a path under a bucket, rather than the whole bucket, you’ll need to split out the permissions so that the bucket-level permissions are attached separately, as in the example policy below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::drz-data-the-second"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::drz-data-the-second/nested/another_nested_folder/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": [
                "arn:aws:iam::738293571822:role/databricks_nested_directory"
            ],
            "Effect": "Allow"
        }
    ]
}
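With the IAM role in place, the external location itself is defined in Databricks SQL. A hedged sketch (the location name, storage credential name, and grantee are assumptions for illustration; the S3 path matches the example policy above):

```sql
-- Create the external location over the target S3 path,
-- using a storage credential that wraps the IAM role.
CREATE EXTERNAL LOCATION IF NOT EXISTS mg_accelerator_location
  URL 's3://drz-data-the-second/nested/another_nested_folder'
  WITH (STORAGE CREDENTIAL my_storage_credential);

-- Grant the connecting principal the permissions the note above calls for:
-- read, write, and the ability to create external tables.
GRANT READ FILES, WRITE FILES, CREATE EXTERNAL TABLE
  ON EXTERNAL LOCATION mg_accelerator_location
  TO `my-service-principal`;
```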

Step 4: Create a Database Connection

Database Connections must be configured before an Audience can be created for Databricks. Database Connections are the settings and credentials that allow Accelerator to connect to the Databricks database that holds the necessary customer data.

For more information on Database Connections, click here.

Required Information:

Databricks Hostname, Databricks Client ID, Databricks Secret, and Databricks Http Path

Have an Accelerator System Administrator connect to the Databricks instance using the Accelerator Database connection portal.

  1. Select "Databricks (Native)" as your Database Type.
  2. Enter a unique and meaningful name for the database connection.
  3. Enter the Host. This is the host URL as defined by Databricks.
  4. Enter the Port.
  5. Enter the Client ID and Secret generated in Step 2 above.
  6. Select the Cloud Storage bucket created in Step 1 above.
  7. Select the purposes for this connection. If this is your only connection, feel free to select all here.
  8. Optionally, make this the default connection for the application so that it is pre-selected in the various locations within Accelerator.
  9. You can now further define the Connection Defaults. Defaults are settings such as a Default Table or Default Schema that allow an Admin to make the experience in Accelerator more hassle-free by dictating the default areas where an Accelerator user will access data. The currently supported defaults are:
    • Table
    • Schema
    • Database
    • Warehouse
    • Unique Id Column
    • Preference Column
    • Email Column
    • Push App
    • Push Address
    • Push Service
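The four required values (Hostname, Client ID, Secret, and Http Path) map onto a standard Databricks JDBC URL, which can be useful for sanity-checking the credentials outside Accelerator before saving the connection. A sketch, with placeholder values throughout:

```python
def databricks_jdbc_url(host: str, http_path: str,
                        client_id: str, client_secret: str,
                        port: int = 443) -> str:
    """Assemble a Databricks JDBC URL using OAuth machine-to-machine
    authentication (AuthMech=11 with Auth_Flow=1 selects the client
    credentials flow in the Databricks JDBC driver)."""
    params = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": http_path,
        "AuthMech": "11",        # OAuth 2.0 authentication
        "Auth_Flow": "1",        # client credentials (service principal)
        "OAuth2ClientId": client_id,
        "OAuth2Secret": client_secret,
    }
    suffix = ";".join(f"{k}={v}" for k, v in params.items())
    return f"jdbc:databricks://{host}:{port}/default;{suffix}"

# Placeholder host, warehouse path, and credentials:
url = databricks_jdbc_url(
    "dbc-example.cloud.databricks.com",
    "/sql/1.0/warehouses/abc123",
    "my-client-id",
    "my-secret",
)
```

Pointing any JDBC-capable SQL client at the resulting URL with the same values you entered in Accelerator confirms the credentials work before users start building Audiences against the connection.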