Preset Development Overview

Presets in DASL define how data is ingested, transformed, and serialized in a data source pipeline, much like a data flow or ETL operation. A preset specifies the configuration needed to take raw files from an upstream source (typically in a Unity Catalog Volume or external location) and convert them into a structured format compatible with DASL’s gold tables, which follow the OCSF schema. This process includes reading data with Databricks Auto Loader, applying joins, transforms, and filters, and finally streaming the data into the appropriate Unity Catalog tables.
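
To make the pattern concrete, the sketch below shows (in plain PySpark, not DASL's actual implementation) the kind of pipeline a preset encodes: read raw files with Auto Loader, apply a transform toward a gold-table shape, and stream the result into a Unity Catalog table. The paths, column names, and table name are hypothetical placeholders, and `spark` is the ambient session available in a Databricks notebook.

```python
from pyspark.sql import functions as F

# Read raw JSON files from a Unity Catalog Volume with Auto Loader.
# Path and schema location below are illustrative only.
raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/okta")
    .load("/Volumes/main/raw/okta")
)

# Example transform: derive and rename fields toward an OCSF-aligned shape.
# Source column names here are hypothetical.
events = (
    raw.withColumn("time", F.to_timestamp("published"))
       .withColumnRenamed("eventType", "activity_name")
)

# Stream the transformed records into a gold Unity Catalog table
# (table and checkpoint names are hypothetical).
(
    events.writeStream
    .option("checkpointLocation", "/Volumes/main/raw/_checkpoints/okta")
    .toTable("main.security.gold_authentication")
)
```

A preset captures this same logic declaratively, so a data source built from it does not need to restate the read, transform, and write steps.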

Presets are reusable and declarative. They act as living references that define how to process data from a specific source, such as AWS, Azure, GCP, or Okta. These references are typically authored by Antimatter, but DASL also supports custom, user-defined presets. When a user creates a data source based on a preset, the up-front work of figuring out how to transform and clean the data has already been done in the preset and can be used as-is or overridden as needed.

Presets and Data Sources

A data source in DASL represents a specific ingestion pipeline in Databricks, implemented as a Databricks job. It is (optionally) created using a preset as a base and adds parameters that cannot be known ahead of time, such as the data location and scheduling configuration. When a data source references a preset, it can modify or replace parts of the preset's logic as needed. For example, field definitions provided by the preset can be reused as-is or replaced entirely, depending on your specific needs.

What This Section Covers

This section of the documentation explains how to create, test, and use your own custom presets. The following pages walk through:

  • Tools available for developing and previewing presets within Databricks notebooks
  • How to structure and register user-defined presets
  • The schema expected in a preset.yaml file