Adding user-defined presets to the DASL workspace
To support user-defined presets, the DASL workspace can fetch presets from a location within your own catalog. Presets in this location are collected and shown in the DASL UI for your workspace, and are also available through the preset summary endpoint. To distinguish user-defined presets from those published by Antimatter, all custom preset names should begin with the internal_ prefix.
Setting up a preset workspace
The DASL workspace requires that the catalog directory containing user-defined presets contain an index.yaml file at the root. This index file is used to discover which presets are available for listing. If you do not want a custom preset to be included, omit it from this file and it will not be collected.
The index.yaml file contains an array of registered presets, each identified by a source and a sourceType field. The source represents the provider of the data, often a cloud service such as AWS or GCP, while the sourceType refers to the type of data being ingested from that source. For example, AWS offers multiple services such as Route 53, S3, and CloudTrail. Each of these services represents a different sourceType. Here is an example entry for processing AWS CloudTrail data:
presets:
  - source: "aws"
    sourceType: "cloudtrail"
For each preset listed in index.yaml, a corresponding <source>/<sourceType>/preset.yaml file must exist. The source and sourceType are used to construct the path to this preset file, which means your sourceType values must be unique within a single source. The preset.yaml file contains all of the preset's data processing instructions and follows the format described here. This file also carries metadata about the preset, such as the author, the description, and most importantly the name.
The name field in preset.yaml is constructed by joining the source and sourceType with an underscore, for example name: aws_cloudtrail. For user-defined presets, you should add the internal_ prefix, resulting in names like internal_aws_cloudtrail. This naming convention is described in more detail later in this document.
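The path and naming rules above can be sketched in a few lines of Python. This is only an illustration of the convention: the helper names preset_path and custom_preset_name are hypothetical and are not part of the DASL client.

```python
from pathlib import PurePosixPath

# Prefix that marks a preset as user-defined (from the convention above).
CUSTOM_PREFIX = "internal_"


def preset_path(source: str, source_type: str) -> str:
    """Relative path, from the preset root, of the file for one index.yaml entry."""
    return str(PurePosixPath(source) / source_type / "preset.yaml")


def custom_preset_name(source: str, source_type: str) -> str:
    """Value expected in the name field of a user-defined preset.yaml."""
    return f"{CUSTOM_PREFIX}{source}_{source_type}"


print(preset_path("aws", "cloudtrail"))         # aws/cloudtrail/preset.yaml
print(custom_preset_name("aws", "cloudtrail"))  # internal_aws_cloudtrail
```

Note that only the name carries the prefix; the path is built from the unprefixed source and sourceType.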
Example directory layout
As an example, if the index.yaml file were defined as:
presets:
  - source: "akamai"
    sourceType: "waf"
  - source: "cloudflare"
    sourceType: "httpreq"
  - source: "cloudflare"
    sourceType: "example"
Then the directory would be expected to contain at least the following:
index.yaml
akamai/waf/preset.yaml
cloudflare/httpreq/preset.yaml
cloudflare/example/preset.yaml
If these were all custom presets that you developed, each would have a name value starting with internal_, as in the following Akamai preset.yaml example:
name: internal_akamai_waf
author: My Company
description: Processing for Akamai WAF data
...
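Before pointing the workspace at a preset directory, it can help to check that every index entry has its preset file. The following is a minimal sketch, assuming the index entries have already been parsed into Python dictionaries; missing_preset_files is a hypothetical helper, not a DASL function, and the demo runs against a temporary directory rather than a real volume.

```python
import tempfile
from pathlib import Path


def missing_preset_files(root, entries):
    """Return the relative preset paths that the index lists but that are absent under root."""
    missing = []
    for entry in entries:
        relative = Path(entry["source"]) / entry["sourceType"] / "preset.yaml"
        if not (Path(root) / relative).is_file():
            missing.append(relative.as_posix())
    return missing


# The entries from the example index.yaml above, written out as Python data.
entries = [
    {"source": "akamai", "sourceType": "waf"},
    {"source": "cloudflare", "sourceType": "httpreq"},
    {"source": "cloudflare", "sourceType": "example"},
]

# Demo: create only the Akamai preset, then report what is still missing.
with tempfile.TemporaryDirectory() as root:
    akamai_dir = Path(root) / "akamai" / "waf"
    akamai_dir.mkdir(parents=True)
    (akamai_dir / "preset.yaml").write_text("name: internal_akamai_waf\n")
    print(missing_preset_files(root, entries))
    # ['cloudflare/httpreq/preset.yaml', 'cloudflare/example/preset.yaml']
```

An empty list means the layout matches the index.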
Linking the preset workspace to the DASL workspace
To associate user-defined presets with a DASL workspace, update the workspace configuration to include the dasl_custom_presets_path. This can be done through the UI or with the Python API client. To update the preset path from a Databricks notebook, you can use the following Python code:
%pip install dasl-client
import dasl_client
# Get the DASL client and fetch the current workspace config
client = dasl_client.Client.for_workspace()
config = client.get_config()
print(f"the current custom preset path is: {config.dasl_custom_presets_path}")
# update the custom presets path to point to the root volume
# the root is the directory where DASL will look for an index.yaml file,
# describing all custom presets.
config.dasl_custom_presets_path = "/Volumes/example/storage/presets"
client.put_config(config)
# fetch the preset path after the update to confirm it has been updated
config = client.get_config()
print(f"the updated path is: {config.dasl_custom_presets_path}")
After the path is updated, the DASL workspace will fetch the index.yaml file from the specified location and use it to load the user-defined presets.
Fetching user-defined presets
Once a path has been provided, all presets can be viewed either by navigating to configure/data-sources in the DASL UI or by using the Python client.
To list presets from a Databricks notebook:
%pip install dasl-client
import dasl_client
# Get the DASL client
client = dasl_client.Client.for_workspace()
# Fetch and print the names of the available presets
for preset in client.list_presets().items:
    print(preset.name)
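Because custom presets are identified only by their internal_ prefix, a listing can be split into custom and Antimatter-published presets with a simple string check. The sketch below works on plain name strings so that it runs without a workspace; split_presets is a hypothetical helper and the sample names are illustrative.

```python
# Prefix that marks a preset as user-defined.
CUSTOM_PREFIX = "internal_"


def split_presets(names):
    """Split preset names into (custom, published) lists, preserving order."""
    custom = [n for n in names if n.startswith(CUSTOM_PREFIX)]
    published = [n for n in names if not n.startswith(CUSTOM_PREFIX)]
    return custom, published


# Illustrative names; in a notebook these would come from the preset listing.
names = ["aws_cloudtrail", "internal_akamai_waf", "okta_syslog", "internal_aws_route53"]
custom, published = split_presets(names)
print(custom)     # ['internal_akamai_waf', 'internal_aws_route53']
print(published)  # ['aws_cloudtrail', 'okta_syslog']
```

In a notebook, the names list could be built from the preset.name values returned by the listing call shown above.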
Using a custom preset
Once the DASL workspace can successfully fetch a user-defined preset, it can be used to create a data source just like any preset provided by Antimatter.
As noted earlier, the name field in preset.yaml should follow the format <source>_<sourceType>, such as aws_cloudtrail or okta_syslog. For custom presets, this name should also begin with the internal_ prefix. For example, if you create a preset to process Route 53 data from AWS, you might name it:
name: internal_aws_route53
author: ...
description: ...
You would register this preset in your index.yaml file as:
presets:
  - source: aws
    sourceType: route53
Only the name field in preset.yaml should include the internal_ prefix. The source and sourceType values used in index.yaml and in the file paths (e.g., aws/route53/preset.yaml) remain unchanged.
When referring to a preset programmatically, such as with the Python client or API, you must use the full name value from preset.yaml, including the internal_ prefix. For example:
client.create_datasource(name="internal_aws_route53", ...)
Caching
To minimize the number of requests made against the customer catalog, presets are cached after a successful lookup. Cached entries are kept for one hour, so a change to a preset may take up to one hour to be reflected in the workspace.
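The effect of a one-hour cache can be illustrated with a generic time-to-live cache. This is a sketch of the general technique only, not DASL's actual implementation; the TTLCache class, its parameters, and the loader callback are all assumptions made for the example.

```python
import time


class TTLCache:
    """Keep loaded values for ttl_seconds; reload once an entry is older than that."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, loader):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no request is made
        value = loader(key)  # if this raises, nothing is cached
        self._entries[key] = (now, value)
        return value
```

With this model, an edit made just after a lookup stays invisible until the entry expires, which is why a change can take up to the full hour to appear.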