Infer setup

Overview of dbt-infer

  • Maintained by: Infer
  • Authors: Erik Mathiesen-Dreyfus, Ryan Garland
  • GitHub repo: inferlabs/dbt-infer
  • PyPI package: dbt-infer
  • Slack channel: n/a
  • Supported dbt Core version: v1.2.0 and newer
  • dbt Cloud support: Not Supported
  • Minimum data platform version: n/a

Installing dbt-infer

pip is the easiest way to install the adapter:

python -m pip install dbt-infer

Installing dbt-infer will also install dbt-core and any other dependencies.

Configuring dbt-infer

For further info, refer to the GitHub repository: inferlabs/dbt-infer

Connecting to Infer with dbt-infer

Infer allows you to perform advanced ML analytics within SQL, as if it were native to your data warehouse. To do this, Infer uses a SQL variant called SQL-inf, which defines a set of primitive ML commands from which you can build advanced analyses for any business use case. Read more about SQL-inf and Infer in the Infer documentation.

The dbt-infer package allows you to use SQL-inf easily within your dbt models. You can read more about the dbt-infer package itself and how it connects to Infer in the dbt-infer documentation.

Before using SQL-inf in your dbt models, you need to set up an Infer account and generate an API key for the connection. You can read how to do that in the Getting Started Guide.

The profile configuration in profiles.yml for dbt-infer should look something like this:

~/.dbt/profiles.yml

<profile-name>:
  target: <target-name>
  outputs:
    <target-name>:
      type: infer
      url: "<infer-api-endpoint>"
      username: "<infer-api-username>"
      apikey: "<infer-apikey>"
      data_config:
        [configuration for your underlying data warehouse]

Note that you also need to install the adapter package for your underlying data warehouse. For example, if your data warehouse is BigQuery, you also need the dbt-bigquery package installed. The configuration for that adapter goes into the data_config field.

Description of Infer Profile Fields

| Field | Required | Description |
| ----- | -------- | ----------- |
| type | Yes | Must be set to infer. This must be included either in profiles.yml or in the dbt_project.yml file. |
| url | Yes | The host name of the Infer server to connect to. Typically this is https://app.getinfer.io. |
| username | Yes | Your Infer username - the one you use to log in. |
| apikey | Yes | Your Infer API key. |
| data_config | Yes | The configuration for your underlying data warehouse. Its format follows the format of the configuration for your data warehouse adapter. |

Example of Infer configuration

To illustrate the above descriptions, here is an example of what a dbt-infer configuration might look like. In this case the underlying data warehouse is BigQuery, whose adapter is configured inside the data_config field.

infer_bigquery:
  apikey: 1234567890abcdef
  username: my_name@example.com
  url: https://app.getinfer.io
  type: infer
  data_config:
    dataset: my_dataset
    job_execution_timeout_seconds: 300
    job_retries: 1
    keyfile: bq-user-creds.json
    location: EU
    method: service-account
    priority: interactive
    project: my-bigquery-project
    threads: 1
    type: bigquery
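Once the profile is in place, you can sanity-check it with dbt's built-in debug command, which validates profiles.yml and tests the connection. The target name below is taken from the example above and is illustrative:

```shell
# Validates profiles.yml syntax, credentials, and the connection.
# Requires a working Infer account and warehouse credentials.
dbt debug --target infer_bigquery
```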

Usage

You do not need to change anything in your existing dbt models when switching to SQL-inf; they will all work the same as before, but you now have the ability to use SQL-inf commands as native SQL functions.

Infer supports a number of SQL-inf commands, including PREDICT, EXPLAIN, CLUSTER, SIMILAR_TO, TOPICS, and SENTIMENT. You can read more about SQL-inf and the commands it supports in the SQL-inf Reference Guide.

To get you started, here is a brief example of what such a model might look like. You can find more complex examples on the dbt-infer examples page.

In our simple example, we show how to use an upstream model, user_features, to predict churn by predicting the column has_churned.

predict_user_churn.sql

{{
  config(
    materialized = "table"
  )
}}

with predict_user_churn_input as (
    select * from {{ ref('user_features') }}
)

SELECT * FROM predict_user_churn_input PREDICT(has_churned, ignore=user_id)

Note that we ignore user_id in the prediction. This is because the user_id should not influence our prediction of churn, so we remove it. We also use the convention of pulling together the inputs for our prediction in a CTE, named predict_user_churn_input.
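As a further sketch, the EXPLAIN command mentioned above could be used in a companion model to surface which input columns drive the churn prediction. This is a hypothetical example reusing the user_features model and has_churned column from above; the exact output columns returned depend on Infer:

```sql
-- explain_user_churn.sql (hypothetical companion model)
{{
  config(
    materialized = "table"
  )
}}

with explain_user_churn_input as (
    select * from {{ ref('user_features') }}
)

-- EXPLAIN reports how each input column influences the target column.
SELECT * FROM explain_user_churn_input EXPLAIN(has_churned, ignore=user_id)
```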
