Infer setup

Overview of dbt-infer

  • Maintained by: Infer
  • Authors: Erik Mathiesen-Dreyfus, Ryan Garland
  • GitHub repo: inferlabs/dbt-infer
  • PyPI package: dbt-infer
  • Slack channel: n/a
  • Supported dbt Core version: v1.2.0 and newer
  • dbt Cloud support: Not Supported
  • Minimum data platform version: n/a

Installing dbt-infer

pip is the easiest way to install the adapter:

python -m pip install dbt-infer

Installing dbt-infer will also install dbt-core and any other dependencies.

Configuring dbt-infer

For further info, refer to the GitHub repository: inferlabs/dbt-infer

Connecting to Infer with dbt-infer

Infer allows you to perform advanced ML analytics within SQL, as if it were native to your data warehouse. To do this, Infer uses a SQL variant called SQL-inf, which defines a set of primitive ML commands from which you can build advanced analyses for any business use case. Read more about SQL-inf and Infer in the Infer documentation.

The dbt-infer package allows you to use SQL-inf easily within your dbt models. You can read more about the dbt-infer package itself and how it connects to Infer in the dbt-infer documentation.

Before using SQL-inf in your dbt models, you need to set up an Infer account and generate an API key for the connection. You can read how to do that in the Getting Started Guide.

The profile configuration in profiles.yml for dbt-infer should look something like this:

~/.dbt/profiles.yml

<profile-name>:
  target: <target-name>
  outputs:
    <target-name>:
      type: infer
      url: "<infer-api-endpoint>"
      username: "<infer-api-username>"
      apikey: "<infer-apikey>"
      data_config:
        [configuration for your underlying data warehouse]

Note that you also need to install the adapter package for your underlying data warehouse. For example, if your data warehouse is BigQuery, you also need the dbt-bigquery package installed. The configuration for that adapter goes into the data_config field.

Description of Infer Profile Fields

| Field | Required | Description |
| ----- | -------- | ----------- |
| type | Yes | Must be set to infer. This must be included either in profiles.yml or in the dbt_project.yml file. |
| url | Yes | The host name of the Infer server to connect to. Typically this is https://app.getinfer.io. |
| username | Yes | Your Infer username - the one you use to log in. |
| apikey | Yes | Your Infer API key. |
| data_config | Yes | The configuration for your underlying data warehouse. Its format follows the format of the configuration for your data warehouse adapter. |

Example of Infer configuration

To illustrate the above descriptions, here is an example of what a dbt-infer configuration might look like. In this case the underlying data warehouse is BigQuery, whose adapter is configured inside the data_config field.

infer_bigquery:
  apikey: 1234567890abcdef
  username: my_name@example.com
  url: https://app.getinfer.io
  type: infer
  data_config:
    dataset: my_dataset
    job_execution_timeout_seconds: 300
    job_retries: 1
    keyfile: bq-user-creds.json
    location: EU
    method: service-account
    priority: interactive
    project: my-bigquery-project
    threads: 1
    type: bigquery
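Once the profile is in place, you can sanity-check it with dbt's built-in debug command, which validates profiles.yml and tests the connection. The target name below is taken from the example above and is illustrative:

```shell
# Validates profiles.yml syntax, credentials, and the connection.
# Requires a working Infer account and warehouse credentials.
dbt debug --target infer_bigquery
```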

Usage

You do not need to change anything in your existing dbt models when switching to SQL-inf; they will all work the same as before, but you now have the ability to use SQL-inf commands as native SQL functions.

Infer supports a number of SQL-inf commands, including PREDICT, EXPLAIN, CLUSTER, SIMILAR_TO, TOPICS, and SENTIMENT. You can read more about SQL-inf and the commands it supports in the SQL-inf Reference Guide.

To get you started, here is a brief example of what such a model might look like. You can find more complex examples on the dbt-infer examples page.

In our simple example, we show how to use an upstream model, user_features, to predict churn by predicting the column has_churned.

predict_user_churn.sql

{{
  config(
    materialized = "table"
  )
}}

with predict_user_churn_input as (
    select * from {{ ref('user_features') }}
)

SELECT * FROM predict_user_churn_input PREDICT(has_churned, ignore=user_id)

Note that we ignore user_id in the prediction. This is because the user_id should not influence our prediction of churn, so we remove it. We also use the convention of pulling together the inputs for our prediction in a CTE, named predict_user_churn_input.
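As a further sketch, the EXPLAIN command mentioned above could be used in a companion model to surface which input columns drive the churn prediction. This is a hypothetical example reusing the user_features model and has_churned column from above; the exact output columns returned depend on Infer:

```sql
-- explain_user_churn.sql (hypothetical companion model)
{{
  config(
    materialized = "table"
  )
}}

with explain_user_churn_input as (
    select * from {{ ref('user_features') }}
)

-- EXPLAIN reports how each input column influences the target column.
SELECT * FROM explain_user_churn_input EXPLAIN(has_churned, ignore=user_id)
```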
