
How it works

CueLake is built on three core open-source projects: Zeppelin, Spark SQL, and Iceberg.

You write SQL statements or code as paragraphs in a Zeppelin notebook. When you run a notebook, its paragraphs run in sequence. CueLake provides prebuilt notebook templates for common use cases such as incremental refresh.

You add one or more notebooks to a workflow. When you run a workflow, its notebooks run in parallel.
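As a rough illustration of these execution semantics, the Python sketch below models paragraphs running in sequence within a notebook and a workflow's notebooks running in parallel. `Notebook`, `run_notebook`, and `run_workflow` are hypothetical names, not CueLake's API; paragraphs are modeled as plain functions rather than Zeppelin paragraphs, and a thread pool stands in for however CueLake actually runs notebooks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model for illustration only; not CueLake's actual API.
@dataclass
class Notebook:
    name: str
    # Each paragraph is modeled as a zero-argument function standing in for a SQL statement.
    paragraphs: List[Callable[[], str]] = field(default_factory=list)

def run_notebook(nb: Notebook) -> List[str]:
    """Run a notebook's paragraphs one after another, in order."""
    results = []
    for paragraph in nb.paragraphs:
        # The next paragraph starts only after this one returns.
        results.append(paragraph())
    return results

def run_workflow(notebooks: List[Notebook]) -> Dict[str, List[str]]:
    """Submit every notebook of a workflow at once and collect results by notebook name."""
    with ThreadPoolExecutor(max_workers=max(1, len(notebooks))) as pool:
        futures = {nb.name: pool.submit(run_notebook, nb) for nb in notebooks}
        return {name: f.result() for name, f in futures.items()}

orders = Notebook("orders", [lambda: "create table", lambda: "merge increments"])
users = Notebook("users", [lambda: "refresh users"])
print(run_workflow([orders, users]))
# {'orders': ['create table', 'merge increments'], 'users': ['refresh users']}
```

Ordering is enforced only inside a notebook; across notebooks in the same workflow, the sketch makes no guarantee about which paragraph runs first.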

To create a DAG of workflows, you set a workflow to be triggered after another workflow.
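To see how "trigger after another workflow" relationships form a DAG, here is a minimal Python 3.9+ sketch that derives a valid run order with the standard library's `graphlib.TopologicalSorter`. The `run_order` function and the workflow names are hypothetical; the sketch only models trigger relationships as edges and ignores details such as what happens when an upstream workflow fails.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional

def run_order(triggers: Dict[str, Optional[str]]) -> List[str]:
    """Return an order in which workflows can run, given each workflow's upstream trigger.

    `triggers` maps a workflow name to the workflow it runs after, or None if it has no trigger.
    """
    # TopologicalSorter expects each node mapped to its set of predecessors.
    graph = {wf: ({upstream} if upstream else set()) for wf, upstream in triggers.items()}
    # static_order() raises graphlib.CycleError if the triggers form a cycle.
    return list(TopologicalSorter(graph).static_order())

order = run_order({
    "ingest": None,
    "transform": "ingest",
    "report": "transform",
    "export": "transform",
})
print(order)  # 'ingest' comes first, then 'transform'; 'report' and 'export' follow in either order
```

Because a cycle would make the ordering impossible, rejecting it up front is what keeps the set of workflows a DAG.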

To run a workflow or a notebook on a schedule, you assign a schedule to it.

CueLake monitors the execution of workflows and notebooks, and maintains logs for each notebook paragraph.

Each time a schedule is triggered, the notebooks assigned to it are queued for execution. CueLake then starts the Spark cluster and runs the queued notebooks. Once the notebooks have run, the Spark cluster is shut down automatically.
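The schedule-triggered lifecycle (queue, start the cluster, run, shut down) can be sketched in Python as follows. `FakeCluster` and `on_schedule_trigger` are hypothetical stand-ins that only record events; they do not provision Spark or reflect CueLake's internals. For simplicity the sketch runs queued notebooks one at a time, and the `try`/`finally` that shuts the cluster down even when a notebook fails is a design choice of this sketch, not documented CueLake behavior.

```python
from collections import deque
from typing import Deque, List

class FakeCluster:
    """Hypothetical stand-in for a Spark cluster; records lifecycle events instead of running anything."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def start(self) -> None:
        self.events.append("start")

    def run(self, notebook: str) -> None:
        self.events.append(f"run {notebook}")

    def stop(self) -> None:
        self.events.append("stop")

def on_schedule_trigger(assigned: List[str], cluster: FakeCluster) -> None:
    queue: Deque[str] = deque(assigned)  # 1. queue the notebooks assigned to the schedule
    cluster.start()                      # 2. start the cluster
    try:
        while queue:
            cluster.run(queue.popleft())  # 3. run each queued notebook
    finally:
        cluster.stop()                    # 4. shut the cluster down (assumed: even on failure)

cluster = FakeCluster()
on_schedule_trigger(["daily_orders", "daily_users"], cluster)
print(cluster.events)  # ['start', 'run daily_orders', 'run daily_users', 'stop']
```

Starting the cluster only when there is queued work, and stopping it right after, is what keeps the cluster from running idle between scheduled runs.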