CueLake has the following in-built templates.
The Incremental Refresh template fetches incremental data from your source database and upserts it into the destination table in S3. It creates the destination table if it doesn’t exist.
Say you want to sync the ORDERS table to S3, but only a few of its columns rather than all of them. Below is a sample query.
Select CREATEDTS, MODIFIEDTS, PK, ORDERSTATUS, ORDERAMOUNT from ORDERS
Whenever a row is updated in the ORDERS table in the source database, MODIFIEDTS is also updated. PK is the primary key of the ORDERS table.
When you run this notebook for the first time, CueLake fetches all the data and creates the S3 table. On subsequent runs, only the incremental rows are fetched and upserted into the S3 table. CueLake uses Iceberg’s merge into query to merge the data.
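To make the first-run versus later-run behaviour concrete, here is a minimal Python sketch that builds the two kinds of SQL text involved. It is an illustration under assumptions, not CueLake's actual implementation: the helper names `incremental_query` and `merge_statement`, the watermark argument `last_ts`, and the table and view names are all hypothetical. The MERGE INTO shape follows Iceberg's Spark SQL syntax.

```python
from typing import Optional


def incremental_query(base_query: str, ts_column: str, last_ts: Optional[str]) -> str:
    """Return the query to run against the source (hypothetical helper).

    last_ts is the highest timestamp seen on the previous run, or None
    on the first run. Plain string interpolation is used for illustration
    only; it is not safe for untrusted input.
    """
    if last_ts is None:
        # First run: no watermark yet, so fetch everything.
        return base_query
    # Later runs: keep only rows modified after the previous watermark.
    return f"SELECT * FROM ({base_query}) src WHERE {ts_column} > '{last_ts}'"


def merge_statement(target: str, staged_view: str, pk: str) -> str:
    """Build an Iceberg-style MERGE INTO that upserts staged rows by primary key."""
    return (
        f"MERGE INTO {target} t "
        f"USING {staged_view} s "
        f"ON t.{pk} = s.{pk} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

For the ORDERS example, the timestamp column would be MODIFIEDTS and the merge key would be PK.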
On each run of an incremental refresh notebook, its paragraphs execute the following tasks:
- Execute a Spark SQL query to fetch incremental data from the source database.
- Sort the incremental data.
- Merge the incremental data into the Iceberg table.
- Run maintenance on the Iceberg table.
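The effect of these tasks can be simulated in memory with a short Python sketch. Everything here is an assumption made for illustration: the function `run_incremental_refresh` is hypothetical, the destination table is modelled as a dict keyed by PK, timestamps are plain integers, sorting is done by MODIFIEDTS (the text above does not say which sort key is used), and the maintenance step is left out.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def run_incremental_refresh(
    source_rows: List[Row],
    table: Dict[Any, Row],
    last_ts: Optional[int],
) -> Optional[int]:
    """Simulate one run; return the new watermark (hypothetical helper)."""
    # 1. Fetch: on the first run (last_ts is None) take every row,
    #    otherwise only rows modified after the previous watermark.
    incremental = [
        r for r in source_rows
        if last_ts is None or r["MODIFIEDTS"] > last_ts
    ]
    # 2. Sort: assumed here to be by modification timestamp.
    incremental.sort(key=lambda r: r["MODIFIEDTS"])
    # 3. Merge: upsert by primary key, so a matching PK is overwritten
    #    and a new PK is inserted.
    for r in incremental:
        table[r["PK"]] = r
    # 4. Maintenance is not modelled in this sketch.
    # The new watermark is the latest timestamp seen, or the old one
    # if nothing changed.
    return max((r["MODIFIEDTS"] for r in incremental), default=last_ts)
```

Calling the function a second time with the returned watermark picks up only rows whose MODIFIEDTS has advanced since the previous run, which is why the source must update MODIFIEDTS on every row change.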
The Full Refresh template fetches all the data from your source database and overwrites the destination table in S3. It creates the destination table if it doesn’t exist.