
Our Viewpoint

Data engineering should be simple - less code, more SQL

Analysts work closer to business teams, so they have a better understanding of what data business users want. If data ingestion and transformation are primarily SQL-based, analysts become less dependent on developers for moving data. This increases the pool of people within an organization who can manage data.

Data engineering should be flexible

A flexible data engineering process enables iterative development of data pipelines.

Imagine a car without brakes. Would you drive such a car at high speed? No. Brakes let us drive fast because they give us the confidence that we can slow down or stop when required.

Similarly, flexibility is what lets data teams move fast: a pipeline that is easy to change can be built now and adjusted later. Inflexible data pipelines, by contrast, force data teams to guess, discuss and debate what data business teams might want. Firstly, this delays the actual building of data pipelines. Secondly, because the focus shifts to future-proofing, these pipelines most often end up moving all of the available data.

Data engineering should be automated

One-size-fits-all data engineering is a myth. We cannot automate a generic data engineering process that tries to move all types of data from all types of sources to all types of destinations. To enable automation, we must reduce the number of choices.

Data engineering should be cost-effective

The cloud has enabled the separation of compute and storage. Storage in cloud object stores like S3 is cheap, but compute is not. We must allocate compute only when it is required, thereby reducing cloud infrastructure costs.