Building a robust data stack for scalable analytics

HIVED’s data stack is built on a clear conviction: to disrupt a low-tech parcel delivery industry plagued by inefficiency and poor customer experience, we have to use data at every layer of our operation. From routing to customer comms, and from forecasting to fleet ops, we’re building a data platform that’s modern, composable, and deeply integrated with our product and operations. Here is a window into how we designed our data analytics stack and the philosophy behind it.

Our stack emerged through trial, iteration, and a series of insightful conversations early on with data leaders who’ve built exceptional teams and cultures in adjacent industries. Special thanks to Christopher Brandenburg, whose advice has particularly impacted our early data strategy, and to many others who generously shared their wisdom.

Our analytics platform is powered by a modern, scalable data stack (Dagster, Airbyte, dbt, Elementary, BigQuery, and Looker) designed to help us ingest, transform, test, and visualise data with precision. In this post, we’ll unpack how these tools work together to enable analytics at HIVED.

An overview of our data analytics platform.

Dagster: The orchestrator of our data pipelines

At the heart of our data pipeline is Dagster, a powerful orchestration platform built to manage complex data workflows. It lets us define, schedule, and monitor jobs across the stack, with a core strength in abstracting workflows into a DAG of data assets and their dependencies. And it's actually developer-friendly!

Dagster integrates seamlessly with the rest of our stack, orchestrating the full ETL process and ensuring every job runs in the right order. Its monitoring features make it easy to detect and resolve issues quickly, minimising downtime and keeping the pipeline healthy.
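To make the idea of a DAG of assets concrete, here is a minimal sketch in plain Python. It is not Dagster’s API and not our production code: the asset names are hypothetical, and the standard library’s `graphlib` stands in for the orchestrator’s dependency resolution. It only shows how declaring dependencies is enough to derive a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical assets: each key depends on the assets in its set.
ASSET_DEPS = {
    "raw_parcels": set(),            # e.g. landed by an ingestion job
    "raw_tickets": set(),
    "stg_parcels": {"raw_parcels"},
    "stg_tickets": {"raw_tickets"},
    "delivery_performance": {"stg_parcels", "stg_tickets"},
}


def execution_order(deps):
    """Return an order in which every asset comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(ASSET_DEPS)
print(order)
```

An orchestrator does much more than this (scheduling, retries, monitoring), but the ordering guarantee comes from the same principle: nothing runs before the assets it depends on.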

We currently run a self-hosted instance of open-source Dagster. Stay tuned if you want to know more about our underlying infrastructure!

Airbyte: Seamless data ingestion

To feed our data pipeline, we rely on Airbyte, an open-source platform designed for data ingestion. Airbyte makes it easy for us to pull data from a variety of sources, whether that’s APIs, databases, or file systems, and load it into our data warehouse.

Airbyte’s real strength lies in its flexibility. With a large library of built-in connectors, it’s straightforward to ingest data from tools like Zendesk and Postgres. And when we need something custom, we can build or adapt connectors to fit our needs.

Dagster’s native integration with Airbyte means we can treat ingestions as first-class data assets and link them to downstream dependencies. Seeing the full lineage, including ingestion, all in one place is the real magic of this setup.

As with Dagster, we run a self-hosted instance of open-source Airbyte.

An example of a pre-built connector in Airbyte.

dbt: Transforming our data

Once the raw data is ingested into our data warehouse, we turn to dbt (data build tool) for transformation. dbt lets us write and run SQL-based transformation models that clean, enrich, and structure our data for analysis. Because these models are version-controlled and reusable, it is easy to collaborate on the pipeline and maintain it over time.

dbt is great for testing and documentation. We write tests for every model to make sure our data hits the right quality standards. Plus, dbt automatically creates docs for each model, so it's super easy to see how the data is structured and where it comes from.
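In dbt, tests like `not_null` and `unique` are declared in configuration and compiled to SQL; a test passes when its query returns no failing rows. The sketch below mirrors that idea in plain Python so the logic is easy to see. The sample rows and column names are hypothetical, and this is an illustration of the concept, not how dbt runs tests.

```python
from collections import Counter


def not_null(rows, column):
    """Return the rows where `column` is missing (an empty list means the check passes)."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Return values of `column` that appear more than once, ignoring nulls."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(v for v, n in counts.items() if n > 1)


# Hypothetical sample of a parcels model.
parcels = [
    {"parcel_id": "P1", "postcode": "E1 6AN"},
    {"parcel_id": "P2", "postcode": None},
    {"parcel_id": "P2", "postcode": "SW1A 1AA"},
]

null_failures = not_null(parcels, "postcode")
duplicate_ids = unique(parcels, "parcel_id")
print(null_failures, duplicate_ids)
```

Here both checks fail: one row has no postcode, and `P2` appears twice, which is exactly the kind of issue we want surfaced before a model reaches a dashboard.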

Dagster also has a native integration with dbt, which allows us to use dbt-core and orchestrate our jobs directly in Dagster. We can also visualise all dbt models directly in the Dagster UI, including dependencies with assets outside of dbt.

Dagster’s native integrations allow tracking asset lineage across tools, e.g. from Airbyte to dbt models.
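Cross-tool lineage boils down to walking a dependency graph upstream. The following sketch assumes a hypothetical set of asset keys (the `airbyte/` and `dbt/` prefixes and table names are made up for illustration) and shows how, given any model, you can recover everything it depends on, all the way back to ingestion. It is plain Python, not the Dagster API.

```python
# Hypothetical lineage: each asset maps to the assets it reads from.
LINEAGE = {
    "airbyte/zendesk_tickets": [],
    "airbyte/postgres_parcels": [],
    "dbt/stg_tickets": ["airbyte/zendesk_tickets"],
    "dbt/stg_parcels": ["airbyte/postgres_parcels"],
    "dbt/fct_deliveries": ["dbt/stg_parcels"],
    "dbt/support_rate": ["dbt/stg_tickets", "dbt/fct_deliveries"],
}


def upstream(asset, deps):
    """Collect every asset that `asset` depends on, directly or indirectly."""
    seen = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for parent in deps.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(upstream("dbt/support_rate", LINEAGE)))
```

If an ingestion breaks, the same graph read in the other direction tells you which downstream models are affected.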

Elementary: Anomaly detection with dbt

To level up our data quality checks, we use Elementary. It helps us automate anomaly detection tests, running alongside dbt’s built-in tests to spot issues like missing values or unexpected changes. This gives us real-time alerts on data problems, so we can fix them quickly before they impact our analytics. It’s a game-changer for maintaining clean, reliable data.

Elementary proactively detects and flags data anomalies beyond traditional column-level checks.
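To give a feel for what an anomaly test does, here is a minimal sketch that flags a daily row count when it sits too far from recent history. It assumes a simple z-score rule with a hypothetical `sensitivity` threshold and made-up row counts; it illustrates the general technique and is not Elementary’s implementation.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, sensitivity=3.0):
    """Flag `latest` if it is more than `sensitivity` standard deviations from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is unexpected.
        return latest != mu
    return abs(latest - mu) / sigma > sensitivity


# Hypothetical daily row counts for one table.
daily_rows = [1000, 1020, 980, 1010, 990, 1005, 995]

print(is_anomalous(daily_rows, 400))   # a sudden drop in volume
print(is_anomalous(daily_rows, 1012))  # within normal variation
```

Unlike a fixed threshold, a check like this adapts to each table’s own history, which is why it catches problems that column-level tests miss.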

BigQuery: Our data warehouse

Our data warehouse is BigQuery, Google Cloud’s fully managed, serverless platform built for scale and speed. It’s ideal for storing and querying large datasets with minimal overhead.

BigQuery makes it easy to handle large datasets and run complex queries at scale, without having to worry about infrastructure. As a fully serverless platform, it automatically scales with our workload and abstracts away server management entirely. We love that it’s usage-based, so we only pay for what we store and process.

Looker: Visualising and interpreting our data

At the top of our stack, Looker helps us explore and visualise data, turning raw outputs into shareable, actionable insights for teams across HIVED.

A standout feature is how it connects directly to BigQuery, letting us query live data and visualise it in a self-service manner. We can create always-up-to-date dashboards, making sure everyone has the insights they need to make smart decisions. Plus, Looker’s modelling layer ensures consistency in how metrics are defined and interpreted across the team.

Alongside Looker, we use Looker Studio to share data-driven reports with external stakeholders and non-technical team members. It’s a powerful way to make sure everyone, no matter their role, can access the data they need.

Together, Looker and Looker Studio bring our data to life, allowing us to drive better decisions across the board.

An example of Looker’s powerful data visualisations.

That wraps up our data stack here at HIVED! By combining Dagster, Airbyte, dbt, BigQuery, Elementary and Looker, we’ve built a robust, scalable, and flexible system for managing, transforming, and visualising data. With everything working in harmony, we can focus on what matters most: using data to drive insights and decisions that power our business forward.

If you are interested in learning more about how HIVED’s analytics are powering us forward, look out for upcoming blog posts or follow HIVED on LinkedIn, or on Instagram and TikTok @hivedhq.

We’re hiring 🚀 You can check out our open roles here.
