Iterable · Airflow · BigQuery · Real-Time ETL

Real-Time Email Campaign Analytics Pipeline for the Iterable API

A production-grade, real-time ETL pipeline that processes millions of daily records from the Iterable API (email sends, opens, clicks, and unsubscribes), orchestrated with Apache Airflow, stored in BigQuery, and visualised in Looker Studio dashboards.

· Records daily: M+ (millions processed)
· Incremental loads: every 5 hr (Airflow DAG schedule)
· BigQuery tables: live (real-time data)
· Campaign analytics: ROI (Looker dashboards)

What We Built & Why

Email campaigns generate millions of events daily — every send, open, click, bounce, and unsubscribe. Without a centralised pipeline, this data sits fragmented inside Iterable with no way to run aggregated analysis, track campaign ROI, or measure user engagement at scale.

We designed and deployed a production-grade ETL pipeline using Apache Airflow (Dockerised) for orchestration, Python for Iterable API extraction with pagination and rate-limit handling, and BigQuery for both raw storage and transformed fact tables — refreshed every 5 hours, incrementally, processing millions of records efficiently.

Core problem solved: Millions of daily Iterable events trapped in a SaaS tool — unqueryable, unjoined, and unreported. This pipeline extracts every event type incrementally, transforms them into fact tables, and surfaces open rates, click rates, campaign performance, and churn signals in real-time Looker dashboards.

Airflow Orchestration

DAGs run every 5 hours — automatically extracting, transforming, and loading fresh data.

Incremental Loading

Only new records fetched per run — using last_updated timestamps for efficiency.

BigQuery Fact Tables

Campaign, performance, and engagement fact tables — partitioned for speed.

Looker Dashboards

Open rates, click rates, campaign performance — all live, auto-refreshed every 5 hours.

Pipeline Running in Production

Airflow DAG — iterable_etl_pipeline

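The live Airflow dashboard shows iterable_etl_pipeline running on a 0 */5 * * * schedule. That cron expression fires at minute 0 of hours 0, 5, 10, 15, and 20, so runs are 5 hours apart except for the 4-hour gap between 20:00 and 00:00. The DAG is tagged analytics, bigquery, email, and iterable.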

· DAG: iterable_etl_pipeline — 1 active run
· Schedule: 0 */5 * * * — every 5 hours
· Last run: 2026-01-15 05:00:00 UTC
· Owner: data-engineering team
· Airflow v2.8.1 — Dockerised environment

BigQuery — Iterable Data Tables Live

The Iterable_data dataset in BigQuery contains all extracted event tables. The email_bounce table shown has 1,318,392 rows and the columns email, userId, createdAt, messageId, campaignId, and templateId.

· Tables: email_bounce, email_click, email_complaint, email_open, email_open_new, email_send, email_send_skip, email_unsubscribe, iterable_master_table, campaign_metrics, ga4_iterable
· Rows: 1,318,392 in email_bounce alone, millions in total

The Real-Time ETL Pipeline Flow

Iterable API → Python extraction → Airflow orchestration → BigQuery raw tables → SQL transformation → Fact tables → Looker Studio dashboards. Every 5 hours, automatically.

01
Airflow DAG Trigger

DAG triggers every 5 hours — iterable_etl_pipeline starts the extraction task sequence with incremental timestamp filtering.
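
One detail of the schedule is worth making concrete. The cron expression 0 */5 * * * matches hours 0, 5, 10, 15, and 20, so the midnight run follows the previous one by 4 hours, not 5. The sketch below uses only the standard library and is not taken from the pipeline's code; it derives the extraction window for a run from the previous scheduled time instead of assuming a fixed 5-hour offset. All function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

STEP_HOURS = 5  # from the hour field "*/5" of the cron expression


def fire_hours(step: int = STEP_HOURS) -> list[int]:
    """Hours of the day matched by the cron hour field */step."""
    return list(range(0, 24, step))


def previous_fire(run_time: datetime, step: int = STEP_HOURS) -> datetime:
    """Most recent scheduled time strictly before run_time."""
    candidate = run_time.replace(minute=0, second=0, microsecond=0)
    if candidate == run_time:
        candidate -= timedelta(hours=1)
    # Walk back hour by hour until we land on a scheduled hour.
    while candidate.hour not in fire_hours(step):
        candidate -= timedelta(hours=1)
    return candidate


def extraction_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Half-open window [previous fire, this fire) covered by one run."""
    return previous_fire(run_time), run_time


run = datetime(2026, 1, 15, 0, 0, tzinfo=timezone.utc)
start, end = extraction_window(run)
print(start.isoformat(), end.isoformat())
# prints: 2026-01-14T20:00:00+00:00 2026-01-15T00:00:00+00:00
```

Deriving the window this way keeps consecutive windows contiguous: a fixed 5-hour look-back at midnight would re-read the hour between 19:00 and 20:00.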

02
Iterable API Extraction

A PythonOperator task calls the Iterable API with 5-hour date windows, paginating through millions of email send, open, click, and unsubscribe records.
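
The pagination loop can be sketched independently of any HTTP client. The example assumes a hypothetical fetch_page(cursor) callable that returns a list of records plus the cursor for the next page (None when there are no more). The actual Iterable endpoints and their paging parameters are not shown in this write-up, so the fake pages below stand in for API responses.

```python
from typing import Callable, Iterator, Optional

# One page: the records it holds and the cursor of the next page (or None).
Page = tuple[list[dict], Optional[str]]


def iter_records(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every record, following cursors until no next page is reported."""
    cursor: Optional[str] = None
    while True:
        records, cursor = fetch_page(cursor)
        yield from records
        if cursor is None:
            break


# Stand-in for a real HTTP call: three fake pages keyed by cursor.
FAKE_PAGES: dict[Optional[str], Page] = {
    None: ([{"messageId": "m1"}, {"messageId": "m2"}], "p2"),
    "p2": ([{"messageId": "m3"}], "p3"),
    "p3": ([], None),
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    """Hypothetical fetcher used only for this demonstration."""
    return FAKE_PAGES[cursor]


for record in iter_records(fake_fetch_page):
    print(record["messageId"])
```

Because iter_records is a generator, records can be streamed to the next step page by page instead of holding millions of rows in memory at once.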

03
Incremental Filter

Uses last_updated timestamp to filter only new records since last run — preventing duplicate loads and reducing API call volume.
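
A minimal sketch of this kind of watermark filter is shown below. It assumes each record carries a last_updated value as an ISO 8601 string and that the watermark from the previous run is stored somewhere durable (an Airflow Variable or a small BigQuery table would both work); the helper names are illustrative, not the pipeline's own.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp and normalise it to UTC."""
    ts = datetime.fromisoformat(value)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return ts.astimezone(timezone.utc)


def filter_new(records: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Keep records newer than the watermark and return the advanced watermark."""
    fresh = [r for r in records if parse_ts(r["last_updated"]) > watermark]
    new_watermark = max(
        (parse_ts(r["last_updated"]) for r in fresh), default=watermark
    )
    return fresh, new_watermark
```

The strict comparison skips a record whose timestamp equals the stored watermark. If the source can deliver late records with an identical timestamp, a common alternative is to compare with >= and de-duplicate on a key such as messageId at load time.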

04
Python Transformation

Data cleaned, typed, and normalised in Python before loading — handling nulls, date formatting, and field standardisation across all event types.
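
As an illustration of this step, the sketch below normalises one event using the column names visible in the email_bounce table (email, userId, createdAt, messageId, campaignId, templateId). The input timestamp format and the specific cleaning rules are assumptions made for the example, not a description of the production code.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Assumed source format, e.g. "2026-01-15 05:00:00 +00:00".
INPUT_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def to_int(value: Any) -> Optional[int]:
    """Convert to int, mapping missing or empty values to None."""
    if value is None or value == "":
        return None
    return int(value)


def clean_event(raw: dict) -> dict:
    """Return a typed, normalised copy of one raw event."""
    created = datetime.strptime(raw["createdAt"], INPUT_FORMAT).astimezone(timezone.utc)
    email = (raw.get("email") or "").strip().lower()
    return {
        "email": email or None,               # empty string becomes NULL
        "userId": raw.get("userId") or None,
        "messageId": raw["messageId"],        # required: fail loudly if absent
        "campaignId": to_int(raw.get("campaignId")),
        "templateId": to_int(raw.get("templateId")),
        "createdAt": created.isoformat(),     # always UTC, ISO 8601
    }
```

Applying the same function to every event type gives each raw table identical handling of nulls and timestamps.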

05
Raw Table Load

BigQueryOperator loads raw records into the email_send_raw, email_open_raw, email_click_raw, and unsubscribe_raw tables.
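
The write-up does not say which file format the load uses. Newline-delimited JSON is one format BigQuery load jobs accept, so the sketch below shows how cleaned records could be serialised for such a load. The load call itself is omitted because it needs credentials and a client library.

```python
import json
from typing import Iterable


def to_ndjson(rows: Iterable[dict]) -> str:
    """Serialise rows as newline-delimited JSON: one compact object per line."""
    return "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)


payload = to_ndjson([
    {"messageId": "m1", "campaignId": 42},
    {"messageId": "m2", "campaignId": None},  # None is written as JSON null
])
print(payload, end="")
```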

06
SQL Aggregation

BigQuery SQL calculates open, click, and unsubscribe rates per campaign, using window functions for the metrics and MERGE for incremental updates.
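
To show the shape of a MERGE-based incremental update, the sketch below renders a BigQuery MERGE statement from a list of key and metric columns. The staging table name and the column names (metric_date, sends, opens, clicks, unsubscribes) are hypothetical; only the Iterable_data dataset, campaign_fact_table, and campaign_id appear in this write-up.

```python
def build_merge(target: str, staging: str, keys: list[str], metrics: list[str]) -> str:
    """Render a MERGE that updates matched rows and inserts unmatched ones."""
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    updates = ", ".join(f"{m} = s.{m}" for m in metrics)
    columns = keys + metrics
    return (
        f"MERGE `{target}` AS t\n"
        f"USING `{staging}` AS s\n"
        f"ON {on}\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(columns)})\n"
        f"VALUES ({', '.join('s.' + c for c in columns)})"
    )


sql = build_merge(
    target="Iterable_data.campaign_fact_table",
    staging="Iterable_data.campaign_metrics_staging",  # hypothetical table
    keys=["campaign_id", "metric_date"],
    metrics=["sends", "opens", "clicks", "unsubscribes"],
)
print(sql)
```

Re-running the same statement for the same window leaves the fact table unchanged, which is what makes retries of a failed run safe. Table and column names are interpolated directly into the SQL, so they should come from trusted configuration only.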

07
Fact Tables Built

Transformed data loaded into campaign_fact_table, email_performance_fact, and user_engagement_fact tables.

08
Looker Dashboards

Looker Studio connects to BigQuery — campaign performance, user engagement, and email funnel dashboards auto-refresh every 5 hours.

Five Phases of Delivery

Phase 1 — Airflow Setup
· Apache Airflow installed via Docker
· DAG configured — runs every 5 hours
· PythonOperator + BigQueryOperator tasks
· Incremental extraction logic built into DAG
· Rate limiting & retry logic configured
Phase 2 — API Extraction
· Iterable API pagination handled
· 4 event types: send, open, click, unsubscribe
· 5-hour date window per extraction run
· last_updated timestamp for filtering
· API rate limiting handled gracefully
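
One common way to handle rate limits gracefully is retry with exponential backoff and jitter. The sketch below is a generic version under stated assumptions: RateLimited is a hypothetical exception that the caller's request function raises on an HTTP 429 response, and the attempt count and delays are illustrative defaults, not the pipeline's settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the request function on HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on RateLimited, doubling the wait each time, plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the scheduler
            delay = base_delay * 2 ** attempt
            # Jitter spreads retries out so parallel tasks do not retry in step.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The sleep parameter exists so the function can be exercised in tests without waiting. Once the attempts are exhausted the exception propagates, leaving the final retry decision to Airflow's own task-level retries.
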
Phase 3 — Transformation
· Raw tables: send, open, click, bounce
· SQL for open rate, click rate, and unsubscribe rate
· Campaign aggregation by campaign_id
· Window functions for daily metrics
· Nulls handled, data consistency enforced
Phase 4 — BQ Loading
· campaign_fact_table — per campaign
· email_performance_fact — metrics
· user_engagement_fact — per user
· MERGE SQL for incremental updates
· Partitioned by campaign date / timestamp
Phase 5 — Dashboards
· Looker Studio → BigQuery connected
· Campaign performance dashboard
· User engagement & funnel report
· Churn & unsubscribe analysis
· Auto-refreshed every 5 hours
Docker Infrastructure
· Airflow fully containerised in Docker
· Scalable — add workers as volume grows
· Portable — deploy to any cloud or on-prem
· Cloud Functions option for real-time events
· Environment isolation — no dependency conflicts

What the Dashboards Track

Email Performance Metrics

Open rate, click rate, and unsubscribe rate per campaign — with average engagement metrics and trend analysis over time.
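
The rate definitions are not spelled out here, so the sketch below assumes the simplest convention: each rate is a count divided by total sends, with click-to-open rate added as clicks divided by opens. Some teams divide by delivered messages (sends minus bounces) instead, and the same function shape applies.

```python
from typing import Optional


def rate(part: int, whole: int) -> Optional[float]:
    """part / whole as a fraction, or None when whole is zero."""
    return part / whole if whole else None


def campaign_rates(sends: int, opens: int, clicks: int, unsubscribes: int) -> dict:
    """Per-campaign engagement rates, assuming sends as the denominator."""
    return {
        "open_rate": rate(opens, sends),
        "click_rate": rate(clicks, sends),
        "click_to_open_rate": rate(clicks, opens),
        "unsubscribe_rate": rate(unsubscribes, sends),
    }


print(campaign_rates(sends=1000, opens=250, clicks=50, unsubscribes=5))
# prints: {'open_rate': 0.25, 'click_rate': 0.05, 'click_to_open_rate': 0.2, 'unsubscribe_rate': 0.005}
```

Returning None for a zero denominator keeps a campaign with no sends from showing a misleading 0% rate on a dashboard.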

Campaign Performance

Total sends, opens, and clicks per campaign. Conversion rates from emails — identifying which campaigns drive the most sign-ups or revenue.

User Engagement

Per-user open count, click count, and unsubscribe count — identify high-engagement users and target them with personalised campaigns.
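
The per-user counts amount to a grouped tally. The sketch below shows the idea in plain Python, assuming each event is a dict with a userId and a hypothetical type field; in the pipeline itself this aggregation would more naturally be a GROUP BY in BigQuery.

```python
from collections import defaultdict

TRACKED = ("open", "click", "unsubscribe")


def engagement_by_user(events: list[dict]) -> dict[str, dict[str, int]]:
    """Count opens, clicks, and unsubscribes for each userId."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: dict.fromkeys(TRACKED, 0))
    for event in events:
        if event["type"] in TRACKED:  # other event types, such as sends, are ignored
            counts[event["userId"]][event["type"]] += 1
    return dict(counts)


events = [
    {"userId": "u1", "type": "open"},
    {"userId": "u1", "type": "click"},
    {"userId": "u1", "type": "open"},
    {"userId": "u2", "type": "unsubscribe"},
    {"userId": "u2", "type": "send"},
]
print(engagement_by_user(events))
```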

Funnel Analysis

User journey from email receipt → open → click → conversion. Drop-off points identified at each funnel stage for optimisation.
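
Drop-off between funnel stages is simple arithmetic: the users lost at a step divided by the users who reached the previous stage. The sketch below works through it with made-up counts; the stage names follow the funnel described above and the numbers are illustrative only.

```python
def funnel_dropoff(stages: list[tuple[str, int]]) -> list[dict]:
    """For each step, the users lost and their share of the previous stage."""
    report = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        lost = prev_count - count
        report.append({
            "step": f"{prev_name} -> {name}",
            "lost": lost,
            "drop_off": lost / prev_count if prev_count else None,
        })
    return report


stages = [("received", 1000), ("opened", 250), ("clicked", 50), ("converted", 10)]
for row in funnel_dropoff(stages):
    print(row)
```

With these example counts the largest absolute loss is at the open step (750 users), while the relative drop-off is highest at the click and conversion steps (80% each).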

Churn & Unsubscribes

Unsubscribe count per campaign, churn rate by email category — signals for list hygiene and campaign fatigue detection.

Real-Time Reporting

Dashboards refresh every 5 hours in sync with pipeline runs, so marketing teams see the latest loaded campaign data without manual reporting.

What This Pipeline Delivers

Optimised campaign performance — real-time open, click, and unsubscribe rates allow marketing teams to adjust campaign strategies within hours, not weeks.

Personalised engagement: user-level tracking identifies high-value users for targeted campaigns aimed at improving retention and conversion.

Incremental efficiency — only new records processed per run, keeping the pipeline fast and cost-effective even as data volume scales to millions of daily events.

Scalable architecture — Airflow + Docker + BigQuery handles increasing data volumes without infrastructure changes as the email programme grows.

Data-driven decisions — automated dashboards give leadership live visibility into campaign ROI, eliminating manual reporting and enabling faster iteration cycles.

Technologies Used

· Apache Airflow: DAG orchestration
· Iterable API: email event source
· Python: API calls and transforms
· BigQuery: data warehouse
· Looker Studio: BI dashboards
· Docker: containerised Airflow

What Was Built & Applied

Apache Airflow
Python
Google BigQuery
Iterable API Integration
Incremental ETL Loading
Looker Studio
Docker Containerisation
Real-Time ETL Architecture

Want a real-time email analytics pipeline for your platform?

From Iterable API extraction to BigQuery fact tables and Looker dashboards — we build ETL pipelines that process millions of records and surface actionable campaign insights.
