ΣData lab · only what we do with your data

Turn your data into decisions you can trust.

We dig into your numbers, build dashboards your team will actually use, and add machine learning where it pays off. No guesswork. No vanity charts.

1.4M rows / client
98% model accuracy
60× report speedup
[Chart: revenue_forecast.ipynb, revenue in USD over 180 days with a 94% confidence forecast band; one outlier flagged at +2.4σ; n = 1,247, p < 0.001, MAPE 3.2%; model options: linear, poly-3, ARIMA, Prophet]
Record: 50+ dashboards shipped
Volume: 100M+ rows / day
Governance: HIPAA · GDPR · SOC-2
the three services

Three things, done properly.

One pod handles all three — we don't do web builds, mobile apps or chatbots here. Zero context-switching, zero handoffs.

01

Data Analysis

Exploratory, statistical and causal analysis that turns noisy events into explanations the leadership team can defend in a board meeting.

EDA · cohort · funnel · A/B tests · causal
02

Data Visualization

Executive dashboards, interactive reports and customer-facing embedded BI — designed so the right chart appears at the right decision moment.

Power BI · Tableau · Looker · D3 · embedded
03

Data Science

Production-grade ML — forecasts, churn prediction, anomaly detection, clustering, recommendation. Deployed, monitored, retrained on schedule.

Python · PyTorch · XGBoost · MLflow · sklearn
deep-dive · analysis

Find the why, not just the what.

Descriptive stats are table stakes: any BI tool gives you a number. What's rarer is an answer to why it moved. We bring the statistical toolkit (hypothesis tests, cohort decomposition, funnel attribution, causal inference) that turns "revenue is down 12%" into a defensible, actionable story.

  • Exploratory (EDA) — distributions, correlations, outliers
  • Cohort & retention — survival curves, N-day returns
  • Funnel & attribution — multi-touch, Markov, Shapley
  • A/B tests & causal — DiD, synthetic control, uplift
  • Segmentation — k-means, RFM, behavioural clustering
  • Statistical tests — t / χ² / Mann-Whitney · Bonferroni
Python · pandas · statsmodels · SciPy · Jupyter · dbt · SQL
[Demo panel: raw_events.csv, 1,247 rows, mixed dtypes; columns user_id, cohort, revenue, retained, ltv_pred; EDA → insight view of the LTV gap between cohorts A and B, with variance explained and a one-tailed p-value]
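
As a rough illustration of the A/B-test and Bonferroni items in the list above, here is a minimal sketch using only the Python standard library. The function names and the conversion counts in the usage note are hypothetical, and a pooled two-proportion z-test is just one reasonable choice; a real engagement would pick the test to fit the data.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def bonferroni(p_values, alpha=0.05):
    """Flag which tests stay significant after dividing alpha by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]
```

With hypothetical counts, `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% and a 15% conversion rate, and `bonferroni([...])` can then be applied to the p-values of every variant tested in the same experiment.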
deep-dive · visualization

Charts people actually open.

The best dashboard doesn't show every metric — it shows the one the viewer needs to act on, right now. We design for the decision: exec, analyst, sales, customer-facing. Different roles, different charts, different cognitive loads.

  • Executive dashboards — 1-screen overview of KPIs + alerts
  • Operational BI — drill-downs, filters, ad-hoc analyst tools
  • Customer-facing — white-label embedded BI in your SaaS
  • Reports & emails — scheduled PDFs, Slack digests
  • Interactive stories — D3 + Observable, narrative-driven
  • Real-time monitoring — alert on anomalies in < 5s
Power BI · Tableau · Looker · Metabase · Superset · D3.js · Plotly · Observable
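
To make the real-time monitoring item above a little more concrete, the following is a minimal sketch of a rolling z-score alert in plain Python. The class name, window size and 3-sigma threshold are illustrative assumptions, not a description of the production alerting stack.

```python
from collections import deque
from statistics import mean, stdev

class ZScoreAlert:
    """Flags values that sit far from the rolling mean of recent observations."""

    def __init__(self, window=30, threshold=3.0):
        self.values = deque(maxlen=window)  # keeps only the most recent `window` points
        self.threshold = threshold

    def observe(self, value):
        """Return True when `value` is more than `threshold` sigmas from the rolling mean."""
        is_anomaly = False
        if len(self.values) >= 2:  # stdev needs at least two points
            mu, sigma = mean(self.values), stdev(self.values)
            is_anomaly = sigma > 0 and abs(value - mu) / sigma > self.threshold
        self.values.append(value)
        return is_anomaly
```

Each incoming metric value is passed to `observe`; a `True` result is the point at which a dashboard alert or Slack message would be triggered.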
deep-dive · data science

Models that ship to production.

We train, validate and deploy supervised and unsupervised models for real business problems — not Kaggle trophies. Every model ships with an eval harness, drift monitor and retraining schedule. No "demo notebook that lives on a laptop".

  • Forecasting — Prophet, ARIMA, GBM, neural
  • Churn / LTV — logistic, survival, GBM
  • Recommendation — collaborative, content, hybrid
  • Anomaly detection — isolation forest, VAE, Prophet
  • Clustering — k-means, DBSCAN, hierarchical
  • NLP / text — classification, entity extraction, topic
Python · PyTorch · scikit-learn · XGBoost · LightGBM · MLflow · Weights & Biases · Ray
[Demo panel: churn_v3.py training run, epoch counter out of 40, plotting training_acc and val_loss]
98.2% accuracy · 97.4% precision · 96.8% recall · 97.1% F1 score
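
For readers who want to see how the accuracy, precision, recall and F1 figures above relate to each other, here is a minimal sketch of the arithmetic an eval harness would run. The function names and the confusion-matrix counts in the usage note are hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def classification_metrics(tp, fp, fn, tn):
    """Compute the four headline metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }
```

For example, `classification_metrics(tp=8, fp=2, fn=2, tn=88)` scores a hypothetical 100-row validation set, and `f1_from(0.974, 0.968)` rounds to 0.971, which matches the F1 score quoted alongside that precision and recall.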
workflow

Raw data → shipped insight, in four acts.

Each act has clear inputs and outputs — you always know what goes in, what comes out, and how long it takes.

4 acts: Discover → Operate
6–10 wks: typical duration
Weekly: stakeholder demos
Shared: Slack · Linear · Loom
01 AUDIT

Audit & explore

Connect sources, profile quality, shape the KPI tree. Find where the signal is.

IN: raw sources · team access · goals
OUT: EDA notebook · KPI tree · 90-day plan
1–2 weeks
02 MODEL

Model & validate

Transform layer, statistical tests, candidate models or wireframe dashboards. Validate with stakeholders.

IN: EDA notebook · KPI tree
OUT: dbt models · eval harness · wireframes
2–3 weeks
03 SHIP

Ship & instrument

Production pipelines, RBAC, lineage. Dashboards published or model deployed — with drift monitors.

IN: dbt models · wireframes
OUT: prod pipelines · live dashboard · drift monitor
3–4 weeks
04 OPERATE

Operate & iterate

Retrain on schedule, extend dashboards, respond to anomalies. Quarterly exec review.

IN: live traffic · drift alerts
OUT: retrained models · QBR · new features
ongoing
case studies shipped

Real data work, measured impact.

Three production data engagements from the last twelve months. Real clients, real metrics, real dashboards.

SaaS · Analytics

SPRNV — unified attribution

Twelve ad networks, one dashboard, predictive LTV. Reporting dropped from 4 hours to 4 minutes.

60× faster
+34% ROAS
12 networks
E-commerce · BI

Hoist — growth intelligence

Unified Shopify, Meta, Google & Klaviyo into a real-time BI layer with cohort retention. Revenue grew 187% in 9 months.

+187% revenue
-22% CAC
5 sources
Healthcare · ML

Mariposa — clinical risk scoring

EHR + lab + imaging unified across 1.4M patient records. Real-time risk-scoring dashboard with nightly retraining.

1.4M records
-62% triage time
98.2% accuracy
techniques & stack

The toolkit we actually use.

We pick pragmatically based on your team's skills, data volume & stack — not by what's trending on Hacker News.

SQL + dbt

Modelled warehouse layer, tested, documented, version-controlled.

Power BI / Tableau

Executive dashboards with row-level security and drill-downs.

Python · pandas

EDA, statistical tests, pipelines, ML prototypes.

Snowflake / BigQuery

Cloud warehouses sized to your scale and team maturity.

Airflow / dbt Cloud

Scheduled pipelines with retries, alerts and lineage.

scikit-learn · XGBoost

The ~90% of ML problems that don't need a neural net.

PyTorch · Lightning

Deep models when patterns deserve it — with MLflow tracking.

D3 · Plotly · Observable

Custom interactive viz for when stock charts won't do it.

voices

Data leaders, on working with us.

"Their EDA caught three data-quality bugs our internal team had missed for a year. We rebuilt the whole cohort analysis and re-ran a full quarter's forecast in ten days."
Jim Laitze, Founder · Mariposa
"We went from screenshots of Google Analytics in the monthly board deck to a real-time dashboard my investors use daily. Different company."
Dr. Dana Love, CTO · Lifetoken
data questions

Frequently asked data questions.

If you have a question we haven't answered, ask us directly.

Do I need a warehouse before we start?

No — we often stand up Snowflake or BigQuery as part of the engagement, or we work with an existing Postgres / spreadsheet source if that's where your data lives today.

Can you just do a dashboard, or is it an "all-or-nothing" engagement?

Single-dashboard engagements are a great way to start — typically 2–3 weeks. Many clients expand into a full modern-data-stack build after seeing the first ROI.

Which BI tool will you recommend?

Depends on your audience. Power BI for Microsoft shops, Tableau for analyst-heavy orgs, Looker for API-driven embedding, Metabase / Superset for open-source preference. We'll recommend in the first call.

How do you handle PII & compliance?

Column-level lineage via dbt, PII tagging and automatic redaction at ingestion, RBAC at the warehouse layer, audit logs. HIPAA, GDPR, SOC-2 patterns all in the playbook.
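
As one possible shape for the "PII tagging and automatic redaction at ingestion" step described above, here is a minimal sketch in plain Python. The tagged column names, the `redact_record` helper and the choice of a keyed SHA-256 hash (pseudonymization rather than deletion) are all assumptions made for illustration, not a statement of how any given pipeline is built.

```python
import hashlib
import hmac

# Hypothetical PII tags; in practice these would come from column metadata.
PII_COLUMNS = {"email", "phone"}

def redact_record(record, secret_key, pii_columns=PII_COLUMNS):
    """Replace tagged PII values with a keyed hash so joins on the column still work."""
    redacted = {}
    for column, value in record.items():
        if column in pii_columns and value is not None:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            redacted[column] = digest.hexdigest()[:16]  # shortened token, not reversible without the key
        else:
            redacted[column] = value
    return redacted
```

Because the hash is keyed and deterministic, the same email always maps to the same token, so cohort and retention joins keep working while the raw value never reaches the warehouse.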

Can you deploy and monitor ML models, not just train them?

Yes — every model ships with an MLflow registry entry, an eval suite, drift monitoring on key features, and a retraining schedule. No "demo notebook that lives on a laptop."
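
To give a sense of what "drift monitoring on key features" can look like, here is a minimal sketch of the Population Stability Index (PSI) for one numeric feature, in plain Python. PSI is one common drift metric swapped in for illustration; the function name, the ten equal-width bins and the epsilon floor are assumptions, not the specific monitor shipped with every model.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when the baseline is constant

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp values outside the baseline range
        # Floor empty buckets at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a shift worth investigating, which is the kind of signal that would feed a retraining decision.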

Do you run analyses (vs. build tooling)?

Yes — we run one-off analyses (pricing studies, churn deep-dives, attribution modelling) when that's what's needed. Results shipped as a notebook + written report, not just code.

How do you price data engagements?

Data Opportunity Sprint (audit + plan): free. Single dashboard: from $12K. Full modern data stack: $40–80K. Retained analytics-engineering pod: from $16K/month.

Who owns the code and models?

You do — GitHub repo, warehouse, dbt project, notebooks and trained models all transfer to you on project close. No vendor lock-in.

Ready for data that ships decisions?

30-minute Data Opportunity call. We audit your stack, identify 3 high-leverage opportunities, sketch a 90-day plan, and send you a straight-talk estimate, free.