Study notes · Data Modeling

Banking Data Models — 12 Approaches, 4 Eras

I've been reading my way through how banking data has been modeled — from Inmon's 1992 enterprise data warehouse to 2024's source-centric lakehouse with semantic Gold. This post is my notes: 12 approaches, with diagrams, trade-offs, and a banking-domain example for each.

May 2026 · ~35 min read · learning in public

Heads up — this is a learning exercise, not a war story. I haven't shipped all 12 of these. I worked through papers, vendor docs, and case studies, and these are the notes I wish I had when I started reading.

The map: how data has been modeled in banking from 1992 to 2026. Twelve approaches, four eras. Each one gets origin, core idea, a diagram, pros and cons, and a note on what it's still good for today.

The four eras: Classical (top-down EDW + dimensional + canonical CDM), Reference Standards (BIAN + ISO 20022 + FIBO), Modern Lakehouse (Vault + Medallion + Mesh), and AI-Era (Activity Schema + Source-centric + Hybrid). Each era solved a different set of constraints; matching your constraints to one is the only way to pick well.

At the end there's a comparison matrix scoring all 12 on dev velocity, AI readiness, maintainability, extensibility, regulatory fit, and adoption — plus an open-ended closing on what I'm still trying to figure out.

Era 1 · 1990–2010 · Classical

The Era of "Build a CDM"

Storage was expensive, compute happened on a single machine, integration ran on ESBs with rigid contracts, and schema evolution was a project, not a PR. The dominant question was "where do we put the canonical model?" Three answers emerged.

Approach 01 · Classical

Inmon's CIF — Top-Down 3NF Enterprise Data Warehouse

Inventor: Bill Inmon · Era: 1992 · Where: Most banks 1995–2010

Philosophy

Single version of truth. The warehouse is the enterprise's institutional memory. Build it complete and pure first; consumers come second. Integrity over agility.

Rationale

When storage was $10K/TB and integration ran on ESBs, every byte was costly. Building one fully-normalized 3NF EDW once meant paying the integration cost once. Marts derive from a stable core — no semantic drift across reports.

Inmon's Corporate Information Factory was the foundational pattern of enterprise data warehousing. The proposition: build one corporate-wide third-normal-form (3NF) data warehouse first — every entity, every attribute, every relationship — and then create dependent data marts (often dimensional) on top for specific reporting needs.

CORE IDEA

Build the entire enterprise schema in 3NF before you build any data mart. The EDW is the single source of truth; marts are downstream views.
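To make "marts are downstream views" concrete, here is a minimal sketch using Python's built-in sqlite3 module: a tiny 3NF core (customer, account, transaction) and one dependent mart defined as a view over it. All table and column names are hypothetical. A real CIF core would hold hundreds of entities.

```python
import sqlite3

# Hypothetical 3NF core: each fact is stored once, relationships via foreign keys.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product     TEXT NOT NULL
);
CREATE TABLE txn (
    txn_id     INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES account(account_id),
    amount     REAL NOT NULL,
    posted_on  TEXT NOT NULL
);
-- Dependent mart: derived from the core, never loaded independently.
CREATE VIEW mart_customer_balance AS
SELECT c.customer_id, c.full_name, SUM(t.amount) AS net_flow
FROM customer c
JOIN account a ON a.customer_id = c.customer_id
JOIN txn t     ON t.account_id  = a.account_id
GROUP BY c.customer_id, c.full_name;
"""

def build_edw() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "An Nguyen"), (2, "Binh Tran")])
    conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                     [(10, 1, "savings"), (11, 1, "current"), (20, 2, "current")])
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", [
        (100, 10, 500.0, "2026-01-05"),
        (101, 11, -120.0, "2026-01-06"),
        (102, 20, 75.0, "2026-01-07"),
    ])
    return conn

conn = build_edw()
rows = conn.execute(
    "SELECT customer_id, full_name, net_flow FROM mart_customer_balance ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'An Nguyen', 380.0), (2, 'Binh Tran', 75.0)]
```

Because the mart is a view over the core, a correction in the core (say, a reversed transaction) reaches every mart at once. That is the "no semantic drift" property. The price is the join path every consumer query has to walk.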

✓ Strengths

Single source of truth, integrity guaranteed by 3NF
Marts derive consistently — no semantic drift
Audit + lineage natural (one canonical place)
Regulatory reporting (BCBS 239) lineage clean

✗ Weaknesses

Heavy upfront — 12-24 months before first business value
3NF is consumer-unfriendly — every query needs joins
Schema changes ripple through marts
Doesn't fit "ship a use-case in a sprint" expectations

Banking example

Most large banks 1995–2010 built CIF-style EDWs on Teradata or Oracle (Citi, HSBC, Wells Fargo are commonly cited). Local-market banks that started warehousing 2008–2015 typically followed the same template with localization for regulatory reporting.

When to use today

Greenfield: rarely. The upfront cost no longer matches expectations. Migration: still relevant when the existing EDW is 3NF and you're modernizing layer-by-layer. Conceptual: the discipline (single-source-of-truth) is alive in MDM hubs and contracts.

Approach 02 · Classical

Kimball's Dimensional — Bottom-Up Star Schemas with Conformed Dimensions

Inventor: Ralph Kimball · Era: 1996 · Where: Reporting layers in nearly every bank

Philosophy

Business processes are the unit of analysis, not entities. Model what users ask, not what the schema demands. Ship in weeks, conform via shared dimensions. Agility over completeness.

Rationale

Inmon's 18-month upfront EDW was killing projects. Kimball's bottom-up answer — build a fact table for one business process now, share dimensions across processes via discipline — let teams deliver value continuously while still maintaining cross-process consistency.

Kimball's response to Inmon: instead of modeling everything upfront, build one fact table per business process (sales, transactions, applications) surrounded by conformed dimensions (customer, product, date) that are reused across processes. The resulting star schemas are optimized for OLAP queries.

CORE IDEA

Model business processes one at a time as fact tables. Share dimensions across processes via the "conformed dimension" discipline. Ship the first mart in weeks, not months.
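A minimal sqlite3 sketch of one star, assuming a fact table at transaction grain and three dimensions. Names and data are hypothetical. The point to notice is the query shape: every dimension is a single join away from the fact.

```python
import sqlite3

# Hypothetical star schema: one fact table at transaction grain plus dimensions.
DDL = """
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT NOT NULL);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, month TEXT NOT NULL);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
CREATE TABLE fact_transaction (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    amount       REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "retail"), (2, "sme")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(20260105, "2026-01"), (20260203, "2026-02")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "debit card"), (2, "term deposit")])
conn.executemany("INSERT INTO fact_transaction VALUES (?, ?, ?, ?)", [
    (20260105, 1, 1, 40.0),
    (20260105, 2, 2, 1000.0),
    (20260203, 1, 1, 60.0),
])

# Typical BI query: group the fact by attributes of the surrounding dimensions.
query = """
SELECT d.month, c.segment, SUM(f.amount) AS total
FROM fact_transaction f
JOIN dim_date d     ON d.date_key = f.date_key
JOIN dim_customer c ON c.customer_key = f.customer_key
GROUP BY d.month, c.segment
ORDER BY d.month, c.segment
"""
result = conn.execute(query).fetchall()
print(result)  # [('2026-01', 'retail', 40.0), ('2026-01', 'sme', 1000.0), ('2026-02', 'retail', 60.0)]
```

In a full Kimball bus, dim_customer and dim_date would be conformed: a second fact table (say, loan applications) would join to the same dimension rows, which is what keeps cross-process reports consistent.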

✓ Strengths

Fast delivery — first mart in 4-8 weeks
Query-friendly — joins are simple, BI tools love it
Conformed dimensions enforce reuse
Business users can self-serve once dimensions are clean

✗ Weaknesses

Cross-process queries can be awkward (need conformed dims discipline)
Granularity decisions are hard to reverse
Dimension history survives only if you design for it (slowly changing dimensions), and many-to-many relationships need bridge tables
Star schemas don't naturally support graph/event queries

Banking example

Power BI / Tableau / SAS / SAP BO report layers in virtually every bank. Even when the silver layer is 3NF or Vault, the consumption layer is usually dimensional — the discipline survives every storage shift.

When to use today

For the BI reporting layer on top of any silver. The dimensional discipline (conformed dimensions, granularity, slowly changing dimensions) is timeless even though the underlying storage is now lakehouse Parquet, not Teradata cubes.

Approach 03 · Classical

IBM BDW / FSLDM — Banking-Specific 3NF Canonical

Inventor: IBM (Banking Data Warehouse), Teradata FSLDM, Oracle FSDM · Era: 2000s · Where: Tier-1 banks globally

Philosophy

Banking has been done before. Don't reinvent the entity model — inherit it. Every bank has Parties, Arrangements, Events, Positions. Customize attributes; keep structure. Industry consensus over local invention.

Rationale

Building a banking entity model from scratch took 18-24 months. IBM, Teradata, and Oracle pre-built models of roughly ten supertypes based on industry consensus. Banks adopted them to skip the modeling phase and to benefit from regulator familiarity: the 'banking edition of 3NF', off the shelf.

If Inmon's CIF says "build a 3NF EDW", IBM BDW (and its siblings Teradata FSLDM and Oracle FSDM) say "here is the 3NF — banking edition, off-the-shelf". A pre-defined logical data model with ~10 super-types (Party, Arrangement, Account, Event, Position, Product, Asset, Channel, Resource, Location, Classification) covering every banking concept. Decades of industry consensus baked in.

CORE IDEA

A pre-built canonical 3NF schema specific to banking. Banks customize attributes but inherit the entity structure. The "shared vocabulary" is encoded in the entity names and relationships.
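The vocabulary (Party, Arrangement, Event) is easier to judge with a toy instance. Below is a minimal sqlite3 sketch assuming a drastically simplified subset of the supertypes. The table names and role codes are hypothetical and do not reproduce any vendor's actual model.

```python
import sqlite3

# Hypothetical, heavily simplified FSLDM/BDW-style supertypes.
DDL = """
CREATE TABLE party (party_id INTEGER PRIMARY KEY, party_type TEXT NOT NULL, party_name TEXT NOT NULL);
CREATE TABLE arrangement (arrangement_id INTEGER PRIMARY KEY, arrangement_type TEXT NOT NULL);
-- Many-to-many with a role: a party can be owner, borrower, guarantor, signatory...
CREATE TABLE party_arrangement_role (
    party_id       INTEGER NOT NULL REFERENCES party(party_id),
    arrangement_id INTEGER NOT NULL REFERENCES arrangement(arrangement_id),
    role_code      TEXT NOT NULL,
    PRIMARY KEY (party_id, arrangement_id, role_code)
);
CREATE TABLE event (
    event_id       INTEGER PRIMARY KEY,
    arrangement_id INTEGER NOT NULL REFERENCES arrangement(arrangement_id),
    event_type     TEXT NOT NULL,
    amount         REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO party VALUES (?, ?, ?)",
                 [(1, "individual", "An Nguyen"), (2, "organisation", "Acme Trading")])
conn.executemany("INSERT INTO arrangement VALUES (?, ?)", [(100, "loan"), (200, "deposit")])
conn.executemany("INSERT INTO party_arrangement_role VALUES (?, ?, ?)",
                 [(2, 100, "borrower"), (1, 100, "guarantor"), (1, 200, "owner")])
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)",
                 [(1, 100, "disbursement", 50000.0), (2, 200, "deposit", 1200.0)])

# "Loan disbursements with every involved party and their role" already needs four tables.
rows = conn.execute("""
SELECT p.party_name, r.role_code, e.amount
FROM event e
JOIN arrangement a            ON a.arrangement_id = e.arrangement_id
JOIN party_arrangement_role r ON r.arrangement_id = a.arrangement_id
JOIN party p                  ON p.party_id = r.party_id
WHERE a.arrangement_type = 'loan' AND e.event_type = 'disbursement'
ORDER BY p.party_name
""").fetchall()
print(rows)  # [('Acme Trading', 'borrower', 50000.0), ('An Nguyen', 'guarantor', 50000.0)]
```

The generality is the point (the same four tables hold loans, deposits, and cards) and also the cost: even this toy question walks four joins.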

✓ Strengths

Off-the-shelf banking semantics — no need to invent
Vendor-supported (IBM, Teradata, Oracle ship reference documentation)
Regulator-friendly — well-known structure for audit
Cross-product semantics already harmonized

✗ Weaknesses

Heavy — 80-200 entities even at minimum
Consumer-unfriendly — every query joins 5-7 tables
Slow to onboard new sources (must map into canonical)
Schema evolution painful — change ripples across all consumers

Banking example

HSBC, Citi, Standard Chartered, ING — commonly cited as having built FSLDM-style canonical warehouses 2000–2015. Mid-tier banks in many emerging markets adopted the same templates as their warehouse foundations because vendor support + regulator familiarity removed two big risks at once.

When to use today

Greenfield: rarely. The 18-month canonical-modeling phase no longer matches dev velocity expectations. Migration: still relevant when modernizing an existing FSLDM warehouse — you can keep canonical at silver and add Gold on top (Modern A pattern). Conceptual: the entity vocabulary (Party, Arrangement, Event) is still useful as a glossary even if not as a physical schema.

Era 2 · 2004–2015 · Reference Standards

The Era of "Standardize the Wire, Not the Warehouse"

Once SOA matured and cross-bank integration grew, the question shifted from "what does our warehouse look like" to "what does the wire between systems look like". Three industry standards emerged — none are warehouse schemas, all are interoperability layers. Banks need to understand them because data lands in the warehouse shaped by these standards.

Approach 04 · Standards

BIAN — Banking Industry Architecture Network · Service Domain Reference

Steward: BIAN consortium (ING, Microsoft, IBM, SAP, Temenos, …) · Era: 2008+, current version 13 · Where: Bank API design, microservices boundaries

Philosophy

A bank is a portfolio of services, not tables. Standardize the service interfaces; let each vendor implement the data however they want. Behavior over storage.

Rationale

Banks integrate dozens of vendor systems. Without a common service vocabulary, every integration is bespoke. BIAN gives a vendor-neutral set of ~300 service domains so a Customer Position service from Temenos talks the same language as one from Finacle — APIs become composable.

BIAN is not a data model — it's a service-oriented reference architecture. It decomposes a bank into ~300 service domains (Customer Reference, Customer Position, Loan, Payment Order, Card Capture, Fraud Resolution, …), each with defined responsibilities, control records, and operations. When two systems integrate, BIAN gives them a shared vocabulary for what services should exist and what data they exchange.

CORE IDEA

A bank is a portfolio of ~300 service domains organized into Business Areas. Each service domain owns its data and exposes operations. Common across the industry → vendor-neutral integration.

✓ Strengths

Vendor-neutral — Temenos, FIS, Finacle, Oracle all map to BIAN
Service-oriented — perfect for microservice boundaries
Industry consensus — pre-built taxonomy of bank capabilities
Pairs well with API-first design (REST / GraphQL contracts)

✗ Weaknesses

Not a data model — doesn't tell you how to store data
Heavy taxonomy — 300 domains is too many for small banks
Implementation varies — domain naming is consistent but data format is not
Doesn't replace the warehouse design question

Banking example

ING used BIAN to redesign its IT estate around microservices. Deutsche Bank's API marketplace exposes services aligned to BIAN domains. Many vendors (Temenos, Finacle) advertise "BIAN-aligned" products, meaning their APIs map to BIAN service domains.

When to use today

For API design and microservice boundaries — BIAN is the modern banking equivalent of "REST best practice". For warehouse modeling — pair it with another approach (FSLDM canonical, source-centric silver, etc.). BIAN tells you what services exist; you still choose how data is stored.

Approach 05 · Standards

ISO 20022 — Universal Message Standard for Payments

Steward: ISO + SWIFT · Era: 2004+, full SWIFT migration deadline Nov 2025 · Where: Cross-border payments, RTGS, real-time payments

Philosophy

The wire is the contract. If two systems speak the same structured message, they don't need to share schemas — just understand the message. Messaging over modeling.

Rationale

SWIFT MT messages carried much of their content in loosely structured free-text fields, which made automated payment reconciliation and screening expensive. ISO 20022 brings rich structured XML/JSON in which amount, currency, and parties are machine-parseable. By Nov 2025, every cross-border payment on SWIFT is mandated to use ISO 20022. Not optional.

ISO 20022 is an XML/JSON-based message standard for financial messaging: payments, securities, FX, cards, trade. It defines a common business modeling methodology and a standard library of message types (e.g. pacs.008 = FI-to-FI customer credit transfer, camt.053 = bank-to-customer statement, pain.001 = customer credit transfer initiation). By Nov 2025, all SWIFT cross-border payments must use ISO 20022, so every bank that sends or receives cross-border payments over SWIFT is affected.

CORE IDEA

A common semantic dictionary for financial messages. Any bank, any system, any country can produce and consume the same message types with rich structured data instead of free-text fields.

✓ Strengths

Global standard — replaces SWIFT MT, used in SEPA, CHAPS, Fedwire, and The Clearing House's RTP network
Rich structured data — replaces free-text MT103 fields
Machine-readable XML/JSON
Pre-built domain vocabulary — Party, Account, Amount, Settlement

✗ Weaknesses

Verbose XML — payloads 5-10x SWIFT MT
Not a warehouse schema — must flatten / shred for analytics
Migration cost is real — every payment system, every reconciliation tool
Different message types for different flows — need sub-domain expertise

Banking example

SWIFT's CBPR+ initiative mandates ISO 20022 for cross-border payments by Nov 2025. Many payment infrastructures (SEPA, CHAPS, Fedwire, The Clearing House's RTP network, and a number of national RTGS systems) are aligned to ISO 20022. Every bank running cross-border treasury must onboard ISO 20022 by 2025 — no opt-out.

When to use today

Mandatory for cross-border payment messaging by Nov 2025. Use the ISO 20022 message vocabulary as input to your data warehouse (don't store raw XML — shred into structured silver tables). Don't use ISO 20022 schemas as warehouse schemas — too verbose, too message-shaped.
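A minimal sketch of "shred into structured silver tables", using only Python's standard-library XML parser. The message is a hand-trimmed pacs.008 fragment (the .001.08 namespace version is an assumption); it omits many mandatory elements and would not validate against the real XSD. The output column names are hypothetical silver columns, not part of the standard.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {"p": "urn:iso:std:iso:20022:tech:xsd:pacs.008.001.08"}

# Simplified pacs.008 fragment for illustration only.
PACS008 = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.008.001.08">
  <FIToFICstmrCdtTrf>
    <GrpHdr><MsgId>MSG-001</MsgId></GrpHdr>
    <CdtTrfTxInf>
      <PmtId><EndToEndId>E2E-1</EndToEndId></PmtId>
      <IntrBkSttlmAmt Ccy="EUR">1500.00</IntrBkSttlmAmt>
      <Dbtr><Nm>Acme Trading</Nm></Dbtr>
      <Cdtr><Nm>Globex GmbH</Nm></Cdtr>
    </CdtTrfTxInf>
    <CdtTrfTxInf>
      <PmtId><EndToEndId>E2E-2</EndToEndId></PmtId>
      <IntrBkSttlmAmt Ccy="USD">250.50</IntrBkSttlmAmt>
      <Dbtr><Nm>An Nguyen</Nm></Dbtr>
      <Cdtr><Nm>Initech LLC</Nm></Cdtr>
    </CdtTrfTxInf>
  </FIToFICstmrCdtTrf>
</Document>"""

def shred_pacs008(xml_text: str) -> list[dict]:
    """Flatten each credit-transfer transaction into one silver-ready row."""
    root = ET.fromstring(xml_text)
    msg_id = root.findtext("p:FIToFICstmrCdtTrf/p:GrpHdr/p:MsgId", namespaces=NS)
    rows = []
    for tx in root.iterfind("p:FIToFICstmrCdtTrf/p:CdtTrfTxInf", NS):
        amt = tx.find("p:IntrBkSttlmAmt", NS)
        rows.append({
            "msg_id": msg_id,
            "end_to_end_id": tx.findtext("p:PmtId/p:EndToEndId", namespaces=NS),
            "amount": Decimal(amt.text),          # Decimal, not float, for money
            "currency": amt.get("Ccy"),
            "debtor_name": tx.findtext("p:Dbtr/p:Nm", namespaces=NS),
            "creditor_name": tx.findtext("p:Cdtr/p:Nm", namespaces=NS),
        })
    return rows

rows = shred_pacs008(PACS008)
print(rows[1]["end_to_end_id"], rows[1]["currency"], rows[1]["amount"])  # E2E-2 USD 250.50
```

One row per CdtTrfTxInf is the grain most reconciliation queries want, and parsing amounts as Decimal keeps settlement values exact through the round trip.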

Approach 06 · Standards

FIBO — Financial Industry Business Ontology

Steward: EDM Council, OMG (Object Management Group) · Era: 2010+ · Where: Regulatory data definitions, knowledge graphs

Philosophy

Definitions matter. If 'Loan' means three different things in three systems, you don't have a CDM — you have chaos. Anchor terms in a formal ontology that logical reasoners can check.

Rationale

Regulatory work (BCBS 239, IFRS 9, EBA reporting) requires precise definitions traceable across systems. FIBO provides an OWL/RDF ontology where 'Loan' has a single formal definition. Reasoners validate consistency. Knowledge graphs use this as a substrate. ECB and FRB align dictionaries to FIBO.

FIBO is an OWL/RDF ontology — formal, machine-readable definitions of financial concepts (Party, Account, Loan, Derivative, Security, Bond, Counterparty…) and their relationships, expressed in description logic. Where FSLDM gives you a relational schema, FIBO gives you a concept graph with formal semantics.

CORE IDEA

A formal ontology of finance. Every concept (Loan, Counterparty, Risk Position) has a logical definition; concepts are connected by typed relationships. Machine-readable, reasoner-friendly, regulator-aligned.
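The simplest case of "reasoners can infer relationships" is subclass entailment: if an individual is a MortgageLoan and MortgageLoan is a kind of Loan, the individual is a Loan without anyone writing that down. This pure-Python sketch computes that closure over a handful of triples. The ex: IRIs are hypothetical placeholders, not actual FIBO identifiers, and a real stack would use an RDF store and an OWL reasoner rather than hand-rolled code.

```python
from collections import defaultdict

# Hypothetical triples in the spirit of an OWL ontology (not real FIBO IRIs).
TRIPLES = [
    ("ex:MortgageLoan", "rdfs:subClassOf", "ex:SecuredLoan"),
    ("ex:SecuredLoan",  "rdfs:subClassOf", "ex:Loan"),
    ("ex:Loan",         "rdfs:subClassOf", "ex:DebtInstrument"),
    ("ex:loan42",       "rdf:type",        "ex:MortgageLoan"),
]

def superclasses(triples):
    """Transitive closure of rdfs:subClassOf (the simplest RDFS entailment)."""
    direct = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            direct[s].add(o)

    def ancestors(cls, seen):
        for parent in direct.get(cls, ()):
            if parent not in seen:          # guards against cycles
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    return {cls: ancestors(cls, set()) for cls in list(direct)}

def inferred_types(triples, individual):
    """Asserted rdf:type values plus every superclass that can be derived."""
    closure = superclasses(triples)
    asserted = {o for s, p, o in triples if s == individual and p == "rdf:type"}
    result = set(asserted)
    for cls in asserted:
        result |= closure.get(cls, set())
    return result

print(sorted(inferred_types(TRIPLES, "ex:loan42")))
# ['ex:DebtInstrument', 'ex:Loan', 'ex:MortgageLoan', 'ex:SecuredLoan']
```

This is why a regulatory query for "all loans" can pick up mortgage records even when the source system only tagged them as mortgages.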

✓ Strengths

Formal semantics — disambiguates "what is a Loan"
Machine-readable — reasoners can infer relationships
Regulatory adoption — ECB, FRB use FIBO for definitions
Aligns with knowledge graph + LLM agent retrieval

✗ Weaknesses

Heavy tooling — needs OWL reasoners, RDF stores
SPARQL queries unfamiliar to most data engineers
Performance — graph queries don't scale like SQL on lakehouse
Mostly definition layer — not where you store actual data

Banking example

The European Central Bank uses FIBO for regulatory data dictionaries. Some banks (Wells Fargo, JPMorgan) maintain FIBO-aligned glossaries linked to physical schemas. BCBS 239 implementations frequently reference FIBO for traceable definitions.

When to use today

As a glossary backbone — define your business terms by reference to FIBO concepts. As a knowledge graph layer — when you want LLM agents to reason over definitions. Don't use FIBO as your physical warehouse schema; use it as the semantic anchor on top of whatever physical model you choose.

Era 3 · 2013–2022 · Modern Lakehouse

The Era of "Cheap Storage Changes Everything"

When S3 cost $0.023/GB/month and Spark/Delta turned the data lake into a queryable warehouse, three new patterns emerged. None of them assume canonical 3NF — they assume cheap storage, parallel compute, and that schema can change with a PR.

Approach 07 · Modern Lakehouse

Data Vault 2.0 — Hub / Link / Satellite

Inventor: Daniel Linstedt · Era: 2000s (1.0), 2013 (2.0) · Where: European banks, modernization projects

Philosophy

History is sacred. Never overwrite — always append. Sources are independent — never collide. Conformance happens at read time, not write time.

Rationale

Auditable warehouses (BCBS 239, regulated industries) need full history with provenance. Vault's hub/link/satellite pattern guarantees append-only by construction. Sources stay independent (one satellite per source per hub), making schema evolution isolated. Time-travel is built-in, not bolted-on.

Data Vault 2.0 splits warehouse modeling into three table types: Hub (one row per business key — e.g., customer_id), Link (relationships between hubs — e.g., customer ↔ account), and Satellite (attributes and history, append-only). Every change is a new satellite row, never an update. Sources stay separate: sat_customer_t24, sat_customer_crm sit side by side under one hub_customer.
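A minimal in-memory sketch of the three table types, with plain Python dicts standing in for tables. It assumes MD5 hash keys over trimmed, upper-cased business keys and a hash diff over the attribute payload to detect change; both are common Data Vault 2.0 conventions, but teams vary. The names hub_customer, link_customer_account, and sat_customer_<source>, and the source labels t24/crm, are illustrative.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    """Hash key over trimmed, upper-cased business keys (MD5 assumed here)."""
    joined = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def hash_diff(attrs: dict) -> str:
    """Fingerprint of a satellite payload, used to detect real changes."""
    joined = "||".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

hub_customer: dict[str, dict] = {}                  # hub hash key -> business key row
link_customer_account: dict[str, dict] = {}         # link hash key -> pair of hub keys
satellites: dict[str, dict[str, list[dict]]] = {}   # "sat_customer_<source>" -> hub key -> history

def load_customer(source: str, customer_id: str, attrs: dict, load_ts: datetime) -> None:
    hk = hash_key(customer_id)
    # Hub: the business key is inserted once and never updated.
    hub_customer.setdefault(hk, {"customer_id": customer_id, "load_ts": load_ts, "record_source": source})
    # One satellite per source; a row is appended only when the payload changed.
    history = satellites.setdefault(f"sat_customer_{source}", {}).setdefault(hk, [])
    diff = hash_diff(attrs)
    if not history or history[-1]["hash_diff"] != diff:
        history.append({"load_ts": load_ts, "hash_diff": diff, **attrs})

def load_customer_account(source: str, customer_id: str, account_id: str, load_ts: datetime) -> None:
    # Link: one row per distinct relationship between hub keys.
    lk = hash_key(customer_id, account_id)
    link_customer_account.setdefault(lk, {
        "customer_hk": hash_key(customer_id),
        "account_hk": hash_key(account_id),
        "load_ts": load_ts,
        "record_source": source,
    })

t1 = datetime(2026, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2026, 2, 1, tzinfo=timezone.utc)
load_customer("t24", "C001", {"name": "An Nguyen", "city": "Hanoi"}, t1)
load_customer("crm", "C001", {"email": "an@example.com"}, t1)
load_customer("t24", "C001", {"name": "An Nguyen", "city": "Hanoi"}, t2)    # unchanged: skipped
load_customer("t24", "C001", {"name": "An Nguyen", "city": "Da Nang"}, t2)  # changed: appended
load_customer_account("t24", "C001", "ACC-9", t1)

print(len(hub_customer), {name: sum(map(len, by_hk.values())) for name, by_hk in satellites.items()})
# 1 {'sat_customer_t24': 2, 'sat_customer_crm': 1}
```

Nothing is updated in place, and the T24 and CRM payloads never touch each other. Deciding which address or name wins is deferred to a point-in-time view at read time.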

CORE IDEA

Three table types, append-only, source-aware. Hubs hold business keys, Links hold relationships, Satellites hold attributes from each source. Audit + parallel loading + late-binding semantics — all built in.

✓ Strengths

Append-only — perfect audit trail for regulators
Source-aware — multiple satellites per hub, sources never collide
Parallel loadable — every hub/link/sat can load independently
Late-binding semantics — conform in point-in-time (PIT) views, not at silver

✗ Weaknesses

Consumer queries are very complex — every read needs PIT joins
3-5x more tables than 3NF for the same domain
Tooling immature outside specialist vendors (WhereScape, VaultSpeed)
Steep learning curve — most engineers don't think in hubs, links, and satellites

Banking example

Many European banks (Rabobank, Nordea, ABN AMRO) use Data Vault 2.0 for regulatory warehouses where audit trail is paramount. Several large insurance groups also adopted it. In Vietnam it is less common — the learning curve and tooling cost are high.

When to use today

When audit / lineage / regulatory traceability is the dominant requirement (BCBS 239, IFRS 9 with full history). When you need parallel ingestion from many sources without schema collisions. Pair with a Gold layer for consumer queries — Vault is not a consumer interface.

Approach 08 · Modern Lakehouse

Lakehouse Medallion — Bronze / Silver / Gold

Steward: Databricks (popularized) · Era: 2020+ · Where: Most modern data platforms

Philosophy

Storage is free, compute is elastic. Layer transformations by maturity, not by entity. Consumer value comes from the top layer; durability from the bottom. Convention over prescription.

Rationale

When Parquet on S3 cost $0.023/GB/month and Spark could rebuild any layer in minutes, the cost calculus inverted. Bronze raw becomes the cheap durable record; Silver is conformance; Gold is consumer-shaped. Medallion is the convention that names this maturity gradient, letting any specific schema (FSLDM / Vault / source-centric) live inside.

The Medallion is a layering convention, not a specific schema. Bronze = raw / 1:1 with sources. Silver = cleaned, conformed, sometimes joined. Gold = business-aggregated, query-ready. The genius of Medallion is that it doesn't prescribe what goes in silver or gold — it just gives you a vocabulary to organize your transformations. You can run FSLDM inside Medallion, or Vault, or source-centric, or all three.
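The layering itself can be shown without any particular engine. This plain-Python sketch assumes a toy transaction feed: bronze keeps what arrived, silver types and de-duplicates it, gold aggregates it for one consumer. The field names and the "net flow per account" gold table are hypothetical. In practice each step would be a Spark job or dbt model writing Delta/Parquet.

```python
from collections import defaultdict
from decimal import Decimal

# Bronze: raw records exactly as received (strings, duplicates, bad rows included).
bronze = [
    {"txn_id": "T1", "account": " acc-1 ", "amount": "100.00", "ccy": "VND"},
    {"txn_id": "T1", "account": " acc-1 ", "amount": "100.00", "ccy": "VND"},        # duplicate delivery
    {"txn_id": "T2", "account": "ACC-2", "amount": "not-a-number", "ccy": "VND"},    # malformed
    {"txn_id": "T3", "account": "acc-1", "amount": "250.50", "ccy": "VND"},
]

def to_silver(raw_rows):
    """Silver: typed, trimmed, de-duplicated; rejects kept aside, not silently dropped."""
    clean, rejects, seen = [], [], set()
    for r in raw_rows:
        if r["txn_id"] in seen:
            continue
        try:
            amount = Decimal(r["amount"])
        except ArithmeticError:             # decimal.InvalidOperation subclasses ArithmeticError
            rejects.append(r)
            continue
        seen.add(r["txn_id"])
        clean.append({"txn_id": r["txn_id"], "account": r["account"].strip().upper(),
                      "amount": amount, "ccy": r["ccy"]})
    return clean, rejects

def to_gold(silver_rows):
    """Gold: a business-shaped aggregate, here net flow per account."""
    totals = defaultdict(Decimal)
    for r in silver_rows:
        totals[r["account"]] += r["amount"]
    return dict(totals)

silver, rejects = to_silver(bronze)
gold = to_gold(silver)
print(gold, len(rejects))  # {'ACC-1': Decimal('350.50')} 1
```

Nothing here says what silver should look like beyond "clean"; that choice is still yours, which is the gap the weaknesses below call out.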

CORE IDEA

A layered convention, not a schema. Bronze raw, silver cleaned, gold business-aggregated. Each layer is a step in transformation maturity. The pattern is universal; what fills each layer is a separate choice.

✓ Strengths

Universal vocabulary — every team understands bronze/silver/gold
Storage-cheap — parquet/Delta on S3 fits any scale
Tool-friendly — dbt, Spark, Databricks all native
Combines with any specific approach (FSLDM, Vault, source-centric)

✗ Weaknesses

Doesn't answer "what goes in silver" — still need a model
Without discipline, gold balloons (every team builds their own)
Layer boundaries fuzzy in practice
Marketing-heavy — sometimes treated as a complete answer when it's just a container

Banking example

Most modern banking lakehouses adopt Medallion: Capital One on Snowflake, JPMorgan on Databricks, and Goldman Sachs' internal lakehouse have all been publicly discussed. Banks setting up new Enterprise Data Platforms from 2023 onward almost uniformly use Medallion as the layering convention.

When to use today

Always — as a layering convention. But pair it with a specific schema choice (FSLDM canonical for migration, source-centric for greenfield, Vault for audit-heavy). Medallion alone is not a complete data model.

Approach 09 · Modern Lakehouse

Data Mesh — Domain-Owned Data Products

Inventor: Zhamak Dehghani · Era: 2019+ · Where: Large orgs with strong domain teams

Philosophy

Data is owned by domains, not the data team. The platform team builds infrastructure; domains build products. Federated governance replaces central modeling. Decentralization over consolidation.

Rationale

At scale (50+ teams), a central data team becomes a bottleneck — every new use case waits in their queue. Domain teams have the deepest knowledge of their data anyway. Mesh inverts the org: each domain treats its data as a product with contracts, SLAs, ownership. Central team enables; domains deliver.

Data Mesh is an organizational pattern: instead of a central data team owning all data, each domain (Customer, Loan, Card, Payment, Risk) owns its own data products. Each domain publishes data with contracts, SLAs, ownership. A central "platform" team provides self-serve infrastructure (lakehouse, catalog, contracts framework). Federated computational governance enforces interop without central modeling.
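Contracts are what let mesh drop the central canonical model, so it helps to see how small one can be. The sketch below is a hypothetical, minimal contract in plain Python (product name, owner, freshness SLA, typed columns) plus a check the platform could run before a domain publishes a batch. It is not the schema of any particular contract framework, and the freshness field is only declared, not enforced.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Column:
    name: str
    dtype: type
    nullable: bool = False

@dataclass
class DataContract:
    """Hypothetical minimal contract a domain publishes with its data product."""
    product: str
    owner: str
    freshness_hours: int                      # declared SLA; not checked in this sketch
    columns: list[Column] = field(default_factory=list)

    def violations(self, rows: list[dict]) -> list[str]:
        """Return human-readable problems; an empty list means the batch conforms."""
        problems = []
        for i, row in enumerate(rows):
            for col in self.columns:
                if col.name not in row:
                    problems.append(f"row {i}: missing column {col.name}")
                    continue
                value = row[col.name]
                if value is None:
                    if not col.nullable:
                        problems.append(f"row {i}: {col.name} is null")
                elif not isinstance(value, col.dtype):
                    problems.append(f"row {i}: {col.name} expected {col.dtype.__name__}")
        return problems

loan_positions = DataContract(
    product="loan.positions_daily",
    owner="loan-domain-team",
    freshness_hours=24,
    columns=[Column("loan_id", str), Column("outstanding", float), Column("stage", int, nullable=True)],
)

batch = [
    {"loan_id": "L1", "outstanding": 1200.0, "stage": 1},
    {"loan_id": "L2", "outstanding": None, "stage": None},
]
print(loan_positions.violations(batch))  # ['row 1: outstanding is null']
```

The design choice that matters is who owns the check: the domain team publishes the contract, while the platform team runs the same validation for every product.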

CORE IDEA

Treat data as a product. Each business domain owns its data products end-to-end. Central platform team provides infrastructure. No central canonical model — federated governance + contracts replace it.

✓ Strengths

Scales to large orgs (50+ domain teams) without central bottleneck
Domain expertise embedded in data products
Clear ownership — every product has a team accountable
No central canonical model needed — contracts handle interop

✗ Weaknesses

Heavy governance overhead — federated ≠ free
Requires mature platform team (often 10+ engineers)
Tooling immature — contracts framework, federated governance still evolving
Overkill for small/medium banks with 5-10 teams

Banking example

JPMorgan Chase, ING, Roche (pharma but instructive) implemented mesh principles. The most successful banking implementations combine mesh organizationally with lakehouse infrastructure underneath. Few small/mid-tier banks adopt full mesh — the org structure isn't there.

When to use today

For tier-1 banks with 50+ data teams and dedicated platform engineering. For mid-tier banks: borrow concepts (data-as-product, contracts) but skip the full federated governance. For small banks: not yet — too heavy.

Era 4 · 2023+ · AI-Era

The Era of "AI Is The Fourth Consumer"

After BI, ML, and APIs, LLM agents become the fourth class of consumer. Schemas have to be self-documenting, columns have to make sense without joins, and the semantic layer has to be machine-readable. Three patterns that bake these constraints in.

Approach 10 · AI-Era

Activity Schema — Behavioral Event Stream

Inventor: Ahmed Elsamadisi (Narrator) · Era: 2020+ · Where: Fintech, behavioral analytics, ML feature pipelines

Philosophy

Customer behavior is a sequence of events, not a snapshot. Model the stream, derive everything from temporal aggregations. Time over state.

Rationale

Modern fintechs care about funnels, retention, cohorts, journeys — all temporal questions. Traditional star schemas force you to predefine grain; Activity Schema doesn't. Add a new event type? Just a new value in `activity_name`. Schema evolution becomes data evolution. AI/ML feature pipelines fit naturally — features are temporal aggregations.

Activity Schema collapses analytics into one wide event-stream table. Every business event (logged in, viewed account, deposited, applied for loan, paid, called support) becomes a row with customer_id, activity_name, ts, …. Metrics are computed via temporal SQL — first/last/count/before/after/between. Radically simple model; works because cheap storage and columnar engines handle the high cardinality.
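A minimal sqlite3 sketch of the "temporal SQL" idea, assuming a stripped-down stream table with only customer_id, activity_name, and ts (the published Activity Schema spec defines more columns). The table name customer_stream and the data are hypothetical. The question "first deposit after account opening" becomes a self-join on time rather than a join across fact tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_stream (customer_id TEXT, activity_name TEXT, ts TEXT)")
conn.executemany("INSERT INTO customer_stream VALUES (?, ?, ?)", [
    ("c1", "opened_account", "2026-01-01T09:00"),
    ("c1", "deposited",      "2026-01-03T10:00"),
    ("c1", "deposited",      "2026-01-09T10:00"),
    ("c2", "opened_account", "2026-01-02T12:00"),
    ("c3", "deposited",      "2026-01-04T08:00"),  # no account opening recorded in the stream
])

# For each customer who opened an account: the first deposit strictly after opening.
# ISO-8601 strings sort chronologically, so text comparison works on ts.
query = """
WITH opened AS (
    SELECT customer_id, MIN(ts) AS opened_ts
    FROM customer_stream
    WHERE activity_name = 'opened_account'
    GROUP BY customer_id
)
SELECT o.customer_id,
       o.opened_ts,
       (SELECT MIN(s.ts)
          FROM customer_stream s
         WHERE s.customer_id = o.customer_id
           AND s.activity_name = 'deposited'
           AND s.ts > o.opened_ts) AS first_deposit_ts
FROM opened o
ORDER BY o.customer_id
"""
rows = conn.execute(query).fetchall()
print(rows)
# [('c1', '2026-01-01T09:00', '2026-01-03T10:00'), ('c2', '2026-01-02T12:00', None)]
```

Adding a new behavior, say applied_for_loan, is just new rows with a new activity_name; neither the table nor this query has to change.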

CORE IDEA

One wide event table. Every business event is a row. Metrics are temporal SQL. Schema evolution = add a new activity_name. AI-friendly because event semantics are explicit in column values.

✓ Strengths

Schema evolution = add new activity_name (no DDL)
Behavioral analytics natively — funnels, retention, cohorts
AI/ML-friendly — features = temporal aggregations
One mental model for all consumers

✗ Weaknesses

Bad for snapshot/balance reporting (no native "as-of" view)
JSON feature column requires shred/cast for typed access
High cardinality activity table can grow huge
Doesn't replace need for dimensional/relational data for some use cases

Banking example

Revolut, Chime, Monzo, and other fintech-style neobanks reportedly favor activity-schema-style stores for behavioral analytics, growth marketing, and ML feature pipelines. Less common in traditional banks because regulatory reporting wants positions / balances, not events.

When to use today

For behavioral analytics, growth metrics, ML feature engineering. As a complement to a relational/dimensional layer, not a replacement. Excellent for AI agents that need to reason about customer journeys.

Approach 11 · AI-Era

Source-Centric Silver + Semantic Gold

Inventor: Emerging community pattern (no single author) · Era: 2024+ · Where: Greenfield 2024+ banking lakehouses

Philosophy

Sources are sovereign. Don't model them into a canonical at silver; mirror them. Conformance is a Gold-layer concern, done per use-case, late-bound. Isolation over integration.

Rationale

Banking sources change every quarter (T24 R23 → R24, Cards system migrations, LOS microservice splits). Pre-conforming at silver means every source change ripples upward through everything. Source-centric silver isolates change to one bucket; Gold per-use-case makes semantic conformance localized, explicit, AI-friendly.

The synthesis I kept seeing recommended for greenfield 2026 builds. Silver mirrors each source 1:1 (silver_t24_*, silver_cards_*, silver_los_*) — append-only, SCD2, audit-ready. Gold is where semantic conformance happens: customer_360 joins all silver sources via the MDM hub at write time. The "common language" lives in glossary + semantic layer + MDM hub — three metadata artifacts, not a physical canonical schema.
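A plain-Python sketch of the shape described above, with lists of dicts standing in for silver tables and a dict standing in for the MDM hub's cross-reference. Every identifier here is hypothetical: the silver table names, the golden id G-1, and the "core banking name wins" survivorship rule. In a real build this would be a dbt model or Spark job, and the crosswalk would come from the MDM product.

```python
# Silver: one mirror per source, keyed by each source's own identifiers (no canonical).
silver_t24_customer = [
    {"t24_cust_no": "100234", "name": "NGUYEN VAN AN", "segment": "RETAIL"},
]
silver_cards_cardholder = [
    {"card_holder_id": "CH-77", "holder_name": "An Nguyen", "cards_active": 2},
]
silver_los_applicant = [
    {"applicant_id": "A-5", "full_name": "Nguyen An", "open_applications": 1},
]

# MDM hub crosswalk: (source, source key) -> golden customer id.
mdm_crosswalk = {
    ("t24", "100234"): "G-1",
    ("cards", "CH-77"): "G-1",
    ("los", "A-5"): "G-1",
}

def build_customer_360() -> list[dict]:
    """Gold: semantic conformance happens here, per use case, via the crosswalk."""
    gold: dict[str, dict] = {}

    def golden_row(source: str, key: str) -> dict:
        gid = mdm_crosswalk[(source, key)]   # KeyError means an unmatched record
        return gold.setdefault(gid, {"golden_id": gid})

    for r in silver_t24_customer:
        g = golden_row("t24", r["t24_cust_no"])
        g["display_name"] = r["name"].title()   # survivorship rule: core banking name wins
        g["segment"] = r["segment"].lower()
    for r in silver_cards_cardholder:
        golden_row("cards", r["card_holder_id"])["active_cards"] = r["cards_active"]
    for r in silver_los_applicant:
        golden_row("los", r["applicant_id"])["open_loan_applications"] = r["open_applications"]
    return list(gold.values())

print(build_customer_360())
# [{'golden_id': 'G-1', 'display_name': 'Nguyen Van An', 'segment': 'retail', 'active_cards': 2, 'open_loan_applications': 1}]
```

Onboarding a fourth source means one new silver mirror and new crosswalk entries; the existing mirrors are untouched, which is the isolation the pattern is buying. The survivorship rule lives in Gold, where it is explicit and testable.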

CORE IDEA

Silver = source-mirror (independent, no canonical). Gold = semantic conformance (denormalized, AI-ready). Common language lives in metadata layers (glossary + semantic layer + MDM), not in a physical schema.

✓ Strengths

New source onboards in days — just add silver_X_* mirror
Schema evolution isolated to one silver bucket
AI-agent friendly — Gold is one semantic layer, columns self-document
Glossary + semantic layer = "common language" without rigid schema

✗ Weaknesses

Cross-source queries at silver are unsupported by design (they must go through Gold)
Gold builds compute-heavy — every snapshot rebuilds joins
Pattern is new — fewer reference implementations to copy
Semantic conformance bugs are real — late binding is harder to test

Banking example

Modern fintech-influenced banks (BBVA, Capital One have been publicly discussed; a growing number of digital banks fit the same profile) gravitate toward this pattern — they treat T24 / Cards / LOS as separate silver islands and build Gold per use case. The pattern fits naturally with dbt + Spark + Databricks.

When to use today

For greenfield banking lakehouses in 2024+ where dev velocity + AI-readiness + extensibility are the priorities, and the org is OK paying compute at Gold for the simplicity at silver. Not the right fit if you need cross-source queries with sub-second latency at silver.

Approach 12 · AI-Era

Hybrid AI-Native — Lakehouse + Feature Store + Vector Index + Knowledge Graph

Steward: Emerging tier-1 stack (no single author) · Era: 2024+ · Where: AI-forward tier-1 banks

Philosophy

No single store fits all AI workloads. Lakehouse for batch, Feature Store for ML, Vector for RAG, Graph for reasoning. Specialization over unification. Unify by metadata, not by storage.

Rationale

Modern AI workloads have wildly different access patterns. A vector index for RAG can't serve regulatory reports. A lakehouse can't reason over relationships at low latency. Picking the right store per workload is the only way to hit performance + accuracy + cost ceilings. The semantic layer + glossary is what makes them feel like one platform.

The most ambitious modern pattern: combine specialized stores for each workload, unified by a semantic layer. Lakehouse for batch analytics + Gold tables. Feature Store (Feast / Tecton / Databricks FS) for ML features with online + offline parity. Vector Index (FAISS / Pinecone / pgvector) for RAG over banking documents. Knowledge Graph (Neo4j / RDF) for entity resolution and reasoning over relationships. A central semantic layer + glossary unifies them.
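The "online + offline parity" requirement is easiest to see in the offline half: training rows must be joined to feature values as they were at each label's timestamp, not as they are today, or the model learns from the future. This is a minimal point-in-time lookup in plain Python, assuming a per-entity feature history sorted by timestamp. The feature (imagine something like txn_count_7d) and the data are hypothetical; feature stores such as Feast perform this kind of point-in-time join for you at scale.

```python
from bisect import bisect_right
from datetime import datetime

# Offline feature history per entity: (feature timestamp, value), sorted by time.
feature_log = {
    "c1": [(datetime(2026, 1, 1), 3), (datetime(2026, 1, 8), 11), (datetime(2026, 1, 15), 2)],
}

def feature_as_of(entity: str, as_of: datetime, default=None):
    """Latest feature value with timestamp <= as_of (never peeks into the future)."""
    history = feature_log.get(entity, [])
    times = [ts for ts, _ in history]
    idx = bisect_right(times, as_of)
    return history[idx - 1][1] if idx else default

# Training labels: each row may only see features known at its own timestamp.
labels = [("c1", datetime(2026, 1, 10), "fraud"), ("c1", datetime(2025, 12, 30), "ok")]
training_rows = [(entity, ts, feature_as_of(entity, ts), label) for entity, ts, label in labels]
print([(entity, value, label) for entity, _, value, label in training_rows])
# [('c1', 11, 'fraud'), ('c1', None, 'ok')]
```

Serving the same lookup online with as_of set to "now" is what keeps training and inference consistent.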

CORE IDEA

Each AI workload uses the right store. Lakehouse for batch, Feature Store for ML, Vector for RAG, Graph for reasoning. Unified by semantic layer + glossary. Operational complexity high; capability ceiling also high.

✓ Strengths

Best store for each workload — no compromises
AI agent capabilities at the ceiling — RAG + reasoning + numbers
Future-proof — supports new AI workload classes
Each store has best-in-class tooling

✗ Weaknesses

Operational complexity — 4 systems to keep in sync
Data sync challenges — embedding refresh, graph rebuild, FS backfill
Skills required across multiple specialized domains
High cost — both engineering + infrastructure

Banking example

JPMorgan, Goldman Sachs, and BNP Paribas are building this stack for fraud, anti-money-laundering, and customer-360 chatbots. Capital One and Wells Fargo, with mature ML platforms, are gravitating toward Feature Store + Lakehouse. HSBC uses a knowledge graph for AML.

When to use today

For tier-1 banks with mature data + ML platform teams and AI as a strategic priority. Not for mid/small banks — the operational burden won't pay back. Start with Lakehouse + Feature Store; add Vector Index when you have RAG use cases; add Knowledge Graph last (most complex).

Comparison Matrix

All 12 Approaches Side-by-Side

Six dimensions that matter for choosing a banking data model in 2026. Scores are subjective and based on observed adoption — calibrate to your context.

Approach	Era	Dev velocity	AI readiness	Maintainability	Extensibility	Audit / regulatory	Banking adoption (Vietnam)	Best-fit context
01 · Inmon CIF	1992	Low	Low	Mid	Low	High	Legacy	Existing 3NF EDW migration
02 · Kimball Star	1996	High	Mid	High	Mid	Mid	Universal	BI / reporting layer (always)
03 · IBM BDW / FSLDM	2000s	Low	Low	Low	Low	High	Tier-1 legacy	Migration; vocabulary reference
04 · BIAN	2008+	Mid	Mid	High	High	High	Growing	API design, microservice boundaries
05 · ISO 20022	2004+	Mid	Mid	Mid	High	High	Mandatory	Payment messaging (must-have by 2025)
06 · FIBO	2010+	Low	High	Mid	High	High	Rare	Glossary backbone, knowledge graph
07 · Data Vault 2.0	2013+	Mid	Mid	Mid	High	High	Rare in VN	Audit-heavy regulatory warehouse
08 · Lakehouse Medallion	2020+	High	High	High	High	Mid	Default	Layering convention (always)
09 · Data Mesh	2019+	Mid	Mid	High	High	Mid	Tier-1 only	Tier-1 with 50+ data teams
10 · Activity Schema	2020+	High	High	High	High	Low	Fintechs	Behavioral analytics, ML features
11 · Source-centric + Semantic Gold	2024+	High	High	High	High	High	Emerging	⭐ Greenfield 2024+ recommended
12 · Hybrid AI-Native	2024+	Mid	Highest	Low	High	High	Tier-1 frontier	Tier-1 with strategic AI mandate

Still figuring it out

An open ending, not a verdict

Sources & Further Reading

Citations

These are the books, papers, and reference docs I worked through. Treat this as a reading list — most of the framing in this post traces back to one of these.

Era 1 · Classical

Inmon, W. H. (2005). Building the Data Warehouse (4th ed.). Wiley. — The original CIF formulation.
Kimball, R. & Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (3rd ed.). Wiley.
IBM (n.d.). IBM Banking and Financial Markets Data Warehouse — General Information Manual. IBM Redbooks (various editions).
Teradata. Financial Services Logical Data Model (FSLDM) — vendor reference documentation.
Oracle. Financial Services Data Model (FSDM) — Oracle Financial Services reference docs.

Era 2 · Reference Standards

BIAN. BIAN Service Landscape (latest version). bian.org/servicelandscape
ISO. ISO 20022 Universal Financial Industry Message Scheme. iso20022.org
SWIFT. CBPR+ (Cross-Border Payments and Reporting Plus) migration guidance. swift.com/standards/iso-20022
EDM Council. Financial Industry Business Ontology (FIBO) Specification. spec.edmcouncil.org/fibo
Basel Committee on Banking Supervision (2013). BCBS 239: Principles for effective risk data aggregation and risk reporting. Bank for International Settlements.

Era 3 · Modern Lakehouse

Linstedt, D. & Olschimke, M. (2015). Building a Scalable Data Warehouse with Data Vault 2.0. Morgan Kaufmann.
Databricks. The Medallion Lakehouse Architecture. databricks.com/glossary/medallion-architecture
Armbrust, M., Ghodsi, A., Xin, R. & Zaharia, M. (2021). Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. CIDR 2021.
Dehghani, Z. (2022). Data Mesh: Delivering Data-Driven Value at Scale. O'Reilly Media.
Dehghani, Z. (2019). How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. martinfowler.com. martinfowler.com/articles/data-monolith-to-mesh.html

Era 4 · AI-Era

Elsamadisi, A. (2020). The Activity Schema: A New Way to Model Data. Narrator blog & activityschema.com
Feast contributors. Feast: Open Source Feature Store for Machine Learning. feast.dev
Tecton. What is a Feature Store? tecton.ai/blog/what-is-a-feature-store
Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. (RAG paper.)
Hogan, A., et al. (2021). Knowledge Graphs. ACM Computing Surveys, Vol. 54, No. 4. — Comprehensive survey.
dbt Labs. dbt Semantic Layer documentation. docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl

Cross-cutting & Practitioner Blogs

Fowler, M. (2002). Patterns of Enterprise Application Architecture. Addison-Wesley. — For ESB / integration context.
Hohpe, G. & Woolf, B. (2003). Enterprise Integration Patterns. Addison-Wesley.
Capital One Tech. Data Lakehouse on Snowflake — public engineering blog posts (various authors, capitalone.com/tech).
JPMorgan Chase. Fusion data platform — public talks and blog posts on their Databricks lakehouse.
Goldman Sachs Engineering. Data Lake / Legend platform — engineering.goldmansachs.com.
Microsoft Industry Cloud for Financial Services — reference architectures.

If you spot a missing or misattributed source, let me know — I'd like the citation list to be accurate.
