Banking Data Models: 12 Approaches, 4 Eras

I've been reading my way through how banking data has been modeled, from Inmon's 1992 enterprise data warehouse to 2024's source-centric lakehouse with semantic Gold. This post is my notes: 12 approaches, with diagrams, trade-offs, and a banking-domain example for each.

May 2026 · ~35 min read · learning in public

Heads up: this is a learning exercise, not a war story. I haven't shipped all 12 of these. I worked through papers, vendor docs, and case studies, and these are the notes I wish I had when I started reading.

The map: how data has been modeled in banking from 1992 to 2026. Twelve approaches, four eras. Each one gets origin, core idea, a diagram, pros and cons, and a note on what it's still good for today.

The four eras: Classical (top-down EDW + dimensional + canonical CDM), Reference Standards (BIAN + ISO 20022 + FIBO), Modern Lakehouse (Vault + Medallion + Mesh), and AI-Era (Activity Schema + Source-centric + Hybrid). Each era solved a different set of constraints; matching your constraints to one is the only way to pick well.

At the end there's a comparison matrix scoring all 12 on dev velocity, AI readiness, maintainability, extensibility, regulatory fit, and adoption, plus an open-ended closing on what I'm still trying to figure out.

Era 1 · 1990–2010 · Classical

The Era of "Build a CDM"

Storage was expensive, compute happened on a single machine, integration ran on ESBs with rigid contracts, and schema evolution was a project, not a PR. The dominant question: "where do we put the canonical model?" Three answers emerged.

Approach 01 · Classical

Inmon's CIF: Top-Down 3NF Enterprise Data Warehouse

Inventor: Bill Inmon · Era: 1992 · Where: Most banks 1995–2010

Philosophy

Single version of truth. The warehouse is the enterprise's institutional memory. Build it complete and pure first; consumers come second. Integrity over agility.

Rationale

When storage was $10K/TB and integration ran on ESBs, every byte was costly. Building one fully-normalized 3NF EDW once meant paying the integration cost once. Marts derive from a stable core, so there is no semantic drift across reports.

Inmon's Corporate Information Factory was the foundational pattern of enterprise data warehousing. The proposition: build one corporate-wide third-normal-form (3NF) data warehouse first (every entity, every attribute, every relationship), and then create dependent data marts (often dimensional) on top for specific reporting needs.

CORE IDEA: Build the entire enterprise schema in 3NF before you build any data mart. The EDW is the single source of truth; marts are downstream views.
Diagram: Inmon CIF, top-down EDW (flow: Sources → Staging → 3NF EDW → Dependent marts)

  • Sources: Core / T24, Cards, LOS, CRM
  • Staging: ETL cleansing (extract, transform, load); heavy compute
  • 3NF EDW (single source of truth): customer, account, transaction, product, branch, employee; ~80–120 entities, fully normalized, every fact in one place
  • Dependent marts: Sales Mart (star: facts + dims), Risk Mart (NPL aggregates), Finance Mart (P&L, GL roll-up), Compliance Mart (AML, regulatory)

Characteristic: build the entire 3NF EDW first (months 1-18), then build marts (months 18+). Time-to-first-insight is long, but once shipped, marts derive from a single trustworthy core.
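To make the "marts are downstream views" idea concrete, here is a minimal sketch using Python's built-in sqlite3. The three-table core, the sample rows, and the view name mart_segment_volume are all my own illustrative assumptions, not part of Inmon's material; a real EDW would have far more entities.

```python
import sqlite3


def build_edw() -> sqlite3.Connection:
    """Create a toy 3NF core plus one dependent mart defined as a view."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT);
        CREATE TABLE account  (account_id INTEGER PRIMARY KEY,
                               customer_id INTEGER REFERENCES customer(customer_id),
                               currency TEXT);
        CREATE TABLE txn      (txn_id INTEGER PRIMARY KEY,
                               account_id INTEGER REFERENCES account(account_id),
                               amount REAL);
        -- Dependent mart (hypothetical name): derived from the core, never loaded on its own.
        CREATE VIEW mart_segment_volume AS
            SELECT c.segment, SUM(t.amount) AS total_amount
            FROM txn t
            JOIN account a  ON a.account_id = t.account_id
            JOIN customer c ON c.customer_id = a.customer_id
            GROUP BY c.segment;
        INSERT INTO customer VALUES (1, 'An', 'MASS'), (2, 'Binh', 'AFFLUENT');
        INSERT INTO account  VALUES (10, 1, 'VND'), (20, 2, 'VND');
        INSERT INTO txn      VALUES (100, 10, 50.0), (101, 10, 25.0), (102, 20, 400.0);
    """)
    return con


def segment_volume(con: sqlite3.Connection) -> dict:
    """Read the mart; every number traces back to exactly one row in the core."""
    return dict(con.execute("SELECT segment, total_amount FROM mart_segment_volume"))
```

Note that even this tiny mart needs two joins, which is the consumer-unfriendliness of 3NF in miniature.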

✓ Strengths

  • Single source of truth, integrity guaranteed by 3NF
  • Marts derive consistently, with no semantic drift
  • Audit + lineage come naturally (one canonical place)
  • Clean lineage for regulatory reporting (BCBS 239)

✗ Weaknesses

  • Heavy upfront: 12-24 months before first business value
  • 3NF is consumer-unfriendly: every query needs joins
  • Schema changes ripple through marts
  • Doesn't fit "ship a use-case in a sprint" expectations

Banking example: Most large banks 1995–2010 built CIF-style EDWs on Teradata or Oracle (Citi, HSBC, Wells Fargo are commonly cited). Local-market banks that started warehousing 2008–2015 typically followed the same template with localization for regulatory reporting.

When to use today: Greenfield: rarely. The upfront cost no longer matches expectations. Migration: still relevant when the existing EDW is 3NF and you're modernizing layer-by-layer. Conceptual: the discipline (single-source-of-truth) is alive in MDM hubs and contracts.

Approach 02 · Classical

Kimball's Dimensional: Bottom-Up Star Schemas with Conformed Dimensions

Inventor: Ralph Kimball · Era: 1996 · Where: Reporting layers in nearly every bank

Philosophy

Business processes are the unit of analysis, not entities. Model what users ask, not what the schema demands. Ship in weeks, conform via shared dimensions. Agility over completeness.

Rationale

Inmon's 18-month upfront EDW was killing projects. Kimball's bottom-up answer (build a fact table for one business process now, share dimensions across processes via discipline) let teams deliver value continuously while still maintaining cross-process consistency.

Kimball's response to Inmon: instead of modeling everything upfront, build one fact table per business process (sales, transactions, applications) surrounded by conformed dimensions (customer, product, date) that are reused across processes. Star schemas are optimized for OLAP queries.

CORE IDEA: Model business processes one at a time as fact tables. Share dimensions across processes via the "conformed dimension" discipline. Ship the first mart in weeks, not months.
Diagram: Kimball star schema, fact_transaction surrounded by conformed dimensions

  • fact_transaction: customer_id, account_id, product_id, branch_id, date_id, channel_id; amount, txn_type, fee
  • dim_customer: customer_id (PK), name, segment, dob; SCD2 history
  • dim_product: product_id (PK), type, family, currency; conformed
  • dim_account: account_id (PK), type, currency, status; SCD1
  • dim_date: date_id (PK), day, month, qtr, fy; conformed
  • dim_branch: branch_id (PK), name, region
  • dim_channel: channel_id (PK), code, class
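A small sketch of why conformed dimensions matter: two fact tables from different business processes can be compared only because they share the same dim_customer. The table fact_loan_application, the sample rows, and the function names are assumptions made up for illustration.

```python
import sqlite3


def build_star() -> sqlite3.Connection:
    """Two facts (transactions, loan applications) sharing one conformed dimension."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
        CREATE TABLE fact_transaction (customer_id INTEGER, amount REAL);
        CREATE TABLE fact_loan_application (customer_id INTEGER, principal REAL);
        INSERT INTO dim_customer VALUES (1, 'MASS'), (2, 'AFFLUENT');
        INSERT INTO fact_transaction VALUES (1, 50.0), (1, 25.0), (2, 400.0);
        INSERT INTO fact_loan_application VALUES (2, 1500.0);
    """)
    return con


def drill_across(con: sqlite3.Connection) -> list:
    """Aggregate each fact to the shared grain (segment) first, then join the results."""
    sql = """
        WITH txn AS (
            SELECT d.segment, SUM(f.amount) AS txn_amount
            FROM fact_transaction f JOIN dim_customer d USING (customer_id)
            GROUP BY d.segment),
        app AS (
            SELECT d.segment, COUNT(*) AS applications
            FROM fact_loan_application f JOIN dim_customer d USING (customer_id)
            GROUP BY d.segment)
        SELECT txn.segment, txn.txn_amount, COALESCE(app.applications, 0)
        FROM txn LEFT JOIN app ON app.segment = txn.segment
        ORDER BY txn.segment
    """
    return con.execute(sql).fetchall()
```

Aggregating each fact separately before joining avoids the row fan-out you would get from joining the two fact tables directly.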

✓ Strengths

  • Fast delivery: first mart in 4-8 weeks
  • Query-friendly: joins are simple, BI tools love it
  • Conformed dimensions enforce reuse
  • Business users can self-serve once dimensions are clean

✗ Weaknesses

  • Cross-process queries can be awkward (they need conformed-dimension discipline)
  • Granularity decisions are hard to reverse
  • Doesn't preserve full transactional history without bridge tables
  • Star schemas don't naturally support graph/event queries

Banking example: Power BI / Tableau / SAS / SAP BO report layers in virtually every bank. Even when the silver layer is 3NF or Vault, the consumption layer is usually dimensional; the discipline survives every storage shift.

When to use today: For the BI reporting layer on top of any silver layer. The dimensional discipline (conformed dimensions, granularity, slowly changing dimensions) is timeless even though the underlying storage is now lakehouse Parquet, not Teradata tables or OLAP cubes.

Approach 03 · Classical

IBM BDW / FSLDM: Banking-Specific 3NF Canonical

Inventor: IBM (Banking Data Warehouse), Teradata FSLDM, Oracle FSDM · Era: 2000s · Where: Tier-1 banks globally

Philosophy

Banking has been done before. Don't reinvent the entity model; inherit it. Every bank has Parties, Arrangements, Events, Positions. Customize attributes; keep structure. Industry consensus over local invention.

Rationale

Building a banking entity model from scratch took 18-24 months. IBM, Teradata, and Oracle each pre-built a model of ~10 super-types based on industry consensus. Banks adopted these models to skip the modeling phase and benefit from regulator familiarity: the "banking edition of 3NF", off the shelf.

If Inmon's CIF says "build a 3NF EDW", IBM BDW (and its siblings Teradata FSLDM and Oracle FSDM) say "here is the 3NF: banking edition, off-the-shelf". Each is a pre-defined logical data model with ~10 super-types (Party, Arrangement, Account, Event, Position, Product, Asset, Channel, Resource, Location, Classification) intended to cover every banking concept, with decades of industry consensus baked in.

CORE IDEA: A pre-built canonical 3NF schema specific to banking. Banks customize attributes but inherit the entity structure. The "shared vocabulary" is encoded in the entity names and relationships.
Diagram: IBM BDW / FSLDM, ~10 super-type canonical entities

  • PARTY: individual / org / employee; the universal "actor"
  • INVOLVED_PARTY: M:N bridge carrying role + period
  • ARRANGEMENT: deposit / loan / card / facility; the contract abstraction
  • EVENT: transaction / payment / fee; the heaviest table
  • POSITION: balance / outstanding; EOM snapshot
  • PRODUCT: catalog
  • CHANNEL: branch / atm / online
  • RESOURCE: currency / asset
  • LOCATION: geo / branch / org
  • CLASSIFICATION: GL code / tax
  • CONDITION: rate / term / rule

All entities have surrogate keys, history (SCD2), and effective_from / effective_to columns.
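The INVOLVED_PARTY bridge (role + period) is the piece I found least intuitive, so here is a minimal sketch of an "as of" lookup over it. The field names, the half-open period convention, and the sample IDs are my assumptions; the vendor models define their own columns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InvolvedParty:
    """One row of a hypothetical Party <-> Arrangement bridge."""
    party_id: str
    arrangement_id: str
    role: str
    effective_from: date
    effective_to: date  # exclusive; date.max means "still current"


def parties_as_of(links, arrangement_id: str, as_of: date) -> list:
    """Return (party_id, role) pairs valid on the given date for one arrangement."""
    return sorted(
        (link.party_id, link.role)
        for link in links
        if link.arrangement_id == arrangement_id
        and link.effective_from <= as_of < link.effective_to
    )


LINKS = [
    InvolvedParty("P1", "ARR1", "borrower", date(2023, 1, 1), date.max),
    InvolvedParty("P2", "ARR1", "guarantor", date(2023, 1, 1), date(2024, 6, 1)),
    InvolvedParty("P3", "ARR1", "guarantor", date(2024, 6, 1), date.max),
]
```

Calling parties_as_of(LINKS, "ARR1", date(2024, 1, 15)) returns the borrower plus the original guarantor; the same call for July 2024 returns the replacement guarantor instead.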

✓ Strengths

  • Off-the-shelf banking semantics: no need to invent
  • Vendor-supported (IBM, Teradata, Oracle ship reference documentation)
  • Regulator-friendly: well-known structure for audit
  • Cross-product semantics already harmonized

✗ Weaknesses

  • Heavy: 80-200 entities even in a minimal deployment
  • Consumer-unfriendly: every query joins 5-7 tables
  • Slow to onboard new sources (must map into canonical)
  • Schema evolution is painful: a change ripples across all consumers

Banking example: HSBC, Citi, Standard Chartered, and ING are commonly cited as having built FSLDM-style canonical warehouses 2000–2015. Mid-tier banks in many emerging markets adopted the same templates as their warehouse foundations because vendor support + regulator familiarity removed two big risks at once.

When to use today: Greenfield: rarely. The 18-month canonical-modeling phase no longer matches dev velocity expectations. Migration: still relevant when modernizing an existing FSLDM warehouse; you can keep canonical at silver and add Gold on top (Modern A pattern). Conceptual: the entity vocabulary (Party, Arrangement, Event) is still useful as a glossary even if not as a physical schema.

Era 2 · 2004–2015 · Reference Standards

The Era of "Standardize the Wire, Not the Warehouse"

Once SOA matured and cross-bank integration grew, the question shifted from "what does our warehouse look like" to "what does the wire between systems look like". Three industry standards emerged; none are warehouse schemas, all are interoperability layers. Banks need to understand them because data lands in the warehouse shaped by these standards.

Approach 04 · Standards

BIAN: Banking Industry Architecture Network · Service Domain Reference

Steward: BIAN consortium (ING, Microsoft, IBM, SAP, Temenos, …) · Era: 2008+, current version 13 · Where: Bank API design, microservices boundaries

Philosophy

A bank is a portfolio of services, not tables. Standardize the service interfaces; let each vendor implement the data however they want. Behavior over storage.

Rationale

Banks integrate dozens of vendor systems. Without a common service vocabulary, every integration is bespoke. BIAN gives a vendor-neutral set of ~300 service domains so a Customer Position service from Temenos talks the same language as one from Finacle, and APIs become composable.

BIAN is not a data model; it's a service-oriented reference architecture. It decomposes a bank into ~300 service domains (Customer Reference, Customer Position, Loan, Payment Order, Card Capture, Fraud Resolution, …), each with defined responsibilities, control records, and operations. When two systems integrate, BIAN gives them a shared vocabulary for what services should exist and what data they exchange.

CORE IDEA: A bank is a portfolio of ~300 service domains organized into Business Areas. Each service domain owns its data and exposes operations. Because the domains are common across the industry, integration becomes vendor-neutral.
Diagram: BIAN service landscape, Business Areas → Service Domains

  • Sales & Service (~60 domains): Customer Offer, Customer Agreement, Customer Position, Customer Behavior Insight, Sales Lead, Customer Case, Customer Servicing Session
  • Operations & Execution (~120 domains, the largest): Current Account, Savings Account, Loan Fulfillment, Payment Execution, Card Capture, Trade Capture, Settlement Account
  • Risk & Compliance (~50 domains): Credit Risk Models, Operational Risk, Fraud Resolution, AML Compliance, KYC Customer Profile, Regulatory Compliance, Audit Trail
  • Reference & Control (~70 domains): General Ledger, Reference Data, Master Reference, Branch Network, HR / Workforce, Procurement, Tax Operations

A BIAN service domain = control record + operations. Example service domain: Customer Position

  • Control Record: Customer Position Instance, the current snapshot of a customer's holdings
  • Behavior Qualifiers: Account Position, Card Position, Loan Position, Investment Position
  • Operations: Initiate, Update, Capture, Retrieve, Notify, Exchange

Vendors implement this contract; integrations consume it via standard APIs. BIAN itself does not prescribe data formats; it pairs with ISO 20022 / FIBO / proprietary schemas.
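Here is how I picture "vendors implement the contract, integrations consume it", as a rough Python sketch. This is not BIAN's published API: the CustomerPosition fields, the two vendor adapters, and their storage shapes are all hypothetical, and only the Retrieve operation is modeled.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CustomerPosition:
    """Simplified stand-in for a Customer Position control record."""
    customer_id: str
    account_position: float
    loan_position: float


class CustomerPositionService(Protocol):
    """The shared contract; only the 'Retrieve' operation is sketched."""
    def retrieve(self, customer_id: str) -> CustomerPosition: ...


class VendorAAdapter:
    """Hypothetical vendor that keeps totals in a nested dict."""
    def __init__(self, store: dict):
        self._store = store

    def retrieve(self, customer_id: str) -> CustomerPosition:
        rec = self._store[customer_id]
        return CustomerPosition(customer_id, rec["deposits"], rec["loans"])


class VendorBAdapter:
    """Hypothetical vendor that keeps (customer_id, kind, amount) rows."""
    def __init__(self, rows: list):
        self._rows = rows

    def _total(self, customer_id: str, kind: str) -> float:
        return sum(a for c, k, a in self._rows if c == customer_id and k == kind)

    def retrieve(self, customer_id: str) -> CustomerPosition:
        return CustomerPosition(customer_id,
                                self._total(customer_id, "ACCT"),
                                self._total(customer_id, "LOAN"))


def net_position(service: CustomerPositionService, customer_id: str) -> float:
    """Consumer code depends on the contract, not on either vendor's storage."""
    position = service.retrieve(customer_id)
    return position.account_position - position.loan_position
```

The consumer function works unchanged against either adapter, which is the composability argument in miniature; how each vendor stores the data stays its own business.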

✓ Strengths

  • Vendor-neutral: Temenos, FIS, Finacle, Oracle all map to BIAN
  • Service-oriented: a natural fit for microservice boundaries
  • Industry consensus: pre-built taxonomy of bank capabilities
  • Pairs well with API-first design (REST / GraphQL contracts)

✗ Weaknesses

  • Not a data model: doesn't tell you how to store data
  • Heavy taxonomy: 300 domains is too many for small banks
  • Implementation varies: domain naming is consistent but data format is not
  • Doesn't replace the warehouse design question

Banking example: ING used BIAN to redesign its IT estate around microservices. Deutsche Bank's API marketplace exposes services aligned to BIAN domains. Many vendors (Temenos, Finacle) advertise "BIAN-aligned" products, meaning their APIs map to BIAN service domains.

When to use today: For API design and microservice boundaries, BIAN is the modern banking equivalent of "REST best practice". For warehouse modeling, pair it with another approach (FSLDM canonical, source-centric silver, etc.). BIAN tells you what services exist; you still choose how data is stored.

Approach 05 · Standards

ISO 20022: Universal Message Standard for Payments

Steward: ISO + SWIFT · Era: 2004+, full SWIFT migration deadline Nov 2025 · Where: Cross-border payments, RTGS, real-time payments

Philosophy

The wire is the contract. If two systems speak the same structured message, they don't need to share schemas; they just need to understand the message. Messaging over modeling.

Rationale

SWIFT MT messages leaned heavily on free-text fields; payment reconciliation wasted billions globally. ISO 20022 brings rich structured XML/JSON in which amount, currency, and parties are machine-parseable. By Nov 2025, every cross-border payment on SWIFT is mandated to use ISO 20022. Not optional.

ISO 20022 is an XML/JSON-based message standard for financial messaging: payments, securities, FX, cards, trade. It defines a common business modeling methodology and a standard library of message types (e.g. pacs.008 = customer credit transfer, camt.053 = bank-to-customer statement, pain.001 = customer-initiated payment). By Nov 2025, all SWIFT cross-border payments must use ISO 20022, so every bank in the world is affected.

CORE IDEA: A common semantic dictionary for financial messages. Any bank, any system, any country can produce and consume the same message types with rich structured data instead of free-text fields.
Diagram: ISO 20022 message structure, example pacs.008 (Customer Credit Transfer)

<Document> → FIToFICstmrCdtTrf

  • GrpHdr (Group Header): MsgId, CreDtTm, NbOfTxs, SttlmInf, IntrBkSttlmDt
  • CdtTrfTxInf (Transfer Info): PmtId / EndToEndId, IntrBkSttlmAmt, ChrgBr (charge bearer), RmtInf (remittance info)
  • Parties: Dbtr (debtor) + DbtrAcct, DbtrAgt (debtor agent / bank), Cdtr (creditor) + CdtrAcct, CdtrAgt (creditor agent)

Example data point, IntrBkSttlmAmt:

    <IntrBkSttlmAmt Ccy="VND">1500000000</IntrBkSttlmAmt>

→ structured: amount=1.5B, currency=VND. No more "1.5B (VND? USD?)" ambiguity in free text.

Compared to legacy SWIFT MT103: free-text fields, manual parsing, FX disputes. ISO 20022: structured XML, validated, machine-readable.
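Since the advice is to shred messages into flat tables rather than store XML, here is a minimal sketch of that step using the standard library's xml.etree.ElementTree. The sample message is heavily trimmed and would not pass schema validation, the namespace version (pacs.008.001.08) is just one of several in circulation, and the output column names are my own.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Namespace for one pacs.008 version; real feeds may use a different version suffix.
NS = {"p": "urn:iso:std:iso:20022:tech:xsd:pacs.008.001.08"}

# Trimmed, illustrative message: NOT schema-valid, just enough to show the shape.
SAMPLE = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.008.001.08">
  <FIToFICstmrCdtTrf>
    <GrpHdr><MsgId>MSG-001</MsgId><NbOfTxs>1</NbOfTxs></GrpHdr>
    <CdtTrfTxInf>
      <PmtId><EndToEndId>E2E-001</EndToEndId></PmtId>
      <IntrBkSttlmAmt Ccy="VND">1500000000</IntrBkSttlmAmt>
      <Dbtr><Nm>Debtor Co</Nm></Dbtr>
      <Cdtr><Nm>Creditor Co</Nm></Cdtr>
    </CdtTrfTxInf>
  </FIToFICstmrCdtTrf>
</Document>"""


def shred(xml_text: str) -> list:
    """Flatten each CdtTrfTxInf into one row suitable for a silver table."""
    root = ET.fromstring(xml_text)
    msg_id = root.findtext(".//p:GrpHdr/p:MsgId", namespaces=NS)
    rows = []
    for tx in root.iterfind(".//p:CdtTrfTxInf", NS):
        amt = tx.find("p:IntrBkSttlmAmt", NS)
        rows.append({
            "msg_id": msg_id,
            "end_to_end_id": tx.findtext("p:PmtId/p:EndToEndId", namespaces=NS),
            "amount": Decimal(amt.text),      # Decimal avoids float rounding on money
            "currency": amt.get("Ccy"),
            "debtor_name": tx.findtext("p:Dbtr/p:Nm", namespaces=NS),
            "creditor_name": tx.findtext("p:Cdtr/p:Nm", namespaces=NS),
        })
    return rows
```

One message can carry many transactions, so the natural grain of the flattened table is one row per CdtTrfTxInf, with the group header fields repeated on each row.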

✓ Strengths

  • Global standard: replaces SWIFT MT, used in SEPA, CHAPS, Fedwire, and RTP
  • Rich structured data: replaces free-text MT103 fields
  • Machine-readable XML/JSON
  • Pre-built domain vocabulary: Party, Account, Amount, Settlement

✗ Weaknesses

  • Verbose XML: payloads 5-10x SWIFT MT
  • Not a warehouse schema: must flatten / shred for analytics
  • Migration cost is real: every payment system, every reconciliation tool
  • Different message types for different flows: need sub-domain expertise

Banking example: SWIFT's CBPR+ initiative mandates ISO 20022 for cross-border payments by Nov 2025. Several national and regional payment systems (SEPA, CHAPS, Fedwire, RTP, and a number of country-level RTGS systems) are aligned to ISO 20022. Every bank running cross-border treasury must onboard ISO 20022 by 2025, with no opt-out.

When to use today: Mandatory for payment systems by 2025. Use the ISO 20022 message vocabulary as input to your data warehouse (don't store raw XML; shred it into structured silver tables). Don't use ISO 20022 schemas as warehouse schemas: they are too verbose and too message-shaped.

Approach 06 · Standards

FIBO: Financial Industry Business Ontology

Steward: EDM Council, OMG (Object Management Group) · Era: 2010+ · Where: Regulatory data definitions, knowledge graphs

Philosophy

Definitions matter. If 'Loan' means three different things in three systems, you don't have a CDM; you have chaos. Anchor terms in formal ontology with logical reasoners.

Rationale

Regulatory work (BCBS 239, IFRS 9, EBA reporting) requires precise definitions traceable across systems. FIBO provides an OWL/RDF ontology in which 'Loan' has a single formal definition. Reasoners validate consistency. Knowledge graphs use this as a substrate. ECB and FRB align dictionaries to FIBO.

FIBO is an OWL/RDF ontology: formal, machine-readable definitions of financial concepts (Party, Account, Loan, Derivative, Security, Bond, Counterparty…) and their relationships, expressed in description logic. Where FSLDM gives you a relational schema, FIBO gives you a concept graph with formal semantics.

CORE IDEA: A formal ontology of finance. Every concept (Loan, Counterparty, Risk Position) has a logical definition; concepts are connected by typed relationships. Machine-readable, reasoner-friendly, regulator-aligned.
Diagram: FIBO ontology, concepts as classes + typed relationships

  • owl:Thing has subclasses (isA): Party, FinancialContract, FinancialInstrument
  • Party subclasses: LegalEntity, NaturalPerson
  • FinancialContract subclasses: LoanContract, DepositContract
  • FinancialInstrument subclasses: DebtSecurity, Equity
  • Typed relationships shown: isObligorOf, isContractedBy

Example query (SPARQL on FIBO, prefix declarations omitted):

    SELECT ?loan ?obligor WHERE {
      ?loan a fibo-fnd-arr:LoanContract .
      ?obligor fibo-fnd-rel:isObligorOf ?loan .
      ?obligor a fibo-be-le:LegalEntity .
    }
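To get a feel for what "reasoners can infer relationships" means without installing an RDF stack, here is a toy triple store in plain Python that follows subClassOf transitively. It is a sketch of the idea only: the class and property names are shortened stand-ins rather than real FIBO IRIs, and a real OWL reasoner does far more than subclass closure.

```python
# (subject, predicate, object) triples; names are simplified stand-ins, not FIBO IRIs.
TRIPLES = {
    ("LoanContract", "subClassOf", "FinancialContract"),
    ("DepositContract", "subClassOf", "FinancialContract"),
    ("LegalEntity", "subClassOf", "Party"),
    ("loan42", "type", "LoanContract"),
    ("dep7", "type", "DepositContract"),
    ("acme", "type", "LegalEntity"),
    ("acme", "isObligorOf", "loan42"),
}


def subclasses(cls: str, triples: set) -> set:
    """Return cls plus every class that is transitively a subClassOf it."""
    found, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls: str, triples: set) -> set:
    """Individuals typed as cls or as any of its subclasses (the inferred part)."""
    classes = subclasses(cls, triples)
    return {s for s, p, o in triples if p == "type" and o in classes}


def obligors_of_loans(triples: set) -> set:
    """Plain-Python analogue of the SPARQL pattern: (loan, obligor) pairs."""
    loans = instances_of("LoanContract", triples)
    entities = instances_of("LegalEntity", triples)
    return {(o, s) for s, p, o in triples
            if p == "isObligorOf" and o in loans and s in entities}
```

Nothing states that loan42 is a FinancialContract, yet instances_of("FinancialContract", TRIPLES) returns it; that derived fact is the kind of inference a reasoner adds on top of stored data.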

✓ Strengths

  • Formal semantics: disambiguates "what is a Loan"
  • Machine-readable: reasoners can infer relationships
  • Regulatory adoption: ECB, FRB use FIBO for definitions
  • Aligns with knowledge graph + LLM agent retrieval

✗ Weaknesses

  • Heavy tooling: needs OWL reasoners, RDF stores
  • SPARQL queries are unfamiliar to most data engineers
  • Performance: graph queries don't scale like SQL on a lakehouse
  • Mostly a definition layer: not where you store actual data

Banking example: European Central Bank uses FIBO for regulatory data dictionaries. Some banks (Wells Fargo, JPMorgan) maintain FIBO-aligned glossaries linked to physical schemas. BCBS 239 implementations frequently reference FIBO for traceable definitions.

When to use today: As a glossary backbone, define your business terms by reference to FIBO concepts. As a knowledge graph layer, use it when you want LLM agents to reason over definitions. Don't use FIBO as your physical warehouse schema; use it as the semantic anchor on top of whatever physical model you choose.

Era 3 · 2013–2022 · Modern Lakehouse

The Era of "Cheap Storage Changes Everything"

When S3 cost $0.023/GB/month and Spark/Delta turned the data lake into a queryable warehouse, three new patterns emerged. None of them assume canonical 3NF; they assume cheap storage, parallel compute, and that schema can change with a PR.

Approach 07 · Modern Lakehouse

Data Vault 2.0: Hub / Link / Satellite

Inventor: Daniel Linstedt · Era: 2000s (1.0), 2013 (2.0) · Where: European banks, modernization projects

Philosophy

History is sacred. Never overwrite; always append. Sources are independent and never collide. Conformance happens at read time, not write time.

Rationale

Auditable warehouses (BCBS 239, regulated industries) need full history with provenance. Vault's hub/link/satellite pattern guarantees append-only by construction. Sources stay independent (one satellite per source per hub), making schema evolution isolated. Time-travel is built-in, not bolted-on.

Data Vault 2.0 splits warehouse modeling into three table types: Hub (one row per business key, e.g., customer_id), Link (relationships between hubs, e.g., customer ↔ account), and Satellite (attributes and history, append-only). Every change is a new satellite row, never an update. Sources stay separate: sat_customer_t24 and sat_customer_crm sit side by side under one hub_customer.

CORE IDEA: Three table types, append-only, source-aware. Hubs hold business keys, Links hold relationships, Satellites hold attributes from each source. Audit, parallel loading, and late-binding semantics are all built in.
Diagram: Data Vault 2.0, Hubs + Links + Satellites

  • HUB Customer: customer_bk + load_dts
  • HUB Account: account_bk + load_dts
  • HUB Loan: loan_bk + load_dts
  • LINK cust_acct (Customer ↔ Account), LINK acct_loan (Account ↔ Loan)
  • SAT cust_t24: name, dob, segment
  • SAT cust_crm: phone, email, lead
  • SAT acct_t24: balance, currency
  • SAT loan_los: principal, npl_grp

Append-only: every change = new SAT row. Example: customer changes segment from MASS → AFFLUENT on 2024-06-15.

    sat_cust_t24:
      hash, load_dts=2023-01-01, segment=MASS, name=...
      hash, load_dts=2024-06-15, segment=AFFLUENT, name=...

→ Both rows preserved. Time-travel = filter by load_dts.
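The satellite example above can be sketched in a few lines of Python. This is my own simplification: the column names (hk, hash_diff), the choice of SHA-256, and the in-memory list standing in for a table are assumptions, and real loaders also track record source and handle deletes.

```python
import hashlib
from datetime import date


def hash_key(business_key: str) -> str:
    """Deterministic hub key derived from the normalized business key."""
    return hashlib.sha256(business_key.strip().upper().encode()).hexdigest()


def hash_diff(attrs: dict) -> str:
    """Fingerprint of the attribute set, used to detect change cheaply."""
    payload = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(payload.encode()).hexdigest()


def load_satellite(sat: list, business_key: str, attrs: dict, load_dts: date) -> bool:
    """Append a row only when attributes changed; existing rows are never updated."""
    hk, hd = hash_key(business_key), hash_diff(attrs)
    history = [row for row in sat if row["hk"] == hk]
    if history and max(history, key=lambda r: r["load_dts"])["hash_diff"] == hd:
        return False  # unchanged since the latest load, nothing to append
    sat.append({"hk": hk, "load_dts": load_dts, "hash_diff": hd, **attrs})
    return True


def as_of(sat: list, business_key: str, when: date):
    """Time travel: the latest satellite row loaded on or before the given date."""
    hk = hash_key(business_key)
    rows = [row for row in sat if row["hk"] == hk and row["load_dts"] <= when]
    return max(rows, key=lambda r: r["load_dts"]) if rows else None
```

Loading MASS on 2023-01-01 and AFFLUENT on 2024-06-15 leaves two rows; as_of for any date in between returns the MASS row, which is the "filter by load_dts" behavior described above.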

✓ Strengths

  • Append-only: a complete audit trail for regulators
  • Source-aware: multiple satellites per hub, sources never collide
  • Parallel loadable: every hub/link/sat can load independently
  • Late-binding semantics: conform at the PIT (point-in-time) view, not at silver

✗ Weaknesses

  • Consumer queries are very complex: every read needs PIT joins
  • 3-5x more tables than 3NF for the same domain
  • Tooling immature outside specialist vendors (WhereScape, VaultSpeed)
  • Steep learning curve: most engineers don't think in hubs, links, and satellites

Banking example: Many European banks (Rabobank, Nordea, ABN AMRO) use Data Vault 2.0 for regulatory warehouses where audit trail is paramount. Several large insurance groups also adopted it. In Vietnam, it is less common; the learning curve and tooling cost are high.

When to use today: When audit / lineage / regulatory traceability is the dominant requirement (BCBS 239, IFRS 9 with full history). When you need parallel ingestion from many sources without schema collisions. Pair with a Gold layer for consumer queries; Vault is not a consumer interface.

Approach 08 · Modern Lakehouse

Lakehouse Medallion: Bronze / Silver / Gold

Steward: Databricks (popularized) · Era: 2020+ · Where: Most modern data platforms

Philosophy

Storage is free, compute is elastic. Layer transformations by maturity, not by entity. Consumer value comes from the top layer; durability from the bottom. Convention over prescription.

Rationale

When Parquet on S3 cost $0.023/GB and Spark could rebuild any layer in minutes, the cost calculus inverted. Bronze raw becomes the cheap durable record; Silver is conformance; Gold is consumer-shaped. Medallion is the convention that names this maturity gradient, letting any specific schema (FSLDM / Vault / source-centric) live inside.

The Medallion is a layering convention, not a specific schema. Bronze = raw / 1:1 with sources. Silver = cleaned, conformed, sometimes joined. Gold = business-aggregated, query-ready. The genius of Medallion is that it doesn't prescribe what goes in silver or gold; it just gives you a vocabulary to organize your transformations. You can run FSLDM inside Medallion, or Vault, or source-centric, or all three.

CORE IDEA: A layered convention, not a schema. Bronze raw, silver cleaned, gold business-aggregated. Each layer is a step in transformation maturity. The pattern is universal; what fills each layer is a separate choice.
Diagram: Lakehouse Medallion, three maturity layers (flow: Sources such as T24, Cards, … → Bronze → Silver → Gold)

  • BRONZE (raw / replica): 1:1 with source schema; append-only or CDC; Parquet / Delta / Iceberg; unaltered fidelity. Goal: durable raw record. Cost: zero compute. Schema: source-shaped. Volume: largest.
  • SILVER (cleaned / conformed): cleaned, deduped; cross-source joins (or not); SCD2 history; source-of-truth for analytics. Choice point: FSLDM canonical? Vault hubs/links/sats? Source-centric mirror? Schema: design choice.
  • GOLD (business-aggregated): customer_360, loan_lifecycle, transaction_enriched; aggregates, KPIs, marts. Goal: query-ready for BI / ML / AI agent. Schema: denormalized. Volume: smallest, fastest.

Medallion is the convention. What goes in silver/gold is your design choice; pair it with FSLDM, Vault, or source-centric.
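Stripped of the tooling, the three layers are just successive transformations. Here is a deliberately tiny sketch in plain Python; the record shape, the cleaning rules, and the function names are assumptions of mine, and a real pipeline would run on Spark or dbt and quarantine bad rows rather than drop them.

```python
from collections import defaultdict

# Bronze: raw records exactly as delivered, including duplicates and dirty values.
BRONZE = [
    {"txn_id": "T1", "customer_id": " c1 ", "amount": "100.50"},
    {"txn_id": "T1", "customer_id": " c1 ", "amount": "100.50"},  # duplicate delivery
    {"txn_id": "T2", "customer_id": "C1", "amount": "49.50"},
    {"txn_id": "T3", "customer_id": "c2", "amount": "n/a"},       # unparseable amount
]


def to_silver(bronze: list) -> list:
    """Clean and deduplicate: typed amounts, normalized ids, one row per txn_id."""
    seen, silver = set(), []
    for rec in bronze:
        if rec["txn_id"] in seen:
            continue
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # sketch only: a real pipeline would route this to quarantine
        seen.add(rec["txn_id"])
        silver.append({
            "txn_id": rec["txn_id"],
            "customer_id": rec["customer_id"].strip().upper(),
            "amount": amount,
        })
    return silver


def to_gold(silver: list) -> dict:
    """Business aggregate: total transaction amount per customer."""
    totals = defaultdict(float)
    for rec in silver:
        totals[rec["customer_id"]] += rec["amount"]
    return dict(totals)
```

Because bronze is kept unaltered, both downstream layers can be rebuilt from scratch whenever the cleaning rules change, which is the property the cheap-storage argument depends on.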

✓ Strengths

  • Universal vocabulary: every team understands bronze/silver/gold
  • Storage-cheap: Parquet/Delta on S3 fits any scale
  • Tool-friendly: dbt, Spark, Databricks all support it natively
  • Combines with any specific approach (FSLDM, Vault, source-centric)

✗ Weaknesses

  • Doesn't answer "what goes in silver": you still need a model
  • Without discipline, gold balloons (every team builds their own)
  • Layer boundaries are fuzzy in practice
  • Marketing-heavy: sometimes treated as a complete answer when it's just a container

Banking example: Most modern banking lakehouses adopt Medallion: Capital One on Snowflake, JPMorgan on Databricks, and Goldman Sachs' internal lakehouse have all been publicly discussed. Banks setting up new Enterprise Data Platforms from 2023 onward almost uniformly use Medallion as the layering convention.

When to use today: Always, as a layering convention. But pair it with a specific schema choice (FSLDM canonical for migration, source-centric for greenfield, Vault for audit-heavy). Medallion alone is not a complete data model.

Approach 09 · Modern Lakehouse

Data Mesh: Domain-Owned Data Products

Inventor: Zhamak Dehghani · Era: 2019+ · Where: Large orgs with strong domain teams

Philosophy

Data is owned by domains, not the data team. The platform team builds infrastructure; domains build products. Federated governance replaces central modeling. Decentralization over consolidation.

Rationale

At scale (50+ teams), a central data team becomes a bottleneck: every new use case waits in their queue. Domain teams have the deepest knowledge of their data anyway. Mesh inverts the org: each domain treats its data as a product with contracts, SLAs, ownership. Central team enables; domains deliver.

Data Mesh is an organizational pattern: instead of a central data team owning all data, each domain (Customer, Loan, Card, Payment, Risk) owns its own data products. Each domain publishes data with contracts, SLAs, ownership. A central "platform" team provides self-serve infrastructure (lakehouse, catalog, contracts framework). Federated computational governance enforces interop without central modeling.

CORE IDEA: Treat data as a product. Each business domain owns its data products end-to-end. Central platform team provides infrastructure. No central canonical model; federated governance + contracts replace it.
Diagram: Data Mesh, federated domains + central platform

  • PLATFORM (self-serve infra team): lakehouse + catalog + contracts + governance
  • Customer domain: customer_360 product
  • Loan domain: loan_lifecycle product
  • Cards domain: card_txn product
  • Payment domain: payment_flow product
  • Risk domain: risk_position product
  • Compliance domain: aml_alert product

The edges in the diagram are labeled "contracts".
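"Contracts replace the canonical model" felt abstract to me until I wrote one down. Below is a minimal sketch of a contract as data plus a check the platform could run on a published batch. The DataContract shape and the violation messages are hypothetical; real contract frameworks also cover SLAs, semantics, and versioning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: who owns the product and which typed columns it promises."""
    product: str
    owner: str
    schema: dict  # column name -> required Python type


def violations(contract: DataContract, rows: list) -> list:
    """Return human-readable problems; an empty list means the batch honors the contract."""
    problems = []
    for i, row in enumerate(rows):
        for column, expected in contract.schema.items():
            if column not in row:
                problems.append(f"row {i}: missing {column}")
            elif not isinstance(row[column], expected):
                problems.append(f"row {i}: {column} is not {expected.__name__}")
    return problems


CUSTOMER_360 = DataContract(
    product="customer_360",
    owner="customer-domain",
    schema={"customer_id": str, "segment": str},
)
```

The point of the design is that the check is mechanical: the platform team can enforce it for every domain without knowing anything about what a customer segment means.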

✓ Strengths

  • Scales to large orgs (50+ domain teams) without central bottleneck
  • Domain expertise embedded in data products
  • Clear ownership: every product has an accountable team
  • No central canonical model needed: contracts handle interop

✗ Weaknesses

  • Heavy governance overhead: federated ≠ free
  • Requires mature platform team (often 10+ engineers)
  • Tooling immature: contracts frameworks and federated governance are still evolving
  • Overkill for small/medium banks with 5-10 teams

Banking example: JPMorgan Chase, ING, and Roche (pharma, but instructive) implemented mesh principles. The most successful banking implementations combine mesh organizationally with lakehouse infrastructure underneath. Few small/mid-tier banks adopt full mesh; the org structure isn't there.

When to use today: For tier-1 banks with 50+ data teams and dedicated platform engineering. For mid-tier banks: borrow concepts (data-as-product, contracts) but skip the full federated governance. For small banks: not yet, it's too heavy.

Era 4 · 2023+ · AI-Era

The Era of "AI Is The Fourth Consumer"

After BI, ML, and APIs, LLM agents become the fourth class of consumer. Schemas have to be self-documenting, columns have to make sense without joins, and the semantic layer has to be machine-readable. Three patterns bake these constraints in.

Approach 10 · AI-Era

Activity Schema: Behavioral Event Stream

Inventor: Ahmed Elsamadisi (Narrator) · Era: 2020+ · Where: Fintech, behavioral analytics, ML feature pipelines

Philosophy

Customer behavior is a sequence of events, not a snapshot. Model the stream, derive everything from temporal aggregations. Time over state.

Rationale

Modern fintechs care about funnels, retention, cohorts, journeys: all temporal questions. Traditional star schemas force you to predefine grain; Activity Schema doesn't. Add a new event type? Just a new value in `activity_name`. Schema evolution becomes data evolution. AI/ML feature pipelines fit naturally, because features are temporal aggregations.

Activity Schema collapses analytics into one wide event-stream table. Every business event (logged in, viewed account, deposited, applied for loan, paid, called support) becomes a row with customer_id, activity_name, ts, …. Metrics are computed via temporal SQL: first/last/count/before/after/between. It is a radically simple model; it works because cheap storage and columnar engines handle the high cardinality.

CORE IDEA: One wide event table. Every business event is a row. Metrics are temporal SQL. Schema evolution = add a new activity_name. AI-friendly because event semantics are explicit in column values.
Diagram: Activity Schema, one table, every business event as a row

    customer_id | ts               | activity_name          | feature_json                          | revenue_impact
    CUS00012345 | 2024-06-15 09:14 | opened_account         | {type:"SAV", branch:"BR001"}          | 0
    CUS00012345 | 2024-06-15 09:18 | deposited              | {amount:50000000, ccy:"VND"}          | +50,000,000
    CUS00012345 | 2024-06-22 14:32 | viewed_loan_offer      | {product:"home_loan", channel:"MB"}   | 0
    CUS00012345 | 2024-06-22 14:48 | applied_loan           | {principal:1.5B, tenor:240}           | +pending
    CUS00012345 | 2024-07-08 10:12 | received_loan_decision | {decision:"approved", rate:0.085}     | 0
    CUS00012345 | 2024-07-15 11:30 | disbursed_loan         | {amount:1.5B, account:"ACC..."}       | +1,500,000,000
    CUS00012345 | 2024-08-15 00:00 | paid_installment       | {amount:14000000, dpd:0}              | +14,000,000

Metric: time-to-conversion (loan offer → loan applied)

    SELECT AVG(applied.ts - viewed.ts)
    FROM activity AS viewed
    JOIN activity AS applied
      ON applied.customer_id = viewed.customer_id
     AND applied.ts > viewed.ts
    WHERE viewed.activity_name = 'viewed_loan_offer'
      AND applied.activity_name = 'applied_loan'

The same table is self-joined on a temporal predicate. No fact tables, no dimensions.
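Here is a runnable version of the time-to-conversion metric, using Python's sqlite3 so it needs no setup. The sample customers and timestamps are invented, timestamps are stored as ISO text, and julianday() is SQLite-specific, so other engines would need their own date arithmetic.

```python
import sqlite3


def build_activity() -> sqlite3.Connection:
    """One event table; ISO-8601 text timestamps compare correctly as strings."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE activity (customer_id TEXT, ts TEXT, activity_name TEXT)")
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)", [
        ("CUS1", "2024-06-22 14:32:00", "viewed_loan_offer"),
        ("CUS1", "2024-06-22 14:48:00", "applied_loan"),
        ("CUS2", "2024-06-23 09:00:00", "viewed_loan_offer"),
        ("CUS2", "2024-06-23 09:30:00", "applied_loan"),
        ("CUS3", "2024-06-24 10:00:00", "viewed_loan_offer"),  # never applied
    ])
    return con


def avg_minutes_to_apply(con: sqlite3.Connection) -> float:
    """Average minutes between viewing a loan offer and a later application."""
    sql = """
        SELECT AVG((julianday(applied.ts) - julianday(viewed.ts)) * 24 * 60)
        FROM activity AS viewed
        JOIN activity AS applied
          ON applied.customer_id = viewed.customer_id
         AND applied.ts > viewed.ts
        WHERE viewed.activity_name = 'viewed_loan_offer'
          AND applied.activity_name = 'applied_loan'
    """
    return con.execute(sql).fetchone()[0]
```

One caveat of this naive self-join: a customer with several views and several applications contributes every later pairing, so a production metric would usually restrict to the first application after each view.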

βœ“ Strengths

  • Schema evolution = add new activity_name (no DDL)
  • Behavioral analytics natively β€” funnels, retention, cohorts
  • AI/ML-friendly β€” features = temporal aggregations
  • One mental model for all consumers

βœ— Weaknesses

  • Bad for snapshot/balance reporting (no native "as-of" view)
  • JSON feature column requires shred/cast for typed access
  • High cardinality activity table can grow huge
  • Doesn't replace need for dimensional/relational data for some use cases
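The "shred/cast" weakness above is easier to see in code. This is a small sketch, assuming the feature column holds valid JSON (the diagram's shorthand is not strict JSON) and using a hypothetical `Deposit` type and `shred_deposit` helper of my own naming.

```python
import json
from dataclasses import dataclass


@dataclass
class Deposit:
    """Typed view of a 'deposited' activity row (hypothetical shape)."""
    customer_id: str
    amount: int
    ccy: str


def shred_deposit(row: dict) -> Deposit:
    """Parse the JSON feature column and cast each field to its expected type."""
    features = json.loads(row["feature_json"])
    return Deposit(
        customer_id=row["customer_id"],
        amount=int(features["amount"]),
        ccy=str(features["ccy"]),
    )


row = {
    "customer_id": "CUS00012345",
    "activity": "deposited",
    "feature_json": '{"amount": 50000000, "ccy": "VND"}',
}
deposit = shred_deposit(row)
print(deposit)
```

Every activity type needs its own version of this step, which is the hidden cost of keeping the table schema fixed.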

Banking example: Revolut, Chime, Monzo, and other neobanks reportedly favor activity-schema-style stores for behavioral analytics, growth marketing, and ML feature pipelines. The pattern is less common in traditional banks because regulatory reporting wants positions and balances, not events.

When to use today: Use it for behavioral analytics, growth metrics, and ML feature engineering, as a complement to a relational/dimensional layer rather than a replacement. It is also a good fit for AI agents that need to reason about customer journeys.

Approach 11 · AI-Era

Source-Centric Silver + Semantic Gold

Inventor: Emerging community pattern (no single author) · Era: 2024+ · Where: Greenfield 2024+ banking lakehouses

Philosophy

Sources are sovereign. Don't model them into a canonical at silver; mirror them. Conformance is a Gold-layer concern, done per use-case, late-bound. Isolation over integration.

Rationale

Banking sources change every quarter (T24 R23 → R24, Cards system migrations, LOS microservice splits). Pre-conforming at silver means every source change ripples upward through everything. Source-centric silver isolates change to one bucket; Gold per use case makes semantic conformance localized, explicit, and AI-friendly.

This is the synthesis I kept seeing recommended for greenfield 2026 builds. Silver mirrors each source 1:1 (silver_t24_*, silver_cards_*, silver_los_*) and is append-only, SCD2, and audit-ready. Gold is where semantic conformance happens: customer_360 joins all silver sources via the MDM hub at write time. The "common language" lives in the glossary, the semantic layer, and the MDM hub: three metadata artifacts, not a physical canonical schema.

CORE IDEA: Silver = source-mirror (independent, no canonical). Gold = semantic conformance (denormalized, AI-ready). Common language lives in metadata layers (glossary + semantic layer + MDM), not in a physical schema.
Source-Centric Silver + Semantic Gold

SILVER · SOURCE-MIRROR

  • silver_t24_* (mirror T24 1:1, SCD2): silver_t24_party, silver_t24_account, silver_t24_txn, silver_t24_position
  • silver_cards_* (mirror Cards 1:1): silver_cards_holder, silver_cards_account, silver_cards_txn, silver_cards_authz
  • silver_los_* (mirror LOS 1:1): silver_los_application, silver_los_decision, silver_los_disbursement
  • silver_crm_* (mirror CRM 1:1): silver_crm_contact, silver_crm_interaction

MDM Hub · party_hub: resolves customer_id across all silver sources → 1 golden_customer_id

GOLD · SEMANTIC CONFORMANCE

  • gold.customer_360 ⭐: JOIN silver_t24_party + silver_cards_holder + silver_los_applicant + silver_crm_contact via party_hub
  • gold.loan_lifecycle: JOIN silver_los_application + decision + silver_t24_loan + arrangement
  • gold.transaction_enriched: UNION silver_t24_txn + silver_cards_txn

METADATA · "common language" lives here: Business Glossary (OpenMetadata) + Semantic Layer (dbt SL / Cube) + Data Contracts
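The silver-to-gold flow in the diagram can be sketched in a few lines of plain Python. Everything here is illustrative: the rows, the column names, the shape of `party_hub`, and the `build_customer_360` helper are assumptions of mine, and a real build would more likely be a dbt or Spark job than in-memory dictionaries.

```python
# Silver: each source is mirrored with its own keys and column names (made-up rows).
silver_t24_party = [{"t24_party_id": "T-001", "full_name": "NGUYEN VAN A"}]
silver_cards_holder = [{"holder_no": "C-778", "embossed_name": "NGUYEN V A"}]
silver_crm_contact = [{"contact_id": "CRM-42", "email": "a@example.com"}]

# MDM hub: maps (source, source key) to one golden_customer_id.
party_hub = {
    ("t24", "T-001"): "G-1",
    ("cards", "C-778"): "G-1",
    ("crm", "CRM-42"): "G-1",
}


def build_customer_360():
    """Join all silver mirrors into one denormalized record per golden id."""
    sources = [
        ("t24", silver_t24_party, "t24_party_id"),
        ("cards", silver_cards_holder, "holder_no"),
        ("crm", silver_crm_contact, "contact_id"),
    ]
    gold = {}
    for source, rows, key_col in sources:
        for row in rows:
            golden_id = party_hub.get((source, row[key_col]))
            if golden_id is None:
                continue  # unresolved party: skipped here, should be reported in practice
            record = gold.setdefault(golden_id, {"golden_customer_id": golden_id})
            # Prefix columns with the source name so provenance stays visible in Gold.
            for col, value in row.items():
                record[f"{source}_{col}"] = value
    return gold


customer_360 = build_customer_360()
print(customer_360["G-1"])
```

The point of the sketch is that silver tables never reference each other; the only place they meet is the Gold build, through the hub.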

✓ Strengths

  • New source onboards in days: just add a silver_X_* mirror
  • Schema evolution isolated to one silver bucket
  • AI-agent friendly: Gold is one semantic layer, columns self-document
  • Glossary + semantic layer = "common language" without rigid schema

✗ Weaknesses

  • Cross-source query at silver = impossible (must go to Gold)
  • Gold builds are compute-heavy: every snapshot rebuilds joins
  • Pattern is new: fewer reference implementations to copy
  • Semantic conformance bugs are real: late binding is harder to test
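One way to make late-bound conformance more testable is to check the hub mappings before any Gold build runs. The sketch below is my own illustration, not a documented practice from any of the sources: `check_conformance` and its input shapes are hypothetical.

```python
from collections import defaultdict


def check_conformance(hub_rows, silver_keys):
    """Return a list of human-readable issues.

    hub_rows: iterable of (source, source_key, golden_id) tuples.
    silver_keys: set of (source, source_key) pairs present in silver.
    """
    golden_ids_by_key = defaultdict(set)
    for source, source_key, golden_id in hub_rows:
        golden_ids_by_key[(source, source_key)].add(golden_id)

    issues = []
    # A silver key resolving to more than one golden id would duplicate rows in Gold.
    for key, golden_ids in golden_ids_by_key.items():
        if len(golden_ids) > 1:
            issues.append(f"ambiguous: {key} maps to {len(golden_ids)} golden ids")
    # A silver key with no mapping would silently drop out of Gold.
    for key in sorted(silver_keys - set(golden_ids_by_key)):
        issues.append(f"unresolved: {key} has no golden id")
    return issues


hub_rows = [
    ("t24", "T-001", "G-1"),
    ("cards", "C-778", "G-1"),
    ("cards", "C-778", "G-2"),  # deliberately conflicting mapping
]
silver_keys = {("t24", "T-001"), ("cards", "C-778"), ("los", "L-9")}
issues = check_conformance(hub_rows, silver_keys)
for issue in issues:
    print(issue)
```

Running checks like these as a gate in front of each Gold model keeps conformance bugs from surfacing only in downstream reports.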

Banking example: Modern fintech-influenced banks gravitate toward this pattern (BBVA and Capital One have been publicly discussed, and a growing number of digital banks fit the same profile). They treat T24 / Cards / LOS as separate silver islands and build Gold per use case. The pattern fits naturally with dbt + Spark + Databricks.

When to use today: For greenfield banking lakehouses in 2024+ where dev velocity, AI-readiness, and extensibility are the priorities, and the org is OK paying compute at Gold for the simplicity at silver. Not the right fit if you need cross-source queries with sub-second latency at silver.

Approach 12 · AI-Era

Hybrid AI-Native: Lakehouse + Feature Store + Vector Index + Knowledge Graph

Steward: Emerging tier-1 stack (no single author) · Era: 2024+ · Where: AI-forward tier-1 banks

Philosophy

No single store fits all AI workloads. Lakehouse for batch, Feature Store for ML, Vector for RAG, Graph for reasoning. Specialization over unification. Unify by metadata, not by storage.

Rationale

Modern AI workloads have wildly different access patterns. A vector index for RAG can't serve regulatory reports. A lakehouse can't reason over relationships at low latency. Picking the right store per workload is the only way to hit performance + accuracy + cost ceilings. The semantic layer + glossary is what makes them feel like one platform.

The most ambitious modern pattern: combine specialized stores for each workload, unified by a semantic layer. Lakehouse for batch analytics + Gold tables. Feature Store (Feast / Tecton / Databricks FS) for ML features with online + offline parity. Vector Index (FAISS / Pinecone / pgvector) for RAG over banking documents. Knowledge Graph (Neo4j / RDF) for entity resolution and reasoning over relationships. A central semantic layer + glossary unifies them.
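On the offline side, a feature store is commonly expected to give point-in-time correct values, meaning a training row only sees feature values known at or before its event time. Here is a minimal sketch of that lookup in plain Python; the feature name, the history, and `feature_as_of` are hypothetical, and real feature stores do this as a large-scale join rather than a per-row search.

```python
from bisect import bisect_right

# Hypothetical feature history for one customer: (timestamp, value), sorted by timestamp.
txn_count_30d = [
    ("2024-06-01", 4),
    ("2024-06-15", 9),
    ("2024-07-01", 12),
]


def feature_as_of(history, event_ts):
    """Return the latest value recorded at or before event_ts, or None if there is none."""
    timestamps = [ts for ts, _ in history]
    # ISO-8601 date strings sort lexicographically in chronological order.
    idx = bisect_right(timestamps, event_ts)
    return history[idx - 1][1] if idx > 0 else None


training_events = ["2024-05-20", "2024-06-20", "2024-07-01"]
training_rows = [(ts, feature_as_of(txn_count_30d, ts)) for ts in training_events]
print(training_rows)
```

The first event predates any recorded value, so it gets None instead of leaking a later value into training.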

CORE IDEA: Each AI workload uses the right store. Lakehouse for batch, Feature Store for ML, Vector for RAG, Graph for reasoning. Unified by semantic layer + glossary. Operational complexity high; capability ceiling also high.
Hybrid AI-Native: 4 specialized stores + semantic layer

SEMANTIC LAYER: Glossary + Metric Defs + Contracts + MDM

  • Lakehouse: Bronze + Silver + Gold; Delta / Iceberg parquet; batch analytics; BI, regulatory reports
  • Feature Store: Feast / Tecton / DBX FS; online + offline parity; point-in-time features; fraud, credit, churn ML
  • Vector Index: FAISS / Pinecone / pgvector; embeddings of docs/policies; RAG for KYC, policy QA; semantic search
  • Knowledge Graph: Neo4j / RDF triple-store; entities + typed relationships; FIBO-aligned ontology; fraud rings, AML reasoning

AI Agent (LLM): queries via semantic layer
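"Queries via semantic layer" can be read as a lookup from a business term to the store that serves it. The sketch below is only one possible reading: the `Binding` type, the registry contents, and every locator string are hypothetical names I made up to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    store: str    # which specialized store serves this term
    locator: str  # hypothetical physical name inside that store


# The registry is the shared metadata; the four stores stay physically separate.
SEMANTIC_LAYER = {
    "customer_360": Binding("lakehouse", "gold.customer_360"),
    "txn_count_30d": Binding("feature_store", "customer_features:txn_count_30d"),
    "kyc_policy_docs": Binding("vector_index", "kyc_policies"),
    "beneficial_owner_of": Binding("knowledge_graph", "BENEFICIAL_OWNER_OF"),
}


def resolve(term: str) -> Binding:
    """Look up a glossary term; unknown terms fail loudly instead of guessing."""
    try:
        return SEMANTIC_LAYER[term]
    except KeyError:
        raise KeyError(f"'{term}' is not defined in the glossary") from None


# An agent answering a KYC question with a numeric feature touches two stores.
plan = [resolve(term).store for term in ("kyc_policy_docs", "txn_count_30d")]
print(plan)
```

The agent never hard-codes a store name; if a term moves from one store to another, only the registry entry changes.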

✓ Strengths

  • Best store for each workload, no compromises
  • AI agent capabilities at the ceiling: RAG + reasoning + numbers
  • Future-proof: supports new AI workload classes
  • Each store has best-in-class tooling

✗ Weaknesses

  • Operational complexity: 4 systems to keep in sync
  • Data sync challenges: embedding refresh, graph rebuild, FS backfill
  • Skills required across multiple specialized domains
  • High cost, both engineering and infrastructure

Banking example: JPMorgan, Goldman Sachs, and BNP Paribas are reportedly building this stack for fraud, anti-money-laundering, and a customer 360 chatbot. Capital One and Wells Fargo, with mature ML platforms, appear to be gravitating toward Feature Store + Lakehouse, and HSBC is reported to use a Knowledge Graph for AML.

When to use today: For tier-1 banks with mature data + ML platform teams and AI as a strategic priority. Not for mid/small banks, where the operational burden won't pay back. Start with Lakehouse + Feature Store; add Vector Index when you have RAG use cases; add Knowledge Graph last (most complex).

Comparison Matrix

All 12 Approaches Side-by-Side

Six dimensions that matter for choosing a banking data model in 2026. Scores are subjective and based on observed adoption, so calibrate them to your context.

| Approach | Era | Dev velocity | AI readiness | Maintainability | Extensibility | Audit / regulatory | Banking adoption (VN) | Best-fit context |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 · Inmon CIF | 1992 | Low | Low | Mid | Low | High | Legacy | Existing 3NF EDW migration |
| 02 · Kimball Star | 1996 | High | Mid | High | Mid | Mid | Universal | BI / reporting layer (always) |
| 03 · IBM BDW / FSLDM | 2000s | Low | Low | Low | Low | High | Tier-1 legacy | Migration; vocabulary reference |
| 04 · BIAN | 2008+ | Mid | Mid | High | High | High | Growing | API design, microservice boundaries |
| 05 · ISO 20022 | 2004+ | Mid | Mid | Mid | High | High | Mandatory | Payment messaging (must-have by 2025) |
| 06 · FIBO | 2010+ | Low | High | Mid | High | High | Rare | Glossary backbone, knowledge graph |
| 07 · Data Vault 2.0 | 2013+ | Mid | Mid | Mid | High | High | Rare in VN | Audit-heavy regulatory warehouse |
| 08 · Lakehouse Medallion | 2020+ | High | High | High | High | Mid | Default | Layering convention (always) |
| 09 · Data Mesh | 2019+ | Mid | Mid | High | High | Mid | Tier-1 only | Tier-1 with 50+ data teams |
| 10 · Activity Schema | 2020+ | High | High | High | High | Low | Fintechs | Behavioral analytics, ML features |
| 11 · Source-centric + Semantic Gold | 2024+ | High | High | High | High | High | Emerging | ⭐ Greenfield 2024+ recommended |
| 12 · Hybrid AI-Native | 2024+ | Mid | Highest | Low | High | High | Tier-1 frontier | Tier-1 with strategic AI mandate |

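If you want to calibrate the matrix to your own context, one rough way is to turn the labels into numbers and weight the dimensions. This is a sketch under loud assumptions: the numeric mapping (Low=1, Mid=2, High=3, Highest=4) and the weights are mine, and the adoption column is left out because it is categorical.

```python
# Assumed numeric mapping for the labels used in the matrix.
LEVELS = {"Low": 1, "Mid": 2, "High": 3, "Highest": 4}
DIMENSIONS = ["dev_velocity", "ai_readiness", "maintainability", "extensibility", "regulatory"]

# Scores copied from the matrix, in DIMENSIONS order.
MATRIX = {
    "01 Inmon CIF": ["Low", "Low", "Mid", "Low", "High"],
    "02 Kimball Star": ["High", "Mid", "High", "Mid", "Mid"],
    "03 IBM BDW / FSLDM": ["Low", "Low", "Low", "Low", "High"],
    "04 BIAN": ["Mid", "Mid", "High", "High", "High"],
    "05 ISO 20022": ["Mid", "Mid", "Mid", "High", "High"],
    "06 FIBO": ["Low", "High", "Mid", "High", "High"],
    "07 Data Vault 2.0": ["Mid", "Mid", "Mid", "High", "High"],
    "08 Lakehouse Medallion": ["High", "High", "High", "High", "Mid"],
    "09 Data Mesh": ["Mid", "Mid", "High", "High", "Mid"],
    "10 Activity Schema": ["High", "High", "High", "High", "Low"],
    "11 Source-centric + Semantic Gold": ["High", "High", "High", "High", "High"],
    "12 Hybrid AI-Native": ["Mid", "Highest", "Low", "High", "High"],
}


def rank(weights):
    """Return (approach, weighted score) pairs, best first."""
    scores = {
        name: sum(weights[dim] * LEVELS[label] for dim, label in zip(DIMENSIONS, labels))
        for name, labels in MATRIX.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


equal_weights = {dim: 1 for dim in DIMENSIONS}
ranking = rank(equal_weights)
for name, score in ranking[:3]:
    print(score, name)
```

Changing the weights (for example, tripling `regulatory` for an audit-heavy shop) is the whole point; the equal-weight ranking just reproduces my own subjective scores.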
Still figuring it out

An open ending, not a verdict

Honestly? I'm still trying to figure out which of these is actually right for the next ten years. Twelve approaches sound like a lot of choice, but each one was a child of its era: its constraints, its hardware, its idea of who the consumer would be. The next era is being shaped right now by AI agents that ask different questions, expect different shapes, and care about different things than BI dashboards ever did.

The patterns that look strongest to me on paper (source-centric silver, semantic gold, lakehouse as the substrate, FIBO/BIAN/ISO as the shared language) feel like a reasonable bet for today. But "reasonable bet" is a long way from "this is the answer". RAG, agentic retrieval, feature stores wired into reasoning, knowledge graphs that aren't just nice diagrams: every one of these is moving fast enough that the model I sketch in 2026 may look quaint in 2028.

So this post ends where I am right now: still reading, still asking around, still unsure. If you've built one of these for real and have opinions on what survives the AI-native shift, or on what I've completely missed, I'd love to hear it. I'll keep updating this as I learn.

Sources & Further Reading

Citations

These are the books, papers, and reference docs I worked through. Treat this as a reading list: most of the framing in this post traces back to one of these.

Era 1 · Classical

  • Inmon, W. H. (2005). Building the Data Warehouse (4th ed.). Wiley. The original CIF formulation.
  • Kimball, R. & Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (3rd ed.). Wiley.
  • IBM (n.d.). IBM Banking and Financial Markets Data Warehouse: General Information Manual. IBM Redbooks (various editions).
  • Teradata. Financial Services Logical Data Model (FSLDM). Vendor reference documentation.
  • Oracle. Financial Services Data Model (FSDM). Oracle Financial Services reference docs.

Era 2 · Reference Standards

  • BIAN. BIAN Service Landscape (latest version). bian.org/servicelandscape
  • ISO. ISO 20022 Universal Financial Industry Message Scheme. iso20022.org
  • SWIFT. CBPR+ (Cross-Border Payments and Reporting Plus) migration guidance. swift.com/standards/iso-20022
  • EDM Council. Financial Industry Business Ontology (FIBO) Specification. spec.edmcouncil.org/fibo
  • Basel Committee on Banking Supervision (2013). BCBS 239: Principles for effective risk data aggregation and risk reporting. Bank for International Settlements.

Era 3 · Modern Lakehouse

  • Linstedt, D. & Olschimke, M. (2015). Building a Scalable Data Warehouse with Data Vault 2.0. Morgan Kaufmann.
  • Databricks. The Medallion Lakehouse Architecture. databricks.com/glossary/medallion-architecture
  • Armbrust, M., Ghodsi, A., Xin, R. & Zaharia, M. (2021). Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. CIDR 2021.
  • Dehghani, Z. (2022). Data Mesh: Delivering Data-Driven Value at Scale. O'Reilly Media.
  • Dehghani, Z. (2019). How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. martinfowler.com. martinfowler.com/articles/data-monolith-to-mesh.html

Era 4 · AI-Era

  • Elsamadisi, A. (2020). The Activity Schema: A New Way to Model Data. Narrator blog & activityschema.com
  • Feast contributors. Feast: Open Source Feature Store for Machine Learning. feast.dev
  • Tecton. What is a Feature Store? tecton.ai/blog/what-is-a-feature-store
  • Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. (RAG paper.)
  • Hogan, A., et al. (2021). Knowledge Graphs. ACM Computing Surveys, Vol. 54, No. 4. Comprehensive survey.
  • dbt Labs. dbt Semantic Layer documentation. docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl

Cross-cutting & Practitioner Blogs

  • Fowler, M. (2002). Patterns of Enterprise Application Architecture. Addison-Wesley. For ESB / integration context.
  • Hohpe, G. & Woolf, B. (2003). Enterprise Integration Patterns. Addison-Wesley.
  • Capital One Tech. Data Lakehouse on Snowflake. Public engineering blog posts (various authors, capitalone.com/tech).
  • JPMorgan Chase. Fusion data platform. Public talks and blog posts on their Databricks lakehouse.
  • Goldman Sachs Engineering. Data Lake / Legend platform. engineering.goldmansachs.com.
  • Microsoft. Industry Cloud for Financial Services. Reference architectures.

If you spot a missing or misattributed source, let me know. I'd like the citation list to be accurate.
