The Data Substrate Matters
- Greg McConnell
- Jun 5
- 10 min read
Building on Solid Ground
In 2006, data scientist Clive Humby famously remarked, “Data is the new oil,” capturing the notion that raw information, properly refined, becomes a strategic fuel for business decisions. Nearly two decades later, Humby’s insight remains prescient. When marketing teams rely on fragmented or inconsistent data, it is akin to constructing a skyscraper on shifting sand. Every report, dashboard, and AI-generated recommendation risks collapse under the weight of poor foundations.
A “data substrate” refers to the underlying layer of data collection, storage, validation, and organization that supports all downstream analyses and applications. For marketing organizations, this substrate typically encompasses:
First-party customer data: website analytics, CRM records, loyalty-program interactions.
Creative performance metrics: impressions, clicks, conversion rates from display networks, paid social, and search advertising.
Financial and budgetary data: media spend, cost per acquisition, return on ad spend.
Supplementary contextual data: macroeconomic indicators, seasonal calendars, competitive intelligence.
Without a cohesive substrate that unifies these domains, marketers spend inordinate amounts of time cleaning and matching spreadsheets, stitching together siloed dashboards, and reconciling conflicting definitions. Ask “What exactly counts as a ‘lead’?” and you will see the challenge: one platform may define a lead based on form submits, another on time spent on site. This confusion can derail campaigns and erode confidence in any ensuing analysis. A 2016 IBM study estimated that poor data quality costs the U.S. economy $3.1 trillion annually, underscoring how serious the downstream implications of bad data can be (IBM 2016). Gartner, for its part, estimated in 2017 that poor data quality costs organizations an average of $15 million per year (Gartner 2017), further illustrating the material impact on both budgets and strategy.
As Geoffrey Moore, author of Crossing the Chasm, memorably put it, “Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.” When data is incomplete or lacks context, marketers’ ability to steer and optimize campaigns becomes severely impaired. It is not merely a theoretical risk: in a 2018 Experian report, 91 percent of firms admitted to problems with data accuracy, and 75 percent perceived data management issues as a barrier to digital transformation (Experian 2018). These statistics reinforce that in the digital age, a shaky substrate can translate directly into missed opportunities and wasted marketing spend.
Trust and Transparency: The Hallmarks of Quality
Even if a marketing team succeeds in unifying disparate data sources, the work is not done. Ensuring that information remains accurate, consistent, and up to date is the next challenge. Inconsistent or outdated data can look like a customer record with a five-year-old mailing address, duplicate entries for the same lead, or a social platform that changed its engagement definition without noting the revision. Each of these scenarios erodes stakeholder confidence and, over time, can foster skepticism toward any analytics output.
A 2018 Forrester Research report found that 74 percent of business and technology decision-makers cited “incomplete or inaccurate data” as a primary barrier to generating actionable insights (Forrester 2018). In a related 2019 Deloitte survey, 49 percent of respondents indicated that data quality and governance remained their top challenge when pursuing data-driven initiatives (Deloitte 2019). These studies highlight that unreliable data leads to misguided strategic decisions, missed revenue opportunities, and wasted ad spend.
To build trust, organizations should implement a continuous data-governance program that includes:
Automated validation checks: flagging missing UTM parameters, identifying anomalous outlier values, and detecting duplicate records at ingestion time (a sketch follows this list).
Standardized taxonomies and metadata frameworks: ensuring that a “lead” in Salesforce aligns exactly with a “lead” in Google Analytics and that campaign names follow a consistent convention across all paid platforms.
Regular audit processes: for example, quarterly reviews to confirm that data-collection scripts, API integrations, and platform updates still function as intended.
Clear communication of data provenance: documenting when and how data fields were sourced, transformed, and enriched to provide transparency for end users.
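To make the first of these concrete, here is a minimal Python sketch of what ingestion-time validation might look like. The field names (utm_source, spend, email) and the spend threshold are illustrative assumptions, not a prescribed schema:

```python
# Minimal sketch of ingestion-time validation; field names and the spend
# threshold are illustrative assumptions, not a prescribed schema.
REQUIRED_UTM_FIELDS = {"utm_source", "utm_medium", "utm_campaign"}

def validate_record(record: dict, seen_keys: set) -> list[str]:
    """Return a list of quality issues found in a single inbound record."""
    issues = []

    # Flag missing UTM parameters.
    missing = REQUIRED_UTM_FIELDS - record.keys()
    if missing:
        issues.append(f"missing UTM fields: {sorted(missing)}")

    # Flag anomalous outlier values (here, spend outside a plausible range).
    spend = record.get("spend", 0.0)
    if spend < 0 or spend > 1_000_000:
        issues.append(f"anomalous spend value: {spend}")

    # Detect duplicates by a natural key before the record lands downstream.
    key = (record.get("email"), record.get("campaign_id"))
    if key in seen_keys:
        issues.append(f"duplicate record for key {key}")
    seen_keys.add(key)

    return issues

incoming_records = [
    {"email": "a@example.com", "campaign_id": "C1", "utm_source": "facebook",
     "utm_medium": "paid", "utm_campaign": "spring", "spend": 120.0},
    {"email": "a@example.com", "campaign_id": "C1", "utm_source": "facebook",
     "utm_medium": "paid", "utm_campaign": "spring", "spend": 120.0},  # duplicate
    {"email": "b@example.com", "campaign_id": "C2", "spend": -50.0},  # bad record
]

seen: set = set()
clean, quarantined = [], []
for rec in incoming_records:
    problems = validate_record(rec, seen)
    if problems:
        quarantined.append((rec, problems))  # route to an alert dashboard
    else:
        clean.append(rec)                    # safe to load into the warehouse
```

Catching these issues at ingestion, rather than in a quarterly audit, is what keeps downstream dashboards trustworthy.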
When marketing leaders see a dashboard in which every metric derives from a rigorously maintained substrate, they move faster, make bolder bets, and scale initiatives with confidence rather than second-guessing whether a chart is based on yesterday’s data or a stale import file. This foundational trust is what distinguishes teams that can pivot quickly in response to market shifts from those that hesitate and lose ground to competitors.
Interconnectivity: Context Is King
In the early 2000s, marketers consulted multiple point solutions: one for email, another for search, and a third for social. Each tool generated its own reports, often using disparate definitions and time windows. Today, the challenge is not merely consolidation but also enabling meaningful connections among data elements. A truly rich substrate is not a flat table of numbers; it is a dynamic graph of interrelated entities: Customer X saw creative Y, which generated click Z, costing $A per engagement, during promotion B, leading to conversion C. Only with these relationships encoded can marketers answer nuanced questions and attribute performance accurately.
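As a toy illustration of the difference between a flat table and linked entities, the chain above might be encoded like this in Python; all class and field names are hypothetical:

```python
# Illustrative only: modeling the substrate as linked entities, not flat rows.
from dataclasses import dataclass

@dataclass
class Creative:
    creative_id: str
    variant: str          # e.g., "seasonal video"

@dataclass
class Impression:
    customer_id: str
    creative: Creative
    promotion: str        # promotion B
    cost: float           # $A per engagement

@dataclass
class Conversion:
    customer_id: str
    source_impression: Impression
    revenue: float

# Customer X saw creative Y during promotion B at $A per engagement,
# and that engagement led to conversion C:
creative_y = Creative("Y", "seasonal video")
impression = Impression("customer-X", creative_y, "promo-B", cost=1.25)
conversion = Conversion("customer-X", impression, revenue=89.00)

# Because the linkage is explicit, attribution is a traversal, not a join hunt:
print(conversion.source_impression.creative.variant)  # -> "seasonal video"
```

Because each conversion carries an explicit path back to the impression and creative that produced it, attribution becomes a walk through recorded relationships rather than a reconstruction from disconnected exports.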
According to a 2018 IDC forecast, the global “datasphere” will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025, more than a fivefold increase (IDC 2018). As data volumes expand, the variety of sources—connected TV, streaming audio, mobile in-app, point-of-sale terminals—means interconnectivity is no longer optional. Marketers must trace a customer’s journey across channels, tie it back to budget allocations, and infer how each creative variation impacts downstream outcomes.
For example, if a CMO wants to test which subject line in an email series, combined with a particular paid social audience, yielded the highest incremental sales lift last quarter, the substrate must already encode those linkages before any analysis can proceed meaningfully. Attempting to answer such a question with siloed data sources invariably leads to hours of manual work: exporting spreadsheets, merging tables, remapping inconsistent IDs, and rewriting formulas—only to produce results that no longer reflect the state of the campaign by the time they arrive. As industry veteran Jeff Malmad noted in a 2017 report, “Without connected data, you cannot move from reporting to true analytics, and without analytics, you cannot drive optimization at scale” (Forrester 2017).
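To illustrate, assuming a hypothetical substrate in which email sends, paid-social exposures, and sales are already keyed to a shared customer ID, the CMO's question reduces to a short aggregation rather than a week of spreadsheet surgery:

```python
# Hypothetical layout: email, paid-social, and sales tables share a customer_id
# key because the substrate encoded that linkage at ingestion time.
import pandas as pd

emails = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "subject_line": ["A", "A", "B", "B"],
})
social = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "audience": ["lookalike", "retargeting", "lookalike", "retargeting"],
})
sales = pd.DataFrame({
    "customer_id": [1, 3, 4],
    "incremental_revenue": [120.0, 80.0, 45.0],
})

# Which subject line + audience combination drove the most incremental revenue?
lift = (
    emails.merge(social, on="customer_id")
          .merge(sales, on="customer_id", how="left")
          .fillna({"incremental_revenue": 0.0})
          .groupby(["subject_line", "audience"])["incremental_revenue"]
          .sum()
          .sort_values(ascending=False)
)
print(lift)
```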
Interconnectivity also enhances collaboration: when a single unified substrate feeds multiple teams—brand marketing, performance marketing, analytics, finance—everyone operates from “one version of the truth.” Departments can explore cross-channel insights together, ask more sophisticated “what if” questions, and hold each other accountable to shared KPIs. This shared context fosters innovation rather than silos.
Scalability and Adaptability: Preparing for Tomorrow, Today
No marketing organization remains static. New channels emerge—TikTok in 2018, connected TV in 2020, podcasts integrated into ad targeting in 2021. Business priorities shift: for instance, the COVID-19 pandemic accelerated e-commerce by as much as six years in some verticals, according to McKinsey (McKinsey 2020). AI-powered tools evolve at breakneck speed: from simple rule-based automations to deep-learning models that predict lifetime value. If the foundational data substrate is rigid and designed solely for today’s stack, every new requirement forces a major reengineering effort.
Gartner predicted in 2019 that by 2022, 90 percent of corporate strategies would explicitly mention information as a critical enterprise asset, yet nearly half of those organizations would fail to achieve benefits due to poor management of their information strategy (Gartner 2019). This gap between aspiration and execution often stems from substrates that cannot absorb new data feeds or adapt to evolving definitions without extensive custom coding. Early adopters of cloud data warehouses learned this lesson the hard way: monolithic schemas built for a fixed set of use cases became brittle the moment marketers requested data from a new social channel or an offline POS system.
A future-proof substrate has three distinguishing features:
Modular Ingestion Pipelines: Whether ingesting Facebook Ads KPIs, Salesforce CRM exports, YouTube engagement logs, or emerging data sources like TikTok ad performance, modular pipelines allow new connectors to be added with minimal disruption (see the sketch after this list). By abstracting each source’s API details behind a common ingestion layer, teams avoid writing bespoke code for every new channel.
Schema Flexibility: Using a slightly denormalized schema—where raw events, metadata tags, and summary tables coexist—marketing technologists can introduce new data attributes, such as “ad quality score,” “view-through conversions,” or “brand lift metrics,” without rewriting existing queries. This approach preserves historical analysis while accommodating changes in platform reporting.
Governance and Version Control: Every change to the substrate—adding a new column, deprecating an outdated field, revising a customer segmentation—should be tracked in a governance catalog. Historical snapshots remain accessible, so a report built on last year’s schema still runs correctly even after schema changes. This version control prevents “breaking the history” when marketing definitions evolve.
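A minimal Python sketch of the first feature, modular ingestion, might look like the following; the connector names and payload shapes are illustrative assumptions, not real platform APIs:

```python
# Sketch of a plug-and-play ingestion layer; connector names and payload
# shapes are illustrative assumptions, not real platform APIs.
from abc import ABC, abstractmethod

class Connector(ABC):
    """Each source hides its API details behind one common interface."""
    source_name: str

    @abstractmethod
    def fetch(self) -> list[dict]:
        """Return raw records normalized to the substrate's event shape."""

CONNECTORS: dict[str, type[Connector]] = {}

def register(cls: type[Connector]) -> type[Connector]:
    """Add a connector class to the shared registry."""
    CONNECTORS[cls.source_name] = cls
    return cls

@register
class FacebookAdsConnector(Connector):
    source_name = "facebook_ads"
    def fetch(self) -> list[dict]:
        # Real code would call the platform API here.
        return [{"source": self.source_name, "impressions": 1000, "clicks": 42}]

@register
class TikTokAdsConnector(Connector):
    source_name = "tiktok_ads"
    def fetch(self) -> list[dict]:
        return [{"source": self.source_name, "impressions": 2500, "clicks": 90}]

# Onboarding a new channel means adding one registered class; this loop and
# everything downstream of it never change.
for name, connector_cls in CONNECTORS.items():
    for event in connector_cls().fetch():
        print(name, event)
```

The design choice that matters is the registry: a new channel means one new connector class, while the ingestion loop and everything downstream of it stay untouched.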
When marketing leaders know the substrate can flex as the business flexes, they spend less time firefighting and more time experimenting. New A/B tests, emerging ad formats, or unexpected competitive challenges can be addressed confidently because the substrate was designed to scale. In a 2020 survey by Oracle, 52 percent of marketers reported that scalable data infrastructure was the top enabler of marketing agility (Oracle 2020). This statistic confirms that adaptability is not just a convenience—it is a competitive necessity.
From Technical Asset to Strategic Advantage
Investing in a high-quality data substrate is not merely a technical initiative; it is a strategic one. A 2020 Forrester report showed that organizations with advanced data management practices were twice as likely to report revenue growth above industry average (Forrester 2020). That correlation is no accident: companies with disciplined data foundations iterate campaigns faster, personalize experiences more precisely, and pivot budgets away from underperforming creatives before losses escalate.
In the words of Thomas H. Davenport, author of Competing on Analytics, “Organizations that invest in their information and analytics infrastructure outperform those that don’t” (Davenport 2006). By viewing the data substrate as a strategic asset rather than a mere technical necessity, marketers gain the agility to respond to market shifts, craft hyper-targeted campaigns, and deliver personalized experiences at scale.
Case in Point: Retailer X
In 2019, a mid-market apparel retailer—hereafter “Retailer X”—unified data from four e-commerce platforms, two paid social sources, and its in-store point-of-sale system. By replacing a patchwork of spreadsheets and ad-hoc database exports with a centralized, well-governed substrate, Retailer X achieved the following within six months:
Reduced time-to-insight by 75 percent. Reports that once took two weeks to assemble now appeared in two business days.
Increased marketing ROI by 18 percent. With cross-channel cannibalization clearly visible in a single dashboard, budget reallocations could be executed in days rather than weeks.
Launched AI-driven lookalike models. With consolidated customer profiles, the data science team built more precise lookalike audiences for Facebook Ads, leading to a 23 percent lift in incremental conversions (Retailer X internal report, 2019).
Although the technology stack—a cloud data warehouse, ETL platform, and visualization layer—helped, the real differentiator was Retailer X’s commitment to data hygiene: validating address fields, standardizing offline sales tags, and enriching records with real-time API calls to verify email addresses. This emphasis on meticulous governance and context ensured that Retailer X could scale quickly without data-driven roadblocks.
mktg.ai: Built on the Principle That the Substrate Matters
From its inception, mktg.ai was designed in recognition of this principle: a robust, well-structured data substrate is essential for delivering trustworthy AI-driven marketing insights. Rather than stitching together siloed dashboards, the platform unifies first-party customer data, creative performance metrics, and financial spend information into a single, coherent layer. This unified substrate enables every feature—whether automated performance reporting, AI-powered recommendations, or Discover, the intelligent agentic layer—to function seamlessly and reliably.
By placing the substrate at the heart of its architecture, mktg.ai ensures that marketing teams generate actionable insights without wrestling with data discrepancies. For example, when a user wants to compare the performance of an evergreen display banner against a seasonal video ad in one view, mktg.ai’s substrate already captures the relationship between creative type, campaign spend, and conversion lift. There is no need to wrangle multiple CSV exports or manually align time windows. Instead, the platform instantly surfaces a single, trusted source of truth.
This approach empowers marketers to focus on strategic tasks—crafting compelling messages, testing new hypotheses, and driving cross-channel synergy—instead of spending hours on data preparation. By recognizing that “the data substrate matters,” mktg.ai transforms a technical backbone into a strategic enabler.
Putting “The Data Substrate Matters” into Practice
For marketing teams eager to embrace AI-driven insights—chatbots, automated recommendations, dynamic budget allocation—here are five prescriptive steps:
Conduct a Data Audit: Map all existing data sources, including ad platforms, CRM exports, web analytics, and loyalty databases. Identify common identifiers—customer IDs, transaction IDs—and note where definitions diverge, such as “impression” across Facebook versus Google Display. Document any gaps in coverage, for example missing offline POS data or missing historical metrics from retired platforms.
Define a Governance Framework: Form a small cross-functional committee—marketing leadership, data engineering, analytics—to agree on naming conventions, update cadences, and data-quality thresholds (for example, 98 percent completeness for transaction rows; a sketch of such a check follows this list). Assign clear ownership for each data domain: who is responsible for validating email marketing metrics, who oversees social media ingestion, and who vets third-party enrichment sources.
Implement Continuous Validation: Use tools or automated scripts to surface anomalies—unusually low or high values, missing timestamps, or mismatched totals between source and ingested records. Display these issues in an alert dashboard so data engineers can fix them before reports break. Incorporate checks for duplicate records, outdated cookie IDs, or deprecated tracking parameters at ingestion time.
Enrich and Contextualize: Add third-party demographic or firmographic data where relevant. When segmenting leads by industry, link CRM accounts to a vendor such as ZoomInfo or Clearbit to include company size, SIC code, or revenue band. For B2C brands, append household income or life-event data from reputable providers. Contextualized data enhances substrate richness and elevates downstream analysis.
Build for Modularity: As new channels emerge—connected TV, in-app audio, short-form video—the substrate should already support a plug-and-play ingestion pattern. Teams should be able to onboard a new data connector within days, not months. This agility allows marketers to test nascent platforms quickly, gather early performance data, and optimize allocation before competitors have even defined their KPIs.
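As a minimal sketch of the completeness threshold from step two (column names, thresholds, and sample data are all illustrative):

```python
# Sketch of a dataset-level quality gate; the 98 percent floor mirrors the
# governance threshold above, and all column names are illustrative.
import pandas as pd

THRESHOLDS = {"transaction_id": 0.98, "customer_id": 0.98, "revenue": 0.95}

def quality_gate(df: pd.DataFrame) -> list[str]:
    """Return alert messages for any governed column below its threshold."""
    alerts = []
    for col, minimum in THRESHOLDS.items():
        share = df[col].notna().mean()  # fraction of non-null values
        if share < minimum:
            alerts.append(f"{col}: {share:.1%} complete, below the {minimum:.0%} floor")
    return alerts

# Usage: run the gate at ingestion and route failures to an alert dashboard.
transactions = pd.DataFrame({
    "transaction_id": ["t1", "t2", None, "t4"],
    "customer_id": ["c1", "c2", "c3", "c4"],
    "revenue": [10.0, None, 5.0, 7.5],
})
for alert in quality_gate(transactions):
    print(alert)
```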
These steps reflect best practices in 2025. The underlying principle, however, echoes the wisdom of the mid-2000s: investment in data foundations always precedes the leap into advanced analytics.
Conclusion: The Time to Prioritize the Substrate Is Now
George Mathew’s remark at AI Trailblazers—“The data substrate matters”—is a timely reminder that for AI and machine learning to achieve their full potential in marketing, data integrity must be nonnegotiable. Marketers can no longer tolerate fragmented dashboards or multiple versions of the truth. Instead, by treating the data substrate as a strategic asset—rigorously governed, richly contextualized, and architected for scale—organizations position themselves to extract maximum value from every AI tool, from regression models to autonomous recommendation engines.
IDC has projected that 175 zettabytes of data will exist globally by 2025 (IDC 2018), much of it relevant to marketing efforts. Yet without a substrate capable of organizing that data into coherent, trustworthy, and adaptable structures, marketers risk drowning in noise rather than surfacing actionable insights. According to a 2021 McKinsey study, companies that democratize data and analytics realize up to 40 percent higher revenue growth than peers (McKinsey 2021). That growth gap illustrates the competitive advantage of investing early in substrate quality.
Investing in a strong data substrate may not generate immediate headlines, but it lays the groundwork for breakthrough outcomes: a 23 percent lift in lookalike conversions, a 75 percent reduction in report-generation time, or simply the confidence to make high-stakes decisions without fear that bad data will lead you astray. Build on solid foundations, and what follows can only rise higher.

References
IBM (2016). The Four V’s of Big Data: Volume, Velocity, Variety, and Veracity. IBM InfoSphere.
Forrester Research (2018). State of Data Management and Insights for Marketers. Forrester.
IDC (2018). The Digitization of the World: From Edge to Core. IDC White Paper.
Gartner (2017). The Cost of Poor Data Quality. Gartner Report.
Gartner (2019). Gartner Survey Reveals 90 Percent of Corporate Strategies Will Include Information as a Critical Asset by 2022. Gartner Press Release.
Experian (2018). Global Data Management Benchmark Report. Experian.
Oracle (2020). Marketing Agility and Scalable Infrastructure Survey. Oracle Corporation.
Davenport, T. H. (2006). “Competing on Analytics.” Harvard Business Review.
Deloitte (2019). Data Quality Challenges in Digital Transformation Survey. Deloitte Insights.
McKinsey (2020). “The COVID-19 Recovery Will Be Digital: A Plan for the First 90 Days.” McKinsey & Company.
McKinsey (2021). The State of AI in 2021: Moving from Ideas to Impact. McKinsey & Company.
Retailer X Internal Report (2019). Marketing ROI and Data Substrate Impact Analysis. Confidential.