Ocean Data Pipeline

Ingest external data, clean it, match it to your entities, and serve it fast. Add real-time search with +Search.

Do You Need Ocean?

Ocean is for sites that display data they receive from external sources — not data they write themselves.

You need Ocean if:

You get data feeds from APIs, government sources, partner systems, or supplier databases
Your visitors browse a catalog that changes regularly without your team touching it
You need to match incoming records against an entity graph (locations, companies, products, events)
You want real-time search across a large, frequently updated dataset

You don't need Ocean if:

You write your own content in the admin panel
Your site is a portfolio, blog, or brochure site
All data comes from your own team, not external pipelines

If you're not sure, start with Core. Ocean is an upgrade path, not a requirement.

Family Foods: Real Usage

The Family Foods group added Ocean when Ma's Bakery started publishing a weekly ingredient price report for their commercial wholesale customers.

Every Monday, Ma pulls flour, butter, sugar, and egg pricing from three wholesale distributors — each exports a different CSV format. Before Ocean, Ma was manually copy-pasting into a spreadsheet and updating a page by hand. After Ocean:

CrowsNest scrapes the three distributor price pages on a schedule
Sextant normalizes the messy column names into a canonical unit_price field
Harpoon matches each line item against Ma's ingredient entity graph (even with spelling differences between suppliers)
Athena runs Ma's weekly batch query: cheapest price per ingredient across all three suppliers
Lighthouse caches the results — the price report page loads in under 100ms

Ma gets a published price comparison table every Monday morning with zero manual work.
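As a rough illustration of the "cheapest price per ingredient" step, the sketch below does the same aggregation in memory. It is not the Athena query Ocean runs; the `PriceRecord` shape, the `ingredient` and `supplier` field names, and the sample prices are assumptions made up for this example (only `unit_price` comes from the text above).

```typescript
// Hypothetical record shape after matching; only unit_price is named in the docs.
interface PriceRecord {
  ingredient: string;
  supplier: string;
  unit_price: number;
}

// Keep the lowest-priced record for each ingredient across all suppliers.
function cheapestPerIngredient(records: PriceRecord[]): Map<string, PriceRecord> {
  const best = new Map<string, PriceRecord>();
  for (const rec of records) {
    const current = best.get(rec.ingredient);
    if (current === undefined || rec.unit_price < current.unit_price) {
      best.set(rec.ingredient, rec);
    }
  }
  return best;
}

// Made-up sample data for illustration only.
const sampleRecords: PriceRecord[] = [
  { ingredient: "flour", supplier: "Supplier A", unit_price: 0.48 },
  { ingredient: "flour", supplier: "Supplier B", unit_price: 0.42 },
  { ingredient: "butter", supplier: "Supplier A", unit_price: 3.1 },
  { ingredient: "butter", supplier: "Supplier C", unit_price: 3.25 },
];

const cheapest = cheapestPerIngredient(sampleRecords);
```

With this sample, `cheapest.get("flour")` is the Supplier B record and `cheapest.get("butter")` is the Supplier A record.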

Cousin's Coffeehouse doesn't use Ocean — they only serve their own menu content and don't need it.

The Pipeline

Ocean is a three-stage pipeline:

Raw data sources              Ocean pipeline                    Your frontend
(CrowsNest scrape,      →  Sextant → Harpoon → Lighthouse  →  millisecond reads
 manual bronze drop,
 API pull)

Sextant normalizes inconsistently named columns from raw sources into your canonical schema.
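To make the normalization step concrete, here is a minimal sketch of mapping messy column headers onto canonical field names. It is not Sextant's implementation or configuration format; the alias table, the `ingredient` field, and the helper names are assumptions for illustration.

```typescript
// Hypothetical alias table: cleaned raw header -> canonical field name.
const COLUMN_ALIASES = new Map<string, string>([
  ["price", "unit_price"],
  ["unit price", "unit_price"],
  ["price per unit", "unit_price"],
  ["item", "ingredient"],
  ["product name", "ingredient"],
]);

// Lowercase the header and collapse spaces, underscores, and hyphens.
function cleanKey(raw: string): string {
  return raw.trim().toLowerCase().replace(/[\s_-]+/g, " ");
}

// Rename known columns to canonical names; unknown columns are kept in snake_case.
function normalizeRow(row: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(row)) {
    const cleaned = cleanKey(key);
    const canonical = COLUMN_ALIASES.get(cleaned) ?? cleaned.replace(/ /g, "_");
    out[canonical] = row[key];
  }
  return out;
}

const normalized = normalizeRow({ "Product Name": "AP Flour", "Price_Per_Unit": "0.42" });
```

Here `normalized` has the keys `ingredient` and `unit_price`, whatever spelling the supplier used for its headers.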

Harpoon matches the normalized (silver) records against your entity graph, even with spelling variations. Ma's flour entity matches "All-Purpose Flour", "AP Flour", and "Flour (APF)" across three suppliers.
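One simple way to picture this kind of matching is a normalized alias lookup, sketched below. This is a stand-in technique, not Harpoon's actual algorithm; the `Entity` shape, the `flour` id, and the function names are hypothetical.

```typescript
// Hypothetical entity shape: one id plus the names suppliers use for it.
interface Entity {
  id: string;
  aliases: string[];
}

// Lowercase and replace punctuation runs with single spaces.
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Build a lookup from every normalized alias to its entity id.
function buildAliasIndex(entities: Entity[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const entity of entities) {
    for (const alias of entity.aliases) {
      index.set(normalizeName(alias), entity.id);
    }
  }
  return index;
}

// Return the matching entity id, or undefined when nothing matches.
function matchEntity(index: Map<string, string>, rawName: string): string | undefined {
  return index.get(normalizeName(rawName));
}

const aliasIndex = buildAliasIndex([
  { id: "flour", aliases: ["All-Purpose Flour", "AP Flour", "Flour (APF)"] },
]);
const matched = matchEntity(aliasIndex, "all purpose FLOUR");
```

`matched` resolves to `"flour"` because case and punctuation are ignored. A lookup like this only handles formatting differences; real typos would need a fuzzy comparison on top.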

Lighthouse stores pre-computed snapshots your frontend reads at query time. Pages load fast regardless of data complexity.

Standard vs +Search

Ocean Standard — data ingested on a schedule, results served as pre-computed snapshots. Best for: price reports, product catalogs, event listings, content aggregated from external sources.

Ocean +Search — adds Aurora Serverless v2 with a real-time search index. Your visitors get a search box that queries the entity graph live. Best for: sites where visitors actively search across large datasets.

In Ocean +Search, Aurora Serverless v2 keeps a minimum of 0.5 ACU at idle, so it does not scale to zero and you pay for that capacity even with no traffic.

Decision: Do your visitors need to type a query and get results in real time?

Yes → +Search
No → Standard

You can upgrade from Standard to +Search without re-seeding your data.

Sonar

Sonar is a signal monitoring system built into Ocean. It watches your pipeline for events that match a threshold you define and sends alerts to subscribers. It requires no extra infrastructure — it runs on top of what Ocean already produces.

Family Foods example: Pa's Donuts bakes limited batches. When the weekly run is ready to book, Sonar detects the inventory update and fires alerts to subscribers who opted in. Subscribers get notified before the batch is announced anywhere else.

Sonar uses a credit model: each alert sent costs the subscriber one credit. Pa's Donuts gives new loyalty members 10 free credits. After that, subscribers can buy more credits through Commerce (if provisioned), or you can price credits at $0 so alerts are always free.
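The threshold-plus-credits flow can be sketched as follows. This is an illustration under assumptions, not Sonar's API: the `Subscriber` shape, the `crossesThreshold` check, and the function names are invented here, and credit purchasing is left out.

```typescript
// Hypothetical subscriber record with a credit balance.
interface Subscriber {
  id: string;
  credits: number;
}

// Assumed threshold rule: fire when the observed value reaches the threshold.
function crossesThreshold(value: number, threshold: number): boolean {
  return value >= threshold;
}

// Alert every subscriber who has at least one credit, deducting one credit each.
// Mutates the subscriber balances in place and returns the ids that were notified.
function fireAlerts(subscribers: Subscriber[]): string[] {
  const notified: string[] = [];
  for (const sub of subscribers) {
    if (sub.credits >= 1) {
      sub.credits -= 1;
      notified.push(sub.id);
    }
  }
  return notified;
}

// Made-up example: 24 donuts available, alert once at least 1 is bookable.
const loyaltyMembers: Subscriber[] = [
  { id: "ann", credits: 10 },
  { id: "bo", credits: 0 },
];
const notifiedIds = crossesThreshold(24, 1) ? fireAlerts(loyaltyMembers) : [];
```

In this run only `ann` is notified and her balance drops to 9; `bo` has no credits left and is skipped.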

Sonar is included in both Ocean tiers. Commerce is optional (for paid credits).

Set NEXT_PUBLIC_SS_SONAR=true to enable Sonar in your deployment.

How to Add Ocean

Ocean is selected during adapter init:

npx @sirsluginston/aws-adapter init
# ...
? Ocean Data Lake?
  ❯ Standard (batch pipeline, fast reads)
    +Search (Standard + real-time search)
    No

For an existing deployment, re-run init to add Ocean, or set the env var manually:

NEXT_PUBLIC_SIRSLUGINSTON_OCEAN=standard   # or: search
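If your own frontend code needs to branch on these settings, a defensive parser along these lines may help. The helper names are hypothetical, and treating an unset or unrecognized value as "no Ocean" is an assumption, not documented adapter behavior.

```typescript
type OceanTier = "standard" | "search" | "none";

// Map the raw env value to a tier; anything unrecognized is treated as "none" (assumption).
function parseOceanTier(value: string | undefined): OceanTier {
  const v = (value ?? "").trim().toLowerCase();
  if (v === "standard" || v === "search") {
    return v;
  }
  return "none";
}

// Sonar is considered enabled only for the literal value "true".
function parseSonarFlag(value: string | undefined): boolean {
  return (value ?? "").trim().toLowerCase() === "true";
}

// In a Next.js app you would pass in the env values, for example:
//   parseOceanTier(process.env.NEXT_PUBLIC_SIRSLUGINSTON_OCEAN)
//   parseSonarFlag(process.env.NEXT_PUBLIC_SS_SONAR)
const exampleTier = parseOceanTier("search");
const exampleSonar = parseSonarFlag("true");
```

Because Next.js inlines `NEXT_PUBLIC_` variables at build time, a changed value takes effect only after a rebuild.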

Cost & Solvency Cap

Ocean's infrastructure cost depends on your data volume and tier:

Component	Tier	Cost shape
S3 data lake	Both	Per GB stored; low for most sites
Athena queries	Both	Per TB scanned — very low for batch
Lambda ingestion	Both	Per invocation — minimal
Aurora Serverless v2	+Search only	Per ACU-hour; minimum 0.5 ACU at idle

For a site like Ma's Bakery (3 supplier feeds, weekly batch), Ocean Standard runs under $10/mo in AWS costs. +Search adds the Aurora minimum, which starts at ~$50–70/mo at idle, scaling with query load.
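The Aurora idle figure is simple arithmetic: minimum ACUs times the hourly ACU rate times hours in the month. The sketch below shows the calculation only. The rate of $0.16 per ACU-hour and the 730-hour month are placeholder assumptions; actual rates vary by region and change over time, and storage and I/O are billed separately, so check current AWS pricing.

```typescript
// Estimate the monthly cost of Aurora Serverless v2 sitting at its minimum capacity.
// ratePerAcuHour is a placeholder you must supply; hoursPerMonth defaults to 730.
function auroraIdleMonthlyCost(
  minAcu: number,
  ratePerAcuHour: number,
  hoursPerMonth: number = 730,
): number {
  const raw = minAcu * ratePerAcuHour * hoursPerMonth;
  // Round to whole cents to avoid floating-point noise.
  return Math.round(raw * 100) / 100;
}

// 0.5 ACU minimum at an assumed $0.16 per ACU-hour.
const idleEstimate = auroraIdleMonthlyCost(0.5, 0.16);
```

With these placeholder inputs the estimate is $58.40 per month for compute alone; query load adds ACU-hours on top of that.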

All Ocean costs count toward your solvency cap: ads run under your publisher ID until they have covered your full infrastructure bill, and SirSluginston Co earns only on what comes in above that amount. Adding Ocean raises your cap automatically; your traffic covers it first.
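One way to read the solvency cap is as a simple split of ad revenue against the infrastructure bill, sketched below. This is an interpretation for illustration, not the billing logic SirSluginston Co actually runs; the function name, field names, and dollar figures are hypothetical.

```typescript
// Hypothetical breakdown of one month's ad revenue against the cap.
interface SolvencySplit {
  coveredByAds: number; // portion of the infrastructure bill paid by ad revenue
  shortfall: number;    // portion of the bill not yet covered
  aboveCap: number;     // ad revenue earned after the bill is fully covered
}

// Apply ad revenue to the infrastructure bill first; only the remainder is above the cap.
function applySolvencyCap(adRevenue: number, infraCost: number): SolvencySplit {
  const coveredByAds = Math.min(adRevenue, infraCost);
  return {
    coveredByAds,
    shortfall: infraCost - coveredByAds,
    aboveCap: adRevenue - coveredByAds,
  };
}

// Made-up month: $40 of ad revenue against a $25 bill that includes Ocean.
const monthSplit = applySolvencyCap(40, 25);
```

In this made-up month the full $25 bill is covered and $15 falls above the cap. Raising the bill (for example by adding Ocean) raises the amount that has to be covered before anything lands above it.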

View your breakdown in the AWS Cost Analytics section of your admin panel.

Works With

Ocean + Sonar — Sonar is built into Ocean. When a Lighthouse snapshot updates (price drop, new inventory batch, new event), Sonar fires to subscribers who match your defined thresholds. No additional data work required.

Ocean + Intellect — Your Intellect AI chatbox can query Ocean's Lighthouse snapshots to give context-aware answers. When a visitor asks "which flour is cheapest this week?", the AI pulls from Ma's live price report and responds accurately — not from training data.

Ocean + Commerce — Your product catalog (Commerce) can be enriched by Ocean data. If a product's pricing or availability comes from an external feed, Ocean keeps it current automatically rather than requiring manual admin panel updates.