Ocean Data Pipeline
Ingest external data, clean it, match it to your entities, and serve it fast. Add real-time search with +Search.
Do You Need Ocean?
Ocean is for sites that display data they receive from external sources — not data they write themselves.
You need Ocean if:
- You get data feeds from APIs, government sources, partner systems, or supplier databases
- Your visitors browse a catalog that changes regularly without your team touching it
- You need to match incoming records against an entity graph (locations, companies, products, events)
- You want real-time search across a large, frequently updated dataset
You don't need Ocean if:
- You write your own content in the admin panel
- Your site is a portfolio, blog, or brochure site
- All data comes from your own team, not external pipelines
If you're not sure, start with Core. Ocean is an upgrade path, not a requirement.
Family Foods: Real Usage
The Family Foods group added Ocean when Ma's Bakery started publishing a weekly ingredient price report for their commercial wholesale customers.
Every Monday, Ma pulls flour, butter, sugar, and egg pricing from three wholesale distributors — each exports a different CSV format. Before Ocean, Ma was manually copy-pasting into a spreadsheet and updating a page by hand. After Ocean:
- CrowsNest scrapes the three distributor price pages on a schedule
- Sextant normalizes the messy column names into a canonical unit_price field
- Harpoon matches each line item against Ma's ingredient entity graph (even with spelling differences between suppliers)
- Athena runs Ma's weekly batch query: cheapest price per ingredient across all three suppliers
- Lighthouse caches the results — the price report page loads in under 100ms
Ma gets a published price comparison table every Monday morning with zero manual work.
Cousin's Coffeehouse doesn't use Ocean — they only serve their own menu content and don't need it.
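The weekly batch query from the Ma's Bakery example (cheapest price per ingredient across all three suppliers) can be sketched in TypeScript. This only illustrates the aggregation, not Ocean's actual Athena query; the `PriceRow` shape, supplier names, and prices are invented for the example.

```typescript
// Hypothetical shape of one normalized price row; field names are assumptions.
interface PriceRow {
  ingredient: string; // canonical ingredient entity, e.g. "flour"
  supplier: string;
  unitPrice: number; // value of the canonical unit_price field
}

// Keep the lowest-priced row seen so far for each ingredient.
function cheapestPerIngredient(rows: PriceRow[]): Record<string, PriceRow> {
  const best: Record<string, PriceRow> = {};
  for (const row of rows) {
    const current: PriceRow | undefined = best[row.ingredient];
    if (current === undefined || row.unitPrice < current.unitPrice) {
      best[row.ingredient] = row;
    }
  }
  return best;
}

// Invented sample data, not real supplier prices.
const weeklyReport = cheapestPerIngredient([
  { ingredient: "flour", supplier: "Supplier A", unitPrice: 0.52 },
  { ingredient: "flour", supplier: "Supplier B", unitPrice: 0.48 },
  { ingredient: "butter", supplier: "Supplier A", unitPrice: 3.1 },
  { ingredient: "butter", supplier: "Supplier C", unitPrice: 3.4 },
]);
```

With this sample data, `weeklyReport.flour` is the Supplier B row and `weeklyReport.butter` is the Supplier A row.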
The Pipeline
Ocean is a three-stage pipeline:
Raw data sources             Ocean pipeline                     Your frontend
(CrowsNest scrape,      →    Sextant → Harpoon → Lighthouse  →  millisecond reads
 manual bronze drop,
 API pull)
Sextant normalizes inconsistently-named columns from raw sources to your canonical schema.
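As a rough illustration of what column normalization involves, the sketch below renames raw headers to canonical field names using a lookup table. It is not Sextant's implementation; the alias table, the `ingredient` field name, and the sample headers are assumptions (only `unit_price` appears on this page).

```typescript
// Hypothetical alias table: raw header (trimmed, lowercased) -> canonical field.
const COLUMN_ALIASES: Record<string, string> = {
  "price": "unit_price",
  "unit price": "unit_price",
  "price/unit": "unit_price",
  "item": "ingredient",
  "product name": "ingredient",
};

type RawRow = Record<string, string>;

// Rename known columns to their canonical names; unknown columns keep
// their cleaned-up (trimmed, lowercased) header.
function normalizeRow(raw: RawRow): RawRow {
  const out: RawRow = {};
  for (const header of Object.keys(raw)) {
    const key = header.trim().toLowerCase();
    const known = Object.prototype.hasOwnProperty.call(COLUMN_ALIASES, key);
    out[known ? COLUMN_ALIASES[key] : key] = raw[header];
  }
  return out;
}

const normalizedRow = normalizeRow({
  "Product Name": "AP Flour",
  " Price/Unit ": "0.48",
  "SKU": "F-100",
});
```

Here `normalizedRow` carries `ingredient`, `unit_price`, and `sku` keys, so rows from differently formatted feeds end up with the same field names.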
Harpoon matches silver records against your entity graph — even with spelling variations. Ma's flour entity matches "All-Purpose Flour", "AP Flour", and "Flour (APF)" across three suppliers.
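One simple way to tolerate this kind of spelling variation is to reduce each label to an order-independent key of expanded tokens. The sketch below is an assumption about how such matching could work, not Harpoon's actual algorithm; the abbreviation table is hypothetical.

```typescript
// Hypothetical abbreviation table; a real matcher would likely be richer.
const ABBREVIATIONS: Record<string, string[]> = {
  ap: ["all", "purpose"],
  apf: ["all", "purpose", "flour"],
};

// Lowercase the label, strip punctuation, expand abbreviations, then
// de-duplicate and sort the tokens so word order does not matter.
function matchKey(label: string): string {
  const tokens = label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim()
    .split(" ");
  const expanded: string[] = [];
  for (const token of tokens) {
    const parts = Object.prototype.hasOwnProperty.call(ABBREVIATIONS, token)
      ? ABBREVIATIONS[token]
      : [token];
    for (const part of parts) {
      if (expanded.indexOf(part) === -1) expanded.push(part);
    }
  }
  return expanded.sort().join(" ");
}

const supplierLabels = ["All-Purpose Flour", "AP Flour", "Flour (APF)"];
const labelKeys = supplierLabels.map(matchKey);
// True when every label reduces to the same key, i.e. the same entity.
const sameEntity = labelKeys.every((k) => k === labelKeys[0]);
```

All three supplier labels from the example reduce to the key `all flour purpose`, so they resolve to the same flour entity.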
Lighthouse stores pre-computed snapshots your frontend reads at query time. Pages load fast regardless of data complexity.
Standard vs +Search
Ocean Standard — data ingested on a schedule, results served as pre-computed snapshots. Best for: price reports, product catalogs, event listings, content aggregated from external sources.
Ocean +Search — adds Aurora Serverless v2 with a real-time search index. Your visitors get a search box that queries the entity graph live. Best for: sites where visitors actively search across large datasets.
Aurora Serverless v2 has a minimum of 0.5 ACU at idle — it does not scale to zero.
Decision: Do your visitors need to type a query and get results in real time?
- Yes → +Search
- No → Standard
You can upgrade from Standard to +Search without re-seeding your data.
Sonar
Sonar is a signal monitoring system built into Ocean. It watches your pipeline for events that match a threshold you define and sends alerts to subscribers. It requires no extra infrastructure — it runs on top of what Ocean already produces.
Family Foods example: Pa's Donuts bakes limited batches. When the weekly run is ready to book, Sonar detects the inventory update and fires alerts to subscribers who opted in. Subscribers get notified before the batch is announced anywhere else.
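A threshold check of this kind can be sketched as a rule compared against each pipeline event. The event and rule shapes below, and the entity and metric names, are hypothetical; this page does not show Sonar's real schema.

```typescript
// Hypothetical shapes; Sonar's real event and rule formats are not documented here.
interface PipelineEvent {
  entity: string;
  metric: string;
  value: number;
}

interface SonarRule {
  entity: string;
  metric: string;
  atLeast: number; // fire when the metric reaches this value
}

// A rule matches when entity and metric agree and the value meets the threshold.
function ruleMatches(rule: SonarRule, event: PipelineEvent): boolean {
  return (
    event.entity === rule.entity &&
    event.metric === rule.metric &&
    event.value >= rule.atLeast
  );
}

const batchReady: SonarRule = {
  entity: "pas-donuts-weekly-batch",
  metric: "inventory",
  atLeast: 1,
};

const shouldAlert = ruleMatches(batchReady, {
  entity: "pas-donuts-weekly-batch",
  metric: "inventory",
  value: 48,
});
const staysQuiet = ruleMatches(batchReady, {
  entity: "pas-donuts-weekly-batch",
  metric: "inventory",
  value: 0,
});
```

In this sketch an inventory update of 48 triggers the rule, while an empty inventory does not.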
Sonar uses a credit model — each alert sent costs the subscriber a credit. Pa's gives new loyalty members 10 free credits. After that, credits can be purchased via Commerce (if provisioned) or set to $0 so alerts are always free.
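The credit model can be sketched as a small ledger check. The function and field names are hypothetical, and treating a $0 credit price as "never block an alert" is one reading of the description above rather than documented behavior.

```typescript
interface Subscriber {
  id: string;
  credits: number;
}

// Starting balance taken from the Pa's Donuts example above.
const FREE_STARTING_CREDITS = 10;

// Try to send one alert. Returns true if the alert may go out.
// Assumption: a credit price of 0 means alerts are never blocked.
function trySendAlert(sub: Subscriber, creditPriceUsd: number): boolean {
  if (creditPriceUsd === 0) return true;
  if (sub.credits < 1) return false;
  sub.credits -= 1; // each alert sent costs the subscriber one credit
  return true;
}

const member: Subscriber = { id: "member-1", credits: FREE_STARTING_CREDITS };
let alertsDelivered = 0;
for (let i = 0; i < 12; i++) {
  if (trySendAlert(member, 1)) alertsDelivered++;
}
```

With a paid credit price, the new member receives 10 alerts and the remaining two attempts are refused until more credits are added.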
Sonar is included in both Ocean tiers. Commerce is optional (for paid credits).
NEXT_PUBLIC_SS_SONAR=true enables Sonar in your deployment.
How to Add Ocean
Ocean is selected during adapter init:
npx @sirsluginston/aws-adapter init
# ...
? Ocean Data Lake?
❯ Standard (batch pipeline, fast reads)
  +Search (Standard + real-time search)
  No
For an existing deployment, re-run init to add Ocean, or set the env var manually:
NEXT_PUBLIC_SIRSLUGINSTON_OCEAN=standard # or: search
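If your own frontend code needs to branch on the tier, a small parser for this variable might look like the sketch below. The `OceanTier` type and `parseOceanTier` helper are hypothetical, and treating an unset or unrecognized value as "none" is an assumption.

```typescript
type OceanTier = "standard" | "search" | "none";

// Interpret the value of NEXT_PUBLIC_SIRSLUGINSTON_OCEAN.
// Assumption: anything other than "standard" or "search" means Ocean is off.
function parseOceanTier(value: string | undefined): OceanTier {
  const v = (value ?? "").trim().toLowerCase();
  if (v === "standard") return "standard";
  if (v === "search") return "search";
  return "none";
}

const tierFromStandard = parseOceanTier("standard");
const tierFromSearch = parseOceanTier(" Search ");
const tierWhenUnset = parseOceanTier(undefined);
```

In a Next.js app this would typically be called as `parseOceanTier(process.env.NEXT_PUBLIC_SIRSLUGINSTON_OCEAN)`.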
Cost & Solvency Cap
Ocean's infrastructure cost depends on your data volume and tier:
| Component | Tier | Cost shape |
|---|---|---|
| S3 data lake | Both | Per GB stored; low for most sites |
| Athena queries | Both | Per TB scanned; very low for batch workloads |
| Lambda ingestion | Both | Per invocation — minimal |
| Aurora Serverless v2 | +Search only | Per ACU-hour; minimum 0.5 ACU at idle |
For a site like Ma's Bakery (3 supplier feeds, weekly batch), Ocean Standard runs under $10/mo in AWS costs. +Search adds the Aurora minimum, which starts at ~$50–70/mo at idle, scaling with query load.
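The Aurora idle floor comes from simple arithmetic: minimum ACUs times hours in a month times the per-ACU-hour rate. The sketch below shows the calculation; the 0.12 USD rate is an assumed example value, not a quoted price, and the result covers compute only. Storage, I/O, and regional price differences are not included, so it will not line up exactly with the estimate above.

```typescript
// Approximate number of hours in an average month.
const HOURS_PER_MONTH = 730;

// Monthly compute cost of a cluster sitting at its minimum capacity.
function idleAuroraMonthlyUsd(minAcu: number, usdPerAcuHour: number): number {
  return minAcu * HOURS_PER_MONTH * usdPerAcuHour;
}

// 0.12 USD per ACU-hour is an assumed example rate; check current AWS
// pricing for your region before relying on this number.
const idleFloorUsd = idleAuroraMonthlyUsd(0.5, 0.12);
```

At the assumed rate, 0.5 ACU held for a full month comes to roughly 44 USD of compute.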
All Ocean costs count toward your solvency cap: ad revenue from your publisher ID is applied to your full infrastructure bill first, and SirSluginston Co earns only on revenue above that amount. Adding Ocean raises your cap automatically, so your traffic covers the added cost first.
View your breakdown in the AWS Cost Analytics section of your admin panel.
Works With
Ocean + Sonar — Sonar is built into Ocean. When a Lighthouse snapshot updates (price drop, new inventory batch, new event), Sonar fires to subscribers who match your defined thresholds. No additional data work required.
Ocean + Intellect — Your Intellect AI chatbox can query Ocean's Lighthouse snapshots to give context-aware answers. When a visitor asks "which flour is cheapest this week?", the AI pulls from Ma's live price report and responds accurately — not from training data.
Ocean + Commerce — Your product catalog (Commerce) can be enriched by Ocean data. If a product's pricing or availability comes from an external feed, Ocean keeps it current automatically rather than requiring manual admin panel updates.