Deep Tech · Seed · Patent Pending

Hypernym

Runtime compression and provenance infrastructure for the AI stack. Hypernym compresses context at the semantic level—making models faster, cheaper, and provably grounded without changing your workflow.

AI inference is expensive.
Context is why.

Every AI agent—coding, research, legal, medical—needs to read massive amounts of context before it can act. This consumes tokens, burns compute, and scales cost linearly. The more capable the model, the worse the problem gets.

$3.54
Cost Per Task
Average agentic coding task on Devin without compression
388s
Time Per Task
Wall-clock SWE-bench task completion without optimization
0%
Baseline Resolution
Control group pass rate on hard SWE-bench problems (0/5)
Stack Depth
Hardware → Models → Context → Orchestrators → Apps. Every layer guesses.

Semantic compression that makes AI structurally smarter.

Hypernym operates at the layer between your data and your models. It compresses context while preserving—and proving—semantic fidelity.

01  Compression

87% context reduction through word-level semantic compression. A 3B model outperforms an 8B on Hypernym’s engine—same weights, denser context. 90%+ token reduction with controllable fidelity.

02  Provenance

Every output carries a mechanical source chain. Where a fact came from, how it was derived, whether it survives verification. Your provenance—portable, auditable, not platform-owned.

03  Compounding

What one agent figures out, the next one builds on. Intelligence compounds across agents and sessions without data crossing boundaries. The algorithms improve for everyone.

Mercury: Semantic Field Constriction

Based on our patent-pending research (Forrester & Sulea, 2025). A novel word-level semantic compression scheme that achieves 90%+ token reduction while preserving semantic similarity to source text.

Phase 1
P-Span Sweep
20 compression levels scanned to find optimal parameters for the input domain and content type.
Phase 2
Stochastic Trials
60 independent compression trials across high-dimensional slices. Each trial compresses via hypernym substitution independently.
Phase 3
Coherence Analysis
Pairwise cosine similarity scoring across all 60 trials. Pareto-optimal frontier identified for compression vs. fidelity.
Phase 4
Semantic Assembly
Single-linkage clustering at 0.85 similarity threshold. Facts ranked by cross-trial frequency. 90%+ = highly stable.
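The consensus step in Phases 2–4 can be sketched in a few dozen lines. This is an illustrative toy, not the actual Mercury implementation: `embed` is a caller-supplied embedding function, and the helper names are hypothetical. It runs independent trials' extracted facts through pairwise cosine similarity, single-linkage clustering at the 0.85 threshold, and cross-trial frequency ranking.

```python
# Illustrative sketch of stochastic-trial consensus (Phases 2-4).
# Helper names and data shapes are assumptions, not Mercury's API.
import itertools
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def single_linkage_clusters(vectors, threshold=0.85):
    """Union-find single-linkage clustering: merge any pair of
    vectors whose cosine similarity meets the threshold."""
    parent = list(range(len(vectors)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i, j in itertools.combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def stable_facts(trial_facts, embed, n_trials, threshold=0.85, min_freq=0.9):
    """Cluster fact embeddings across trials; keep clusters that
    appear in >= min_freq of trials (90%+ = highly stable)."""
    facts = [f for trial in trial_facts for f in trial]
    owners = [t for t, trial in enumerate(trial_facts) for _ in trial]
    vecs = [embed(f) for f in facts]
    stable = []
    for cluster in single_linkage_clusters(vecs, threshold):
        trials_seen = {owners[i] for i in cluster}
        if len(trials_seen) / n_trials >= min_freq:
            stable.append(facts[cluster[0]])
    return stable
```

With 60 trials and `min_freq=0.9`, a fact survives only if it recurs (up to 0.85-similarity paraphrase) in at least 54 of them, which is what makes the retained set stable rather than a single trial's guess.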
Core Mechanism
Semantic Field Identification

Analyzes input text to map lexical items to semantic hierarchies. Identifies which specific terms share broader categorical relationships.

Hypernym Substitution

Related words are replaced with their higher-order semantic category (hypernym). Positional markers and metadata preserve reconstruction context.

Lossless Reconstruction

Sufficient contextual information is retained to recover original semantic precision. Controllable fidelity dial—from aggressive compression to fully lossless.
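The three-step mechanism above can be illustrated with a toy substitution pass. The two-entry hypernym table is a stand-in for a real lexical hierarchy such as WordNet, and the function names are hypothetical; the point is only how positional markers make the substitution reversible.

```python
# Toy hypernym substitution with positional markers.
# HYPERNYMS is a hand-coded stand-in for a real semantic hierarchy.
HYPERNYMS = {
    "sparrow": "bird", "robin": "bird",
    "oak": "tree", "maple": "tree",
}

def compress(tokens):
    """Replace words with their hypernym; record (position, original)
    markers so the text can be reconstructed losslessly."""
    out, markers = [], []
    for i, tok in enumerate(tokens):
        hyper = HYPERNYMS.get(tok)
        if hyper is not None:
            out.append(hyper)
            markers.append((i, tok))
        else:
            out.append(tok)
    return out, markers

def reconstruct(tokens, markers):
    """Undo the substitution using the positional markers."""
    restored = list(tokens)
    for i, original in markers:
        restored[i] = original
    return restored

text = "the sparrow and the robin nested in the oak".split()
compressed, markers = compress(text)
# compressed reads: "the bird and the bird nested in the tree"
assert reconstruct(compressed, markers) == text
```

In this toy the token count is unchanged; real savings come when many specific terms collapse into shared categories and the marker metadata is itself compact. Dropping the markers gives aggressive compression; keeping them all is the lossless end of the fidelity dial.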

Compression vs. Semantic Fidelity (Pareto Frontier)
SECTORS: Code · Prose · Legal · Medical · Academic · Financial
PAPER: arxiv.org/abs/2505.08058

Measured results, not projections.

Results span SWE-bench Verified, a 1,200-sample model-compression benchmark, and mechanical verification across 800 samples. All reproducible.

SWE-Bench Speed
388s → 192s task completion.
30× more context in the same window.
10× throughput per dollar.
3B > 8B
Model Compression
Granite 3B outperforms Llama 3.1 8B.
1.36× faster. 2.3× smaller.
1,200 sample benchmark.
100%
Grounding Rate
Cross-model hybrid architecture.
Zero fabrication. Every claim traced.
800 sample mechanical verification.
Compression Performance Across Benchmarks
Cost & Time Reduction

Agent Benchmark: Devin on SWE-bench Verified

Measuring wall-clock time reduction and task resolution when AI coding agents receive Hypernym-compressed codebase context. Controlled experiment, reproducible results.

Experiment Design
test: pytest-dev__pytest-5787
difficulty: 1–4 hours
external_solves: 31
control_runs: 5
treatment_runs: 5
benchmark: SWE-bench Verified (Opus 4.6)
Control — Without Hypernym
0 / 5
Treatment — With Hypernym
4 / 5 — 80%
→ Full interactive benchmark at mark2.hypernym.ai
87%
Context Compressed
Semantic fidelity preserved through stochastic trial consensus
96.4%
Size Reduction
Raw token count reduction while maintaining all critical facts
$3.54
$0.35
Per-Task Cost
388s
192s
Wall-Clock Time
At Scale: Annual Savings Estimate
$11,500
Per Developer / Year
730 hrs
Dev Time Saved / Year
30–60%
Routine Task Speedup

Compression metrics across all evaluations.

| Metric | Value | Method | Notes |
|---|---|---|---|
| Context Compression | 87% | 60-trial stochastic consensus | Measured on SWE-bench Verified codebases |
| Semantic Similarity | 87.2% | Cosine similarity, source vs. compressed | Cross-genre: code, prose, scientific text |
| Token Reduction | 90%+ | Mercury hypernym substitution | Controllable fidelity dial (aggressive to lossless) |
| Compression Ratio | 9.7× | Input tokens / output tokens | Demonstrated on Wizard of Oz Chapter 1 (1,840 → 191 tokens) |
| Grounding Rate | 100% | Mechanical verification, not LLM | 800-sample benchmark, zero fabrication |
| Task Resolution Lift | 0% → 80% | SWE-bench Verified (Opus 4.6) | pytest-5787, 1–4h difficulty |
| Cost Per Task | $3.54 → $0.35 | Token cost at inference | 10× reduction, Devin benchmark |
| Wall-Clock Speedup | ~2× | End-to-end task time | 388s → 192s on SWE-bench |

One engine. Every domain that runs inference.

Hypernym’s compression is domain-aware and sector-configurable. The same engine adapts to code, scientific research, legal text, and medical literature. Partners define their domains—Hypercore is the engine they run on.

Models & Inference

Model providers use Hypernym to make smaller models competitive with larger ones. Compression delivers quality gains that model training alone is years away from matching.

Llama · Granite · Devin · Command · Claude
Live · Pilot · Pipeline

Code & Developer Tools

AI coding agents get compressed codebase context via a simple API. GitHub OAuth integration runs on every session, all day. 30–60% raw speedup on tasks.

Hypercode · Windsurf · Claude Code · Cline
Live · Pipeline

Memory & Knowledge

Memory platforms use Hypernym to compress and cross-reference persistent context. What one session learns carries forward to the next.

Amble · Wordware · Boardy · Halo
Live · Pilot · Pipeline

Biomedical & Legal

Domain-specialized compression for research literature, clinical data, patent filings, and legal documents. 597M+ records indexed across 26 databases.

Osmium (biomedical) · TrustFoundry (legal)
Live · Pilot

Deploys into your stack. Invisible to your users.

Cloud API

Managed service. One API call. GitHub OAuth for repo access. Cerebras-powered inference for rapid response.
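A single call could look roughly like the sketch below. The endpoint, field names, and parameters are illustrative assumptions, not Hypernym's documented API; it only shows the shape of a one-call integration.

```python
# Hypothetical request body for the managed Cloud API.
# Field names ("input", "fidelity", "sector") are assumptions.
import json

def build_compress_request(text, fidelity=0.85, sector="code"):
    """Assemble a JSON body for a single compression call."""
    return json.dumps({
        "input": text,
        "fidelity": fidelity,   # assumed dial: 0 = aggressive, 1 = lossless
        "sector": sector,       # assumed domain hint (code, legal, medical...)
    })

body = build_compress_request("def add(a, b):\n    return a + b")
```

The compressed context comes back in place of the raw text, so agents downstream see a smaller window with the same facts.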

Private VPC

Deploy to AWS VPC. Use your own inference providers. Data completely insulated, end to end. Containerized.

Compatible With

Claude Code · Codex · Devin · MCP-compatible tools. Composes with existing optimizations.

Meta AWS Cohere Intel NVIDIA