Ephemeral semantic embedding service.
Chunk text. Embed with GTR-T5-base (768d). Search by cosine similarity.
Fully in-memory. No disk. No state between restarts. GPU on Cloud Run.
Sentence-boundary chunking plus GTR-T5-base embeddings (768d). Chunks and vectors are held in an in-process `RwLock<HashMap>`: pure ephemeral compute.
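
The chunker's exact rules aren't documented here. As a minimal sketch, assuming a sentence ends at `.`, `!` or `?` followed by whitespace (or end of text), and that whole sentences are packed greedily up to a hypothetical `max_chars` budget, sentence-boundary chunking could look like this:

```rust
/// Split text into sentences at '.', '!' or '?' followed by whitespace or end of text.
/// (Assumed rule; "3.14" is not split because '.' is followed by a digit.)
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        let at_boundary = matches!(c, '.' | '!' | '?')
            && chars.peek().map_or(true, |next| next.is_whitespace());
        if at_boundary {
            let s = current.trim();
            if !s.is_empty() {
                sentences.push(s.to_string());
            }
            current.clear();
        }
    }
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }
    sentences
}

/// Greedily pack whole sentences into chunks of at most `max_chars` characters
/// (hypothetical budget). A single sentence longer than the budget becomes its own chunk.
fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for sentence in split_sentences(text) {
        // Close the current chunk if adding " " + sentence would exceed the budget.
        if !current.is_empty()
            && current.chars().count() + 1 + sentence.chars().count() > max_chars
        {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(&sentence);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With the harbor text from the ingest example, a 200-character budget keeps both sentences in one chunk, while a 40-character budget splits them. A production chunker would probably also handle abbreviations such as "Dr." and budget by tokens rather than characters.
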
Search ranks chunks by cosine similarity, optionally blended with a temporal decay score (`time_weight`, `decay_halflife_hours`) and optionally expanded with temporally adjacent chunks (`include_nearby`).
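
The exact scoring formula isn't stated. A minimal sketch, assuming a linear blend `(1 - time_weight) * cosine + time_weight * 0.5^(age_hours / decay_halflife_hours)`, so a chunk's recency score halves every half-life (the blend shape is an assumption; only the parameter names come from the search parameters):

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exponential recency: 1.0 for a brand-new chunk, 0.5 after one half-life.
fn recency(age_hours: f32, halflife_hours: f32) -> f32 {
    0.5_f32.powf(age_hours.max(0.0) / halflife_hours)
}

/// ASSUMED blend of semantic and recency scores, weighted by `time_weight` in [0, 1].
fn blended_score(semantic: f32, age_hours: f32, time_weight: f32, halflife_hours: f32) -> f32 {
    let w = time_weight.clamp(0.0, 1.0);
    (1.0 - w) * semantic + w * recency(age_hours, halflife_hours)
}

/// Rank (chunk_id, embedding, age_hours) entries against a query, best first, keeping the top `n`.
fn rank(
    query: &[f32],
    chunks: &[(u32, Vec<f32>, f32)],
    n: usize,
    time_weight: f32,
    halflife_hours: f32,
) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = chunks
        .iter()
        .map(|(id, emb, age)| {
            (*id, blended_score(cosine(query, emb), *age, time_weight, halflife_hours))
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(n);
    scored
}
```

Under this blend, `time_weight=0.3` with a 24h half-life gives a day-old chunk with cosine 0.8 a score of 0.7 × 0.8 + 0.3 × 0.5 = 0.71, and with a high enough `time_weight` a fresh, slightly less similar chunk can outrank an old exact match.
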
Temp stores: named ephemeral vector stores with a 2-hour TTL. Ideal for agent working memory that doesn't need to outlive a session.
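
A minimal sketch of TTL bookkeeping for temp stores, assuming the 2-hour clock starts when a store is first created and expired stores are dropped by a periodic sweep (the service might instead refresh the TTL on each write or expire lazily on access). `TempStores` and its methods are hypothetical names; `now` is passed in so the logic is easy to test:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// TTL applied to every temp store: 2 hours.
const TEMP_TTL: Duration = Duration::from_secs(2 * 60 * 60);

/// Hypothetical temp store: creation time plus (text, embedding) pairs.
struct TempStore {
    created: Instant,
    chunks: Vec<(String, Vec<f32>)>,
}

struct TempStores {
    stores: HashMap<String, TempStore>,
}

impl TempStores {
    fn new() -> Self {
        Self { stores: HashMap::new() }
    }

    /// Append a chunk, creating the named store (and starting its TTL clock) if needed.
    fn ingest(&mut self, name: &str, text: &str, embedding: Vec<f32>, now: Instant) {
        self.stores
            .entry(name.to_string())
            .or_insert_with(|| TempStore { created: now, chunks: Vec::new() })
            .chunks
            .push((text.to_string(), embedding));
    }

    /// Time left before `name` expires; None if it doesn't exist or has already expired.
    fn remaining(&self, name: &str, now: Instant) -> Option<Duration> {
        let store = self.stores.get(name)?;
        TEMP_TTL
            .checked_sub(now.saturating_duration_since(store.created))
            .filter(|d| !d.is_zero())
    }

    /// Drop every store older than the TTL; returns how many were removed.
    fn sweep(&mut self, now: Instant) -> usize {
        let before = self.stores.len();
        self.stores
            .retain(|_, s| now.saturating_duration_since(s.created) < TEMP_TTL);
        before - self.stores.len()
    }
}
```
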
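Per-agent encryption: each agent's embeddings are rotated by that agent's own orthogonal matrix. Because an orthogonal transform preserves dot products and vector norms, cosine similarity between two vectors encrypted with the same key equals the similarity of the originals, so search still works on encrypted vectors. Keys are held in memory only and are lost on restart.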
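
The invariant can be checked directly. This is not the service's key-generation code: it is a sketch that builds a random orthogonal matrix Q by modified Gram-Schmidt from a seeded xorshift generator (hypothetical, and not cryptographically secure), "encrypts" with Q, "decrypts" with Qᵀ, and compares cosines before and after:

```rust
/// xorshift64 PRNG; NOT cryptographically secure, used only to stay dependency-free.
struct XorShift(u64);

impl XorShift {
    /// Next value in [-1, 1).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64 * 2.0 - 1.0
    }
}

/// Random d×d orthogonal matrix: orthonormalize random rows with modified Gram-Schmidt.
fn random_orthogonal(d: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let mut rows: Vec<Vec<f64>> = Vec::with_capacity(d);
    while rows.len() < d {
        let mut v: Vec<f64> = (0..d).map(|_| rng.next_f64()).collect();
        // Remove the component along each accepted row, using the updated v each time.
        for r in &rows {
            let p: f64 = v.iter().zip(r).map(|(a, b)| a * b).sum();
            for (vi, ri) in v.iter_mut().zip(r) {
                *vi -= p * ri;
            }
        }
        let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
        if norm > 1e-9 {
            // Skip (rare) near-degenerate draws.
            rows.push(v.into_iter().map(|x| x / norm).collect());
        }
    }
    rows
}

/// "Encrypt": y = Q v.
fn encrypt(q: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    q.iter()
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

/// "Decrypt": v = Qᵀ y, since Qᵀ is the inverse of an orthogonal Q.
fn decrypt(q: &[Vec<f64>], y: &[f64]) -> Vec<f64> {
    let d = q.len();
    (0..d)
        .map(|j| (0..d).map(|i| q[i][j] * y[i]).sum::<f64>())
        .collect()
}

fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}
```

The check uses a small dimension; the same code works at 768d, where building Q costs O(d³) once per agent.
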
The `organize` role uses local GTR-T5-base (768d, always free). The `retrieve` role uses OpenAI `text-embedding-ada-002` (1536d); pass your own key per request or set `OPENAI_API_KEY` server-side.
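
A sketch of how a request might be routed to an embedder, assuming `organize` is the default, a per-request key takes precedence over `OPENAI_API_KEY` (as the `openai_api_key` search parameter states), and a `retrieve` request with no key at all is an error. `Role`, `Embedder`, `parse_role` and `select_embedder` are hypothetical:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Role {
    Organize,
    Retrieve,
}

#[derive(Debug, PartialEq)]
enum Embedder {
    /// Local GTR-T5-base via ONNX.
    LocalGtr { dims: usize },
    /// OpenAI embeddings API (text-embedding-ada-002).
    OpenAi { dims: usize, api_key: String },
}

/// Parse the `role` query parameter; `organize` when absent.
fn parse_role(s: Option<&str>) -> Result<Role, String> {
    match s.unwrap_or("organize") {
        "organize" => Ok(Role::Organize),
        "retrieve" => Ok(Role::Retrieve),
        other => Err(format!("unknown role: {other}")),
    }
}

/// Pick the embedder; for `retrieve`, the per-request key wins over the server key.
fn select_embedder(
    role: Role,
    request_key: Option<&str>,
    server_key: Option<&str>,
) -> Result<Embedder, String> {
    match role {
        Role::Organize => Ok(Embedder::LocalGtr { dims: 768 }),
        Role::Retrieve => request_key
            .or(server_key)
            .map(|k| Embedder::OpenAi { dims: 1536, api_key: k.to_string() })
            .ok_or_else(|| "retrieve role needs an OpenAI key (request or OPENAI_API_KEY)".to_string()),
    }
}
```

Since the two roles produce 768d and 1536d vectors, a query embedded with one role cannot be cosine-scored against chunks embedded with the other.
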
Auth uses nuts-auth RS256 JWTs and `ahp_` API tokens. `organize` is always free; `retrieve` requires a token. If `NUTS_AUTH_JWKS_URL` is unset, auth is disabled (open dev mode).
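
The access rule above can be written as a small decision function. This sketch assumes credentials starting with `ahp_` are checked against `NUTS_AUTH_VALIDATE_URL` and anything else is treated as an RS256 JWT verified against the JWKS; the actual signature and network checks are out of scope and are reduced to a `token_valid` flag. All names are hypothetical:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Role {
    Organize,
    Retrieve,
}

/// Which validator a credential would go to (classification only, no I/O).
#[derive(Debug, PartialEq)]
enum Credential<'a> {
    /// `ahp_` API token: assumed to be checked via NUTS_AUTH_VALIDATE_URL.
    ApiToken(&'a str),
    /// Anything else: assumed to be an RS256 JWT verified against the JWKS.
    Jwt(&'a str),
}

fn classify(token: &str) -> Credential<'_> {
    if token.starts_with("ahp_") {
        Credential::ApiToken(token)
    } else {
        Credential::Jwt(token)
    }
}

/// No JWKS URL means open dev mode (everything allowed). With auth enabled,
/// `organize` stays free and `retrieve` needs a successfully validated token.
fn is_allowed(role: Role, jwks_url: Option<&str>, token_valid: bool) -> bool {
    match (jwks_url, role) {
        (None, _) => true,
        (Some(_), Role::Organize) => true,
        (Some(_), Role::Retrieve) => token_valid,
    }
}
```
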

Endpoints:

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Status, model info, live counts |
| POST | /sessions/:id/ingest | Chunk + embed text into session |
| GET | /sessions/:id/search?q=... | Semantic search with optional decay |
| GET | /sessions/:id | Session metadata |
| DELETE | /sessions/:id | Delete session |
| GET | /temp | List temp stores with TTL |
| POST | /temp/:name/ingest | Ingest into temp store (2 hr TTL) |
| GET | /temp/:name/search?q=... | Search temp store |
| DELETE | /temp/:name | Delete temp store |
| POST | /agent/:id/register | Register per-agent orthogonal key |
| POST | /agent/:id/encrypt | Encrypt embeddings |
| POST | /agent/:id/decrypt | Decrypt embeddings |
| POST | /invert | Reconstruct text from embedding vector |
```bash
# Ingest
curl -X POST https://shivvr.nuts.services/sessions/my-session/ingest \
  -H "Content-Type: application/json" \
  -d '{"text": "The harbor was quiet at dawn. Only the sound of halyards against aluminum masts.", "source": "journal"}'

# Search
curl "https://shivvr.nuts.services/sessions/my-session/search?q=morning+at+the+marina&n=5"

# Search with temporal decay (30% recency, 24h half-life)
curl "https://shivvr.nuts.services/sessions/my-session/search?q=marina&time_weight=0.3&decay_halflife_hours=24"

# Retrieve role with your own OpenAI key (no server key needed)
curl -X POST https://shivvr.nuts.services/sessions/my-session/ingest \
  -H "Content-Type: application/json" \
  -d '{"text": "Dense passage for retrieval.", "openai_api_key": "sk-..."}'
curl "https://shivvr.nuts.services/sessions/my-session/search?q=passage&role=retrieve&openai_api_key=sk-..."

# Temp store (expires in 2h)
curl -X POST https://shivvr.nuts.services/temp/scratch/ingest \
  -H "Content-Type: application/json" \
  -d '{"text": "Working notes for this agent session."}'
```

Search query parameters:

| Param | Default | Description |
|---|---|---|
| `q` | required | Query text |
| `n` | `5` | Number of results |
| `role` | `organize` | `organize` (768d, local) or `retrieve` (1536d, OpenAI) |
| `time_weight` | `0.0` | Weight of the recency score blended with the semantic score (0–1; 0 = semantic only) |
| `decay_halflife_hours` | `168` | Recency decay half-life in hours (168 = one week) |
| `include_nearby` | `false` | Also return temporally adjacent chunks |
| `agent_id` | none | Agent ID for encrypted search |
| `openai_api_key` | none | Per-request OpenAI key for the `retrieve` role (overrides the server key) |
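
When calling search from code, the query string simply mirrors this table. A std-only sketch (`form_encode`, `SearchOptions` and `search_url` are hypothetical helpers) that form-encodes `q`, omits unset parameters so server defaults apply, and assumes the session ID is already URL-safe:

```rust
/// Form-encode a query value: unreserved ASCII kept, space -> '+', other bytes -> %XX.
fn form_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => out.push(b as char),
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Optional search parameters; None means "use the server default".
#[derive(Default)]
struct SearchOptions<'a> {
    n: Option<u32>,
    role: Option<&'a str>,
    time_weight: Option<f32>,
    decay_halflife_hours: Option<f32>,
    include_nearby: Option<bool>,
    agent_id: Option<&'a str>,
}

/// Build GET {base}/sessions/{session}/search?q=...&... (session assumed URL-safe).
fn search_url(base: &str, session: &str, q: &str, opts: &SearchOptions<'_>) -> String {
    let mut params = vec![format!("q={}", form_encode(q))];
    if let Some(n) = opts.n {
        params.push(format!("n={n}"));
    }
    if let Some(r) = opts.role {
        params.push(format!("role={}", form_encode(r)));
    }
    if let Some(w) = opts.time_weight {
        params.push(format!("time_weight={w}"));
    }
    if let Some(h) = opts.decay_halflife_hours {
        params.push(format!("decay_halflife_hours={h}"));
    }
    if let Some(b) = opts.include_nearby {
        params.push(format!("include_nearby={b}"));
    }
    if let Some(a) = opts.agent_id {
        params.push(format!("agent_id={}", form_encode(a)));
    }
    format!("{}/sessions/{}/search?{}", base.trim_end_matches('/'), session, params.join("&"))
}
```

For example, `search_url("https://shivvr.nuts.services", "my-session", "marina", &SearchOptions { time_weight: Some(0.3), decay_halflife_hours: Some(24.0), ..Default::default() })` yields the same URL as the temporal-decay curl command. `openai_api_key` is omitted for brevity.
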

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | HTTP listen port |
| `MODEL_PATH` | `models/gtr-t5-base.onnx` | Path to the GTR-T5-base ONNX model |
| `TOKENIZER_PATH` | `models/tokenizer.json` | Path to the tokenizer |
| `OPENAI_API_KEY` | none | Server-side OpenAI key for the `retrieve` role (`text-embedding-ada-002`); a per-request key overrides it |
| `OPENAI_EMBEDDING_MODEL` | `text-embedding-ada-002` | OpenAI embedding model used by the `retrieve` role |
| `NUTS_AUTH_JWKS_URL` | none | JWKS URL for verifying RS256 JWTs; setting it enables auth (open dev mode if unset) |
| `NUTS_AUTH_VALIDATE_URL` | `https://auth.nuts.services/api/validate` | Validation endpoint for `ahp_` API tokens |
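
A sketch of loading this configuration with the defaults above. `Config` and `from_lookup` are hypothetical; the lookup is injected so the logic is testable, and in a real binary you would pass `|k| std::env::var(k).ok()`:

```rust
#[derive(Debug)]
struct Config {
    port: u16,
    model_path: String,
    tokenizer_path: String,
    openai_api_key: Option<String>,
    openai_embedding_model: String,
    jwks_url: Option<String>,
    validate_url: String,
}

impl Config {
    /// Build the config from any key lookup, applying the documented defaults.
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
        let get_or = |key: &str, default: &str| get(key).unwrap_or_else(|| default.to_string());
        let port = get_or("PORT", "8080")
            .parse::<u16>()
            .map_err(|e| format!("invalid PORT: {e}"))?;
        Ok(Config {
            port,
            model_path: get_or("MODEL_PATH", "models/gtr-t5-base.onnx"),
            tokenizer_path: get_or("TOKENIZER_PATH", "models/tokenizer.json"),
            openai_api_key: get("OPENAI_API_KEY"),
            openai_embedding_model: get_or("OPENAI_EMBEDDING_MODEL", "text-embedding-ada-002"),
            jwks_url: get("NUTS_AUTH_JWKS_URL"),
            validate_url: get_or("NUTS_AUTH_VALIDATE_URL", "https://auth.nuts.services/api/validate"),
        })
    }

    /// Open dev mode: auth is off when no JWKS URL is configured.
    fn dev_mode(&self) -> bool {
        self.jwks_url.is_none()
    }

    /// Whether a server-side key exists for `retrieve` (callers can still pass their own).
    fn server_retrieve_key_set(&self) -> bool {
        self.openai_api_key.is_some()
    }
}
```
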

Stack:

| Layer | Choice |
|---|---|
| Runtime | Rust + Tokio + axum |
| Embedding | GTR-T5-base (768d) via ONNX Runtime (`ort` 2.0 crate); local, required |
| Retrieve embedding | `text-embedding-ada-002` via OpenAI API; optional |
| Storage | Ephemeral `RwLock<HashMap>`; no disk, no volume mounts |
| GPU | CUDA 12.6 via the `ort` CUDA execution provider on Cloud Run (NVIDIA L4); automatic CPU fallback |
| Auth | nuts-auth RS256 JWT + `ahp_` API tokens; optional |
| Inversion | vec2text gtr-base (projection + T5 encoder/decoder); optional |
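
The Storage row describes a process-local map behind a read-write lock. A std-only sketch of that pattern (the `Store` and `Chunk` types and their fields are hypothetical, not the service's code): concurrent searches share the read lock, ingest and delete take a brief write lock, and nothing survives a restart:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::time::SystemTime;

/// One embedded chunk (illustrative fields only).
struct Chunk {
    text: String,
    source: Option<String>,
    embedding: Vec<f32>,
    ingested_at: SystemTime,
}

/// Cloning a Store clones the Arc, so every clone shares one in-memory map.
#[derive(Clone, Default)]
struct Store {
    sessions: Arc<RwLock<HashMap<String, Vec<Chunk>>>>,
}

impl Store {
    /// Append chunks to a session, creating it on first use (exclusive lock).
    fn ingest(&self, session: &str, chunks: Vec<Chunk>) {
        self.sessions
            .write()
            .unwrap()
            .entry(session.to_string())
            .or_default()
            .extend(chunks);
    }

    fn chunk_count(&self, session: &str) -> usize {
        self.sessions.read().unwrap().get(session).map_or(0, Vec::len)
    }

    /// Top-n chunk texts by cosine similarity (shared lock; readers run concurrently).
    fn search(&self, session: &str, query: &[f32], n: usize) -> Vec<(String, f32)> {
        let guard = self.sessions.read().unwrap();
        let Some(chunks) = guard.get(session) else {
            return Vec::new();
        };
        let mut hits: Vec<(String, f32)> = chunks
            .iter()
            .map(|c| (c.text.clone(), cosine(query, &c.embedding)))
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(n);
        hits
    }

    /// Remove a session; true if it existed.
    fn delete(&self, session: &str) -> bool {
        self.sessions.write().unwrap().remove(session).is_some()
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```
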