I built a free medical evidence retrieval API

Every LLM agent that answers medical questions has the same problem: it either hallucinates confidently or refuses to answer. Both are useless. What you want is the middle: find relevant source material, show where it came from, let the caller decide.

HYFL RAG is a free retrieval API over Wikipedia's medical articles. Send it a query, get cited snippets back. No generated text, no summaries. Just the evidence.

Live at hyfl.uk.

Why Wikipedia

The corpus is Wikipedia's medical articles, which are heavily moderated and sourced: roughly 33,000 articles split into about 188,000 chunks. It works for background context, not clinical guidance.

How it works

You POST a query to /v1/retrieve with a Bearer token. The service runs full-text search, then vector reranks the candidates, and returns the top results with source URLs, headings, and scores.

No signup forms. No waiting for approval. You generate a free API key with a single POST:

curl -X POST https://hyfl.uk/v1/keys/anonymous \
  -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'

The response gives you a key with generous limits: 60 requests per minute, 1,000 per hour, 50,000 per day. Keys expire after 90 days. Mint a new one when needed.
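If your agent fires requests in bursts, a small client-side guard helps you stay under the per-minute limit. This is a minimal sketch, not part of the API: the class name and its methods are hypothetical, and it only tracks the 60 requests per minute window.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most `limit` calls per `window` seconds."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Note that a request was just sent."""
        self.calls.append(self.clock())
```

Call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after sending it.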

What you get back

Each result includes the article title, section heading, a link to the Wikipedia source, a relevance score, and the actual text chunk. There's also a retrieval trace showing which query variant matched and where in the article the chunk came from.

{
  "title": "Systolic hypertension",
  "heading": "Systolic hypertension",
  "url": "https://en.wikipedia.org/wiki/Systolic_hypertension",
  "score": 0.7532,
  "text": "Systolic hypertension is defined as an elevated systolic blood pressure..."
}

Your LLM takes these chunks, writes the answer, and cites the sources. The API never generates text itself.
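One way to hand the chunks to your LLM is to number them and ask for citations by number. A rough sketch, assuming results shaped like the example above; the function name and the prompt wording are my own, not something the API prescribes.

```python
def build_prompt(question, results):
    """Format retrieved chunks as numbered sources for a downstream LLM."""
    sources = []
    for i, r in enumerate(results, start=1):
        # Each source carries its title, section heading, URL, and text.
        sources.append(f"[{i}] {r['title']} / {r['heading']}\n{r['url']}\n{r['text']}")
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, e.g. [1].\n\n"
        + "\n\n".join(sources)
        + f"\n\nQuestion: {question}"
    )
```

Because every source keeps its URL, the model's numbered citations can be mapped back to the Wikipedia section they came from.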

For agents specifically

The retrieve endpoint accepts conversation context: previous messages, topic summaries, salient terms. Multi-turn agent conversations can refine their retrieval without losing the thread.

The content request webhook

When an agent queries something that isn't in the corpus, it can submit a content request via /rag/v1/content-requests. These get stored and eventually trigger article ingestion. The corpus grows based on what people actually search for, not what I think they'll need.

What it is and isn't

It's a retrieval service. It returns evidence, not answers. If you ask "should I take ibuprofen with a stomach ulcer?", you'll get relevant chunks about NSAIDs, gastrointestinal side effects, and ulcer management. You, or your agent, still have to read them and decide.

The corpus is Wikipedia-derived. It's good background context. It is not a substitute for clinical guidelines, a doctor, or PubMed.

How retrieval actually works

Wikipedia articles get chunked into overlapping segments, around 188,000 for the medical corpus. Each chunk stores its source article, section heading, character offsets, and a text snippet.
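To make the chunking concrete, here is a minimal sketch of overlapping chunking with character offsets. The chunk size and overlap are assumptions picked for illustration, not the values the service uses, and a real chunker would likely respect section and sentence boundaries.

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into overlapping chunks, recording character offsets."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks = []
    step = size - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append({"start": start, "end": end, "text": text[start:end]})
        if end == len(text):
            break  # the last chunk reached the end of the text
        start += step
    return chunks
```

With `size=4` and `overlap=2`, the string `"abcdefghij"` becomes `abcd`, `cdef`, `efgh`, `ghij`: each chunk repeats the tail of the previous one.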

When you send a query, it goes through two stages. First, full-text search against the chunk index to pull candidate results. Then, vector reranking: the query and candidates get embedded, scored by similarity, and reordered. The top_k highest-scoring chunks come back with their source URLs and relevance scores.
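The two stages can be sketched in a few lines. This is a toy stand-in, not the service's code: keyword overlap plays the role of full-text search, and a bag-of-words count vector plays the role of a learned embedding. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=5, candidates=50):
    terms = set(tokenize(query))
    # Stage 1: keep chunks sharing at least one query term (stand-in for FTS).
    matched = [c for c in chunks if terms & set(tokenize(c["text"]))]
    matched.sort(key=lambda c: len(terms & set(tokenize(c["text"]))), reverse=True)
    pool = matched[:candidates]
    # Stage 2: rerank the candidate pool by cosine similarity to the query.
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{**c, "score": round(s, 4)} for s, c in scored[:top_k]]
```

The cheap first stage keeps the expensive similarity scoring limited to a small candidate pool instead of the whole corpus.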

You can tune top_k (1 to 50) and max_context_chars (500 to 100,000) per request. The retrieval trace tells you which query variant was used and where in the article the chunk came from.
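A small helper can check those ranges before the request leaves your code. This is a sketch under assumptions: the helper name is hypothetical, and I'm assuming max_context_chars sits alongside query and top_k in the JSON body. How the server treats out-of-range values is not covered here.

```python
def build_retrieve_payload(query, top_k=5, max_context_chars=None):
    """Build a /v1/retrieve request body, checking the documented ranges."""
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    payload = {"query": query, "top_k": top_k}
    if max_context_chars is not None:
        if not 500 <= max_context_chars <= 100_000:
            raise ValueError("max_context_chars must be between 500 and 100,000")
        # Assumed to be a top-level field next to query and top_k.
        payload["max_context_chars"] = max_context_chars
    return payload
```

Failing fast on the client keeps a bad parameter from costing you a request against the rate limit.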

Stack

FastAPI on an Oracle server, behind a Cloudflare Tunnel. Docker Compose for deployment. SQLite for keys and the content request store. The whole pipeline runs in a single container.

The FTS stage exists to narrow the search space before anything gets embedded; the rerank stage then scores the query against each candidate by cosine similarity. Chunks overlap so you don't lose context at boundaries, and the trace metadata records the character offsets that produced each hit.

MCP endpoint at /mcp speaks protocol version 2025-03-26. One tool: retrieve_evidence.
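For reference, MCP messages are JSON-RPC 2.0, so a tool call looks roughly like the sketch below. The `initialize` and `tools/call` shapes follow the MCP spec; the `query` argument name and the client name are assumptions, so check the tool's input schema via `tools/list` before relying on them. The helper names are hypothetical.

```python
import json

MCP_PROTOCOL_VERSION = "2025-03-26"


def initialize_request(request_id=1):
    """First message a client sends to open an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": MCP_PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "my-agent", "version": "0.1.0"},
        },
    }


def call_tool_request(query, request_id=2):
    """Request that invokes the retrieve_evidence tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "retrieve_evidence",
            # Assumed argument name; confirm against the tool's input schema.
            "arguments": {"query": query},
        },
    }


print(json.dumps(call_tool_request("side effects of metformin"), indent=2))
```

In practice an MCP client library handles the handshake and transport for you; the sketch only shows what goes over the wire.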

Try it

import requests

# Get a key (once)
key_resp = requests.post("https://hyfl.uk/v1/keys/anonymous",
    json={"name": "my-agent"}, timeout=30)
key_resp.raise_for_status()
key = key_resp.json()["secret"]

# Retrieve evidence
resp = requests.post("https://hyfl.uk/v1/retrieve",
    headers={"Authorization": f"Bearer {key}"},
    json={"query": "side effects of metformin", "top_k": 5}, timeout=30)
resp.raise_for_status()

# Print a preview of each chunk with its source
for r in resp.json()["results"]:
    print(f"{r['title']}: {r['text'][:200]}...")
    print(f"  Source: {r['url']}")

Full API docs at hyfl.uk/llms-full.txt. OpenAPI spec at hyfl.uk/openapi.json.