Case study · RAG · Healthcare · FastAPI · React

farmaconsult

farmaconsult is an AI consultant for medical drug information — built for i-Sys Labs at the “Digital Solutions in Medicine” case championship (Samara State Medical University). A clinician asks a question in plain language and gets an answer that is grounded strictly in the documents loaded into the knowledge base, with the source shown next to every claim.

The hard part of a medical assistant is not the chat; it is avoiding fabrication. The whole pipeline below exists to enforce that: if there is no source for an answer, the system says “no data” instead of guessing.

Sections: 7

Industry: Medicine

Stack: FastAPI + React

Retrieval: RAG · vector + BM25

farmaconsult.ru/


Consultant

Ask in plain language, get a sourced answer

The doctor types a real question — how to take a drug, what the contraindications are, what evidence exists — and gets back prose with a citation marker after every statement. No wall of search results to read through.

Each answer carries an evidence base with the actual quotes it stood on, a confidence number, and a disclaimer that this is reference information. The point is a tool a clinician can trust at a glance, not a chatbot that sounds confident about nothing.

Under the hood

React + Vite frontend talking to a FastAPI backend. The answer object the UI renders is a contract: text with [n] markers, an evidence list of cited fragments, a confidence score, and a status (answered / nodata).
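As an illustration of that contract, here is a minimal sketch using plain dataclasses. The class and field names (Answer, Evidence, n, quote, source_url, score) are assumptions made for the example; the source only states that the object carries text with [n] markers, an evidence list, a confidence score, and a status.

```python
from dataclasses import dataclass, field, asdict
from typing import List

ALLOWED_STATUSES = ("answered", "nodata")


@dataclass
class Evidence:
    n: int           # citation number matching a [n] marker in the text
    quote: str       # the cited fragment
    source_url: str  # hypothetical field: link back to the source
    score: float     # hypothetical field: relevance of this fragment


@dataclass
class Answer:
    text: str                                   # prose with [n] markers
    evidence: List[Evidence] = field(default_factory=list)
    confidence: float = 0.0
    status: str = "nodata"                      # "answered" or "nodata"

    def __post_init__(self) -> None:
        # Reject any status outside the two values the UI understands.
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
```

Calling asdict(answer) yields a nested dict that a FastAPI route could return as JSON; in a real backend a Pydantic model would likely play the same role.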

Knowledge base

How documents become a searchable base

Source documents (PDFs, DOCX, HTML, even scans) are read page by page, cut into ~800-character fragments along sentence boundaries, and turned into vectors. Scanned pages go through OCR first so their text is indexed as well.

Every fragment keeps its metadata: which source it came from, the source type, a URL, and start/end anchors so a quote can be highlighted right on the original page.

Under the hood

Ingestion: pypdf / python-docx / BeautifulSoup for parsing, pytesseract + pdf2image for OCR. Chunks (~800 chars, 120 overlap) are embedded with multilingual-e5-small and stored in ChromaDB on disk with full metadata, including text-fragment anchors.
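The chunking step can be sketched roughly as follows. This is not the project's code: the sentence splitter is a naive regex, the function names are hypothetical, and the overlap is taken as the last 120 characters of the previous chunk, which is one plausible reading of "120 overlap".

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_text(text: str, size: int = 800, overlap: int = 120) -> List[str]:
    """Pack whole sentences into chunks of at most ~size characters.

    A single sentence longer than `size` is kept whole, so such a chunk
    may exceed the limit.
    """
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > size:
            chunks.append(current)
            # Carry the tail of the finished chunk forward as overlap.
            # The tail is character-based, so it may start mid-sentence.
            tail = current[-overlap:] if overlap else ""
            current = f"{tail} {sentence}".strip()
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be embedded and stored with its metadata; the embedding and ChromaDB calls are left out here because they depend on the project's setup.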

Sources

The base grows on demand

You are not limited to whatever was pre-loaded. Type any drug or condition and the system finds the Wikipedia article, enriches it with the openFDA label and PubMed abstracts, indexes all of it, and adds it as a new topic.

So the knowledge base is not a fixed brochure — it expands to whatever the clinician actually needs to ask about.

Under the hood

Dynamic sources: a query triggers fetches from Wikipedia, openFDA, and PubMed. The fetched text is chunked, embedded, and written into ChromaDB live, and every chunk is tagged by source type so that source types can be balanced at retrieval time.

Retrieval

Two searches, fused into one ranking

Behind every answer are two searches running at once: a semantic one that understands meaning, and a keyword one that catches exact terms. Their results are merged so the strongest fragments rise to the top.

The system also guarantees that the best fragment of each source type makes the shortlist, provided it clears a minimum relevance floor, so FDA and PubMed evidence is not crowded out by longer Russian-language text.

Under the hood

Hybrid retrieval: vector (cosine over e5 embeddings) + BM25, merged via Reciprocal Rank Fusion (c=60) so incompatible score scales never need hand-tuning. 80 candidates per search → top-6 to the LLM, with a guaranteed best-per-source-type above a 0.35 floor.
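A minimal sketch of the fusion and shortlist step, under stated assumptions: the function names are hypothetical, the 0.35 floor is assumed to apply to the cosine score, and "best per source type" is assumed to mean the highest-fused fragment of that type. Reciprocal Rank Fusion itself is standard: each fragment scores the sum of 1 / (c + rank) over the rankings it appears in.

```python
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], c: int = 60) -> Dict[str, float]:
    # Sum 1 / (c + rank) across rankings; ranks start at 1.
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    return scores


def select_top(
    vector_ranking: List[str],
    bm25_ranking: List[str],
    cosine: Dict[str, float],       # doc_id -> cosine similarity to the query
    source_type: Dict[str, str],    # doc_id -> e.g. "wikipedia", "openfda"
    k: int = 6,
    floor: float = 0.35,
    c: int = 60,
) -> List[str]:
    fused = rrf_fuse([vector_ranking, bm25_ranking], c=c)
    ordered = sorted(fused, key=lambda d: fused[d], reverse=True)

    # Reserve a slot for the best fragment of each source type,
    # but only if its cosine score clears the floor (assumption).
    selected: List[str] = []
    seen_types = set()
    for doc_id in ordered:
        kind = source_type[doc_id]
        if kind not in seen_types and cosine.get(doc_id, 0.0) >= floor:
            seen_types.add(kind)
            selected.append(doc_id)
    selected = selected[:k]

    # Fill the remaining slots in fused order.
    for doc_id in ordered:
        if len(selected) >= k:
            break
        if doc_id not in selected:
            selected.append(doc_id)

    return sorted(selected, key=lambda d: fused[d], reverse=True)
```

Because RRF works on ranks only, the cosine and BM25 scores never have to be normalised against each other, which is the point the text makes about incompatible score scales.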

Guardrails

It refuses rather than invents

If nothing in the base is close enough to the question, the system answers “no data” and never calls the model on an empty context. And if the model replies without citing a single source, that answer is rejected too.

In a medical setting that refusal is the feature: an honest “I don't have this” beats a confident, unsupported sentence.

Under the hood

Input guardrail: min_score (default 0.50) — below it, status is nodata and the LLM is never called. Output guardrail: a refusal marker or an answer with no [n] citation is downgraded to nodata. If the LLM is unreachable, an extractive fallback assembles the answer from top fragments.
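The two guardrails and the fallback can be sketched as one function. Several details are assumptions for illustration: the refusal marker string, the call_llm callable, the use of ConnectionError to signal an unreachable model, the choice of three fragments for the fallback, and the fallback returning status answered.

```python
import re
from typing import Callable, Dict, List, Tuple

REFUSAL_MARKER = "NO_DATA"          # hypothetical marker the prompt asks for
CITATION = re.compile(r"\[\d+\]")   # matches [1], [2], ...


def answer_with_guardrails(
    question: str,
    fragments: List[Tuple[str, float]],         # (text, cosine score)
    call_llm: Callable[[str, List[str]], str],  # hypothetical LLM client
    min_score: float = 0.50,
) -> Dict[str, str]:
    # Input guardrail: nothing close enough, so the LLM is never called.
    top_score = max((score for _, score in fragments), default=0.0)
    if top_score < min_score:
        return {"status": "nodata", "text": "no data"}

    try:
        reply = call_llm(question, [text for text, _ in fragments])
    except ConnectionError:
        # Extractive fallback: quote the top fragments with citations.
        text = " ".join(
            f"{frag} [{i}]" for i, (frag, _) in enumerate(fragments[:3], start=1)
        )
        return {"status": "answered", "text": text}

    # Output guardrail: a refusal or an uncited answer becomes nodata.
    if REFUSAL_MARKER in reply or not CITATION.search(reply):
        return {"status": "nodata", "text": "no data"}
    return {"status": "answered", "text": reply}
```

Passing the model in as a callable keeps the guardrail logic testable without any network access.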

Citations

Every claim points back to a quote

The answer is generated under a strict prompt: respond only from the supplied fragments, add nothing, and put a [n] link after each statement. Only fragments that were actually cited end up in the evidence base.

Each source comes with its quote, a relevance score, and a link — and where the source supports it, the quote is highlighted directly on the original page.

Under the hood

Generation runs against DeepSeek/OpenAI/Ollama with a citation-enforcing system prompt. Post-processing keeps only cited fragments, renumbers links sequentially, and attaches each quote with its source URL and text-fragment anchor.
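The post-processing step might look like the sketch below. The function name and the dict-based fragment shape are hypothetical, and dropping markers that point at a non-existent fragment is an assumption about how invalid citations are handled.

```python
import re
from typing import Dict, List, Tuple


def renumber_citations(
    text: str, fragments: List[dict]
) -> Tuple[str, List[dict]]:
    """Keep only cited fragments and renumber [n] markers from 1.

    `fragments` is the list sent to the LLM; marker [n] refers to
    fragments[n - 1].
    """
    mapping: Dict[int, int] = {}   # old number -> new number
    evidence: List[dict] = []

    def replace(match: re.Match) -> str:
        old = int(match.group(1))
        if not 1 <= old <= len(fragments):
            return ""  # assumption: silently drop invalid markers
        if old not in mapping:
            # First time this fragment is cited: give it the next number.
            mapping[old] = len(mapping) + 1
            evidence.append({**fragments[old - 1], "n": mapping[old]})
        return f"[{mapping[old]}]"

    new_text = re.sub(r"\[(\d+)\]", replace, text)
    return new_text, evidence
```

Numbers are assigned in order of first appearance, so the reader sees [1], [2], [3] in reading order regardless of which retrieved fragments the model picked.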

Confidence

What the confidence number actually means

The percentage next to an answer is the cosine similarity of the single closest fragment in the base to the question — how well the base matched the query, not how sure the model is.

It is computed during retrieval, before the LLM runs, so it answers “did we find genuinely relevant text?” The UI colours it green / amber / red, but only min_score actually gates whether an answer is produced.

Under the hood

confidence = max cosine score among the selected top-6, rounded to two decimals, independent of the LLM. Front-end zones: ≥0.85 high, 0.65 to <0.85 medium, <0.65 low. A /stats endpoint aggregates per-session query counts, nodata rate, mean confidence, and feedback votes.
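As a small worked example of the rule above, with hypothetical function names and the boundary value 0.85 assumed to fall in the high zone:

```python
from typing import Iterable


def confidence(cosine_scores: Iterable[float]) -> float:
    # Max cosine score among the selected fragments, two decimals.
    return round(max(cosine_scores, default=0.0), 2)


def zone(value: float) -> str:
    # Front-end zones; they colour the number but do not gate the answer.
    if value >= 0.85:
        return "high"
    if value >= 0.65:
        return "medium"
    return "low"
```

For selected fragments scoring 0.612, 0.874, and 0.40, confidence is 0.87 and the zone is high. The zone is display-only: whether an answer is produced at all is still decided by min_score.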