All-in-1 AI Tutor: Processing a Wikipedia Article Just Once

The dream of a personal AI tutor for every child is becoming a reality. A new system can take any Wikipedia article (e.g., Aristotle) and generate a rich, multi-faceted learning experience from it. This approach creates a dynamic educational toolkit from a single source.

The Vision

My approach now uses a single Large Language Model (LLM) with one lightweight adapter (LoRA) trained on a unified dataset. Two capabilities are routed by control tags embedded in the prompt:

Mode 1: Study Guide

Generates a compact study guide: summaries, key terms, short Q&A, and flashcards—all from the same source.

Mode 2: Concept Map & Timeline

Creates concept maps and timelines to visualize connections and sequences within the topic.

The user picks a Wikipedia article, and the system routes tag-specific prompts over it to build a complete learning module. For example, after processing the Wikipedia article on “Anarchism,” the <CONCEPT_MAP_TIMELINE> mode produces a concept map connecting key figures to their core ideas and a timeline tracing the evolution of anarchist thought.
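To make the routing concrete, here is a minimal sketch of how the two tag-routed prompts could be assembled. The tag strings come from the modes above, but build_prompt and the surrounding template are illustrative assumptions, not the exact format used in training.

    # Minimal sketch of tag-routed prompt construction. The control tags match
    # the modes described above; the template itself is an assumption.
    STUDY_GUIDE = "<STUDY_GUIDE>"
    CONCEPT_MAP_TIMELINE = "<CONCEPT_MAP_TIMELINE>"

    def build_prompt(article_text: str, mode_tag: str) -> str:
        # The long article comes first so both modes share an identical prefix;
        # only the short control tag at the end differs between modes.
        return f"{article_text}\n\n{mode_tag}\n"

    article = "...full text of the Wikipedia article on Anarchism..."  # placeholder
    study_prompt = build_prompt(article, STUDY_GUIDE)
    map_prompt = build_prompt(article, CONCEPT_MAP_TIMELINE)

Keeping the article as a shared prefix, with only the short tag varying at the end, is what makes the caching question later in the post worth asking.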

Diagram showing the process flow from a Wikipedia article to tag-routed study outputs.

The Technical Challenge

This leads to a critical performance question: how can we avoid reprocessing the same long article for every mode?

A Wikipedia article can be very long. The naive approach would be:

  1. Feed the full article + <STUDY_GUIDE> prompt to the model.
  2. Feed the full article + <CONCEPT_MAP_TIMELINE> prompt to the model.

Processing that much text multiple times is slow and computationally expensive, especially on CPU-only hardware. It is the primary bottleneck to making this system practical.
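For reference, here is a minimal sketch of that naive baseline, assuming a Hugging Face causal LM with the unified LoRA adapter loaded through PEFT; the model name, adapter path, and article file are placeholders, not the real setup.

    # Naive baseline: the full article is re-prefilled once per mode.
    # "base-model" and "path/to/tutor-lora" are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained("base-model")
    base = AutoModelForCausalLM.from_pretrained("base-model")
    model = PeftModel.from_pretrained(base, "path/to/tutor-lora").eval()

    article = open("anarchism.txt", encoding="utf-8").read()  # placeholder source text

    for tag in ("<STUDY_GUIDE>", "<CONCEPT_MAP_TIMELINE>"):
        prompt = f"{article}\n\n{tag}\n"
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            # Each call re-processes the entire article prompt from scratch:
            # this repeated prefill is the cost we want to avoid.
            out = model.generate(**inputs, max_new_tokens=512)
        new_tokens = out[0][inputs["input_ids"].shape[1]:]
        print(tokenizer.decode(new_tokens, skip_special_tokens=True))

The article is prefilled twice here, once per mode, which is exactly the redundancy the next section is about.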

Can we reuse the KV cache across modes?

KV caches represent the model’s internal state for a specific prefix under a specific set of weights. Reusing a cache reliably requires two things: the effective weights at decode time must match those used to build the cache, and the new prompt must begin with exactly the same token prefix.

Because a LoRA changes the effective weights, computing a cache without the LoRA and then generating with the LoRA can be inconsistent. Likewise, swapping adapters between cache build and decode generally breaks cache validity. And in plain eager Hugging Face Transformers, starting a fresh generate() from an externally built past_key_values is fragile and not well supported.
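To show the kind of experiment this implies, here is a rough sketch, not a recipe: prefill the shared article prefix once with the LoRA active (so build and decode see the same effective weights), copy the cache per mode, and decode manually instead of handing an externally built past_key_values to generate(). It reuses the model, tokenizer, and article from the sketch above; whether it is actually correct and faster on a given stack is exactly what would need to be measured.

    # Experimental sketch only (reuses model/tokenizer/article from above).
    # The prefix is prefilled once with the LoRA active, so the cache and the
    # decode pass use the same effective weights and the same token prefix.
    import copy
    import torch

    prefix_ids = tokenizer(article + "\n\n", return_tensors="pt").input_ids

    with torch.no_grad():
        prefill = model(input_ids=prefix_ids, use_cache=True)

    def decode_mode(tag: str, max_new_tokens: int = 256) -> str:
        # Caches are mutated in place during decoding, so each mode gets a copy.
        cache = copy.deepcopy(prefill.past_key_values)
        input_ids = tokenizer(tag + "\n", add_special_tokens=False,
                              return_tensors="pt").input_ids
        generated = []
        with torch.no_grad():
            for _ in range(max_new_tokens):
                out = model(input_ids=input_ids, past_key_values=cache, use_cache=True)
                cache = out.past_key_values
                next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy
                if next_id.item() == tokenizer.eos_token_id:
                    break
                generated.append(next_id)
                input_ids = next_id
        if not generated:
            return ""
        return tokenizer.decode(torch.cat(generated, dim=-1)[0], skip_special_tokens=True)

    study_guide = decode_mode("<STUDY_GUIDE>")
    concept_map = decode_mode("<CONCEPT_MAP_TIMELINE>")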

Things to explore:

  1. Prefilling the article once with the LoRA already active, so the cached prefix matches the effective weights used at decode time.
  2. Sharing that single prefill across both tag-routed prompts, so only the short control tag differs per mode.
  3. Benchmarking the shared-prefix path against the naive two-pass baseline on the CPU-only hardware it has to run on.

I’m treating KV-cache reuse across modes as an optimization to evaluate, not a guarantee. The approaches above are practical to try today and can get close to the same user experience.
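Evaluating it can be as simple as timing the two paths against each other; a throwaway harness along these lines (assuming the naive loop and the hypothetical decode_mode() from the sketches above are available) would show whether the reuse pays off on CPU.

    # Throwaway timing harness, assuming model, tokenizer, article, and
    # decode_mode() from the earlier sketches are already defined.
    import time
    import torch

    def naive_two_pass() -> None:
        for tag in ("<STUDY_GUIDE>", "<CONCEPT_MAP_TIMELINE>"):
            inputs = tokenizer(f"{article}\n\n{tag}\n", return_tensors="pt")
            with torch.no_grad():
                model.generate(**inputs, max_new_tokens=256)

    def shared_prefix() -> None:
        decode_mode("<STUDY_GUIDE>")
        decode_mode("<CONCEPT_MAP_TIMELINE>")

    for name, fn in (("naive two-pass", naive_two_pass),
                     ("shared prefix", shared_prefix)):
        start = time.perf_counter()
        fn()
        print(f"{name}: {time.perf_counter() - start:.1f}s")

If the shared-prefix path doesn't hold up, the naive two-pass baseline remains a safe fallback, and the rest of the pipeline is unchanged.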
