Audio-to-Text App Fine-Tuning
Cleaned Transcript
Built a simple app that takes audio notes and transcribes them with either the Whisper model or the Parakeet model. The raw transcription is then fed into a local LLM, Llama 3.2 with 3 billion parameters, with a fine-tuned LoRA (Low-Rank Adaptation) adapter applied on top, which makes the model very capable of cleaning up transcriptions and doing some processing on the note. Did this by creating a synthetic dataset: started from a handful of real transcripts, generated new example transcripts from them, then produced cleaned targets with a golden state-of-the-art model using Shoots (I think Kimi K2), and fine-tuned the model using Unsloth. Training took around four hours over 40,000 examples at batch size 16.
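The synthetic-data step described above can be sketched as follows. This is a minimal illustration, not the author's actual code: the `seed_transcripts` contents, the prompt wording sent to the golden model, and the chat-message record schema are all assumptions.

```python
import json

# A handful of real transcripts used as style seeds (hypothetical examples).
seed_transcripts = [
    "uh so i built this app that um takes audio notes and and transcribes them",
    "then it uses the raw transcription feeds it into a local llm",
]

def make_generation_prompt(seed: str) -> str:
    """Prompt a strong 'golden' model (e.g. Kimi K2) to invent a new noisy
    transcript in the style of the seed, plus a cleaned-up version of it."""
    return (
        "Here is a raw voice-note transcript:\n\n"
        f"{seed}\n\n"
        "Write ONE new, different transcript in the same noisy style, then a "
        "cleaned version of it. Answer as JSON with keys 'raw' and 'clean'."
    )

def to_training_record(raw: str, clean: str) -> dict:
    """Pack one (raw, clean) pair into a chat-format fine-tuning example."""
    return {
        "messages": [
            {"role": "user", "content": f"Clean up this transcript:\n{raw}"},
            {"role": "assistant", "content": clean},
        ]
    }

record = to_training_record("um so i built this app", "Built a simple app.")
line = json.dumps(record)  # one JSONL line of the synthetic dataset
```

In practice each seed would be sent to the golden model, the returned JSON parsed, and the resulting pairs written out as JSONL for the fine-tuning run.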
Summary
Fine-tuned a local LLM (Llama 3.2 3B) with a LoRA adapter on a synthetic transcript-cleaning dataset distilled from a golden state-of-the-art model, achieving strong transcript-cleaning capability.
Tags
Key Points
- Used Whisper or Parakeet for initial transcription
- Local LLM: Llama 3.2 with 3B parameters
- Fine-tuned LoRA (Low-Rank Adaptation) adapter on top of the LLM
- Created synthetic dataset from real transcripts
- Used Shoots/Kimi K2 as the golden state-of-the-art model
- 40k examples at batch size 16, 4 hours total
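Two of the key points above can be checked with quick arithmetic: why a LoRA adapter is so much cheaper to train than the full weight matrix, and how many optimizer steps the 40k-example run implies per epoch. The hidden size of 3072 and rank 16 are assumptions for illustration, not values stated in the note.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one LoRA-adapted weight: A is (r x d_in),
    B is (d_out x r); the frozen base weight W (d_out x d_in) adds none."""
    return r * d_in + d_out * r

# Assumed hidden size for Llama 3.2 3B and a commonly used rank.
d = 3072
full = d * d                         # updating W directly: 9,437,184 params
lora = lora_param_count(d, d, 16)    # LoRA adapter: 98,304 params
ratio = full / lora                  # -> 96.0, i.e. ~96x fewer trainable params

# Training budget implied by the note: 40,000 examples at batch size 16.
steps_per_epoch = 40_000 // 16       # -> 2,500 optimizer steps per epoch
```

At four hours for the run, that works out to a few seconds per step on a single epoch, which is plausible for a 3B model with a small adapter.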
Decisions
- Use a LoRA adapter for cleaning transcriptions