Improving Existing Indic LLM Finetuned Models
2 Techniques to Try Now
People often ask me what new things they should try when training Indic LLMs, given that there are already plenty of Llama 2/3 and Gemma based models.
Two things:
1. Train with DoRA (not plain LoRA)
LoRA freezes the pretrained weights and learns the weight update as the product of two low-rank matrices (A and B).
DoRA goes one step further: it decomposes each pretrained weight matrix into a magnitude vector and a direction matrix. The magnitude vector is trained directly, while the direction is updated through two low-rank trainable matrices (A and B), just as in LoRA.
So DoRA is a LoRA-style configuration plus a trainable magnitude vector. The only extra trainable parameters are those of the magnitude vector, and in return you get a training setup that moves closer to full fine-tuning.
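To make the decomposition concrete, here is a minimal NumPy sketch of how a DoRA-adapted weight could be assembled from a frozen weight, the two low-rank matrices and the magnitude vector. The function name `dora_weight`, the toy shapes and the initialisation scale are my own assumptions for illustration, not code from any library.

```python
import numpy as np

def dora_weight(W0, A, B, m):
    """Illustrative DoRA merge: magnitude m times the unit-norm direction of (W0 + B @ A)."""
    V = W0 + B @ A                                        # LoRA-style low-rank update of the direction
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)   # per-column norm, shape (1, k)
    return m * V / col_norm                               # rescale each column to magnitude m

rng = np.random.default_rng(0)
d, k, r = 8, 6, 2                                  # toy sizes (assumed): output dim, input dim, rank

W0 = rng.normal(size=(d, k))                       # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01                 # trainable low-rank factor
B = np.zeros((d, r))                               # trainable, zero-initialised so the update starts at 0
m = np.linalg.norm(W0, axis=0, keepdims=True)      # trainable magnitude, initialised from W0

W_init = dora_weight(W0, A, B, m)                  # equals W0 before any training step

lora_params = A.size + B.size                      # what plain LoRA would train
dora_params = lora_params + m.size                 # DoRA adds only the magnitude vector
print("LoRA trainable params:", lora_params)
print("DoRA trainable params:", dora_params)
```

The overhead over LoRA is one scalar per column of the weight matrix. If you use Hugging Face PEFT, recent versions expose this as a `use_dora=True` option on `LoraConfig`; check the documentation for the version you have installed.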
References:
2. Train with ORPO
The Indic models released so far have one big problem: most of them are just SFT (instruction fine-tuned) models. So if you ask a question like “how to make a bomb” or “how to kidnap”, the model doesn’t hesitate to provide an answer.
Running DPO or RLHF as a separate stage is cumbersome given the data constraints. ORPO removes the need for that extra stage by combining the SFT and DPO/RLHF steps into one.
Find a good ORPO dataset, translate it into a specific Indic language (Hindi, Tamil, Telugu, etc.), and fine-tune on it so that instruction tuning and preference alignment happen in a single training run.
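To show how one objective can cover both tasks, here is a rough pure-Python sketch of an ORPO-style loss for a single (prompt, chosen, rejected) example. It assumes you already have per-token log-probabilities of each response under the model being trained. The function names, the toy numbers and the weight `lam=0.1` are illustrative assumptions, not values taken from any specific library.

```python
import math

def avg_logprob(token_logprobs):
    """Length-normalised log-likelihood of a response."""
    return sum(token_logprobs) / len(token_logprobs)

def log_odds(logp):
    """log(p / (1 - p)) for p = exp(logp); assumes logp < 0 so that p < 1."""
    return logp - math.log1p(-math.exp(logp))

def orpo_loss(chosen_token_logprobs, rejected_token_logprobs, lam=0.1):
    """SFT loss on the chosen response plus a weighted odds-ratio penalty."""
    logp_chosen = avg_logprob(chosen_token_logprobs)
    logp_rejected = avg_logprob(rejected_token_logprobs)

    sft_term = -logp_chosen                                   # usual negative log-likelihood
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    preference_term = math.log1p(math.exp(-log_odds_ratio))   # equals -log(sigmoid(log_odds_ratio))

    return sft_term + lam * preference_term

# Toy per-token log-probs (made up): the model currently prefers the safe, chosen answer.
chosen = [-0.2, -0.3]
rejected = [-1.5, -2.0]

print("loss when chosen is favoured:  ", orpo_loss(chosen, rejected))
print("loss when rejected is favoured:", orpo_loss(rejected, chosen))
```

The first term is the ordinary instruction-tuning loss; the second term shrinks as the odds of the chosen response grow relative to the rejected one, so a single pass teaches the model both what to say and what to avoid. A translated dataset for this only needs three fields per record: the prompt, the chosen response and the rejected response. Hugging Face TRL ships an ORPO trainer if you would rather not implement the loss yourself.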
References: