Improving Existing Indic LLM Finetuned Models

2 Techniques to Try Now

Ramsri Goutham


Improving Indic LLM Finetunes

People often ask me what new things they can try when training Indic LLMs, given that there are already plenty of Llama 2/3 and Gemma based models.

Two things:

1. Train with DoRA (not plain LoRA)

LoRA freezes the pretrained weights and represents the additional trainable parameters (the weight update) as the product of two low-rank matrices (A and B).

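DoRA goes one step further and decomposes the pretrained weight into two components: a magnitude vector and a direction matrix. The direction component is then updated with two low-rank trainable matrices (A and B), just as in LoRA, while the magnitude vector is trained directly.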
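To make the decomposition concrete, here is a minimal pure-Python sketch of how a DoRA weight can be recomposed from its parts. The helper names (`matmul`, `column_norms`, `dora_weight`) and the toy numbers are my own for illustration, not any library's API. It assumes the formulation from the DoRA paper, where the magnitude vector holds one value per column and starts out equal to the column norms of the frozen weight.

```python
import math

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def column_norms(w):
    """L2 norm of each column of w."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]

def dora_weight(w0, b, a, magnitude):
    """Recompose W' = magnitude * (W0 + B A) / ||W0 + B A||, column by column."""
    delta = matmul(b, a)  # the LoRA-style low-rank update
    directed = [[w + d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(w0, delta)]
    norms = column_norms(directed)
    return [[m * v / n for v, m, n in zip(row, magnitude, norms)]
            for row in directed]

# Toy frozen weight with 2 rows and 3 columns, adapter rank 1 (illustrative values).
w0 = [[3.0, 0.0, 1.0],
      [4.0, 2.0, 1.0]]
a = [[0.5, -0.5, 1.0]]
magnitude = column_norms(w0)  # initialised from the frozen weight

# With B at zero (the usual LoRA initialisation) the recomposed weight equals W0.
w_init = dora_weight(w0, [[0.0], [0.0]], a, magnitude)

# After B moves away from zero, the direction of each column changes,
# but each column norm is still set by the magnitude vector alone.
w_updated = dora_weight(w0, [[0.1], [0.2]], a, magnitude)
```

In practice you would not write this by hand. In Hugging Face PEFT, DoRA is switched on by passing `use_dora=True` to `LoraConfig`, so an existing LoRA training script needs very little change.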

So DoRA is a LoRA-style configuration plus one additional trainable magnitude vector. The DoRA authors report that this setup behaves more like full finetuning than plain LoRA does, with the magnitude vector as the only extra trainable parameters.
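To get a feel for how small that overhead is, here is a rough parameter count for a single weight matrix. The 4096 x 4096 shape and rank 16 are assumed values picked for illustration, and the count assumes one magnitude entry per column of the weight, as in the DoRA paper.

```python
def full_params(d, k):
    """Trainable parameters when finetuning the whole d x k matrix."""
    return d * k

def lora_params(d, k, r):
    """LoRA trains B (d x r) and A (r x k)."""
    return r * (d + k)

def dora_params(d, k, r):
    """DoRA trains the LoRA matrices plus one magnitude value per column."""
    return lora_params(d, k, r) + k

d, k, r = 4096, 4096, 16  # assumed layer shape and rank

print(full_params(d, k))     # 16777216
print(lora_params(d, k, r))  # 131072
print(dora_params(d, k, r))  # 135168
```

Under these assumptions the magnitude vector adds 4,096 parameters on top of LoRA's 131,072, which is about 3% more.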


2. Train with ORPO

The Indic models released so far share one big problem. Most of them are only SFT (instruction finetuned) models with no preference alignment. So if you ask one of them a question like “how to make a bomb” or “how to kidnap”, it doesn’t hesitate to provide an answer.

Running a separate DPO or RLHF stage is cumbersome given data constraints. ORPO removes that separate stage by combining SFT and preference alignment (the DPO/RLHF step) into a single training run.
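As a rough sketch of what "combining into one" means, the ORPO objective is the usual SFT loss on the preferred answer plus a weighted odds-ratio penalty that pushes the preferred answer above the rejected one. The function names and the weight `lam=0.1` below are assumptions for illustration. The inputs are per-token log-probabilities that a real model would produce.

```python
import math

def log_odds(avg_logp):
    """log(p / (1 - p)), where p = exp(avg_logp) is the length-normalised probability."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    """SFT loss on the chosen response plus lam times the odds-ratio penalty."""
    chosen_avg = sum(chosen_token_logps) / len(chosen_token_logps)
    rejected_avg = sum(rejected_token_logps) / len(rejected_token_logps)

    sft = -chosen_avg  # negative log-likelihood of the chosen response
    ratio = log_odds(chosen_avg) - log_odds(rejected_avg)
    # -log(sigmoid(ratio)) written as log(1 + exp(-ratio))
    odds_ratio_term = math.log1p(math.exp(-ratio))
    return sft + lam * odds_ratio_term

# Made-up token log-probabilities for a safe refusal (chosen)
# and a harmful completion (rejected).
chosen = [-0.2, -0.3]
rejected = [-1.5, -2.0]
print(orpo_loss(chosen, rejected))
```

The loss is small when the model already prefers the chosen answer and grows when it prefers the rejected one. For real training, Hugging Face TRL provides an `ORPOTrainer`, so you do not need to implement the loss yourself.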

Find a good preference dataset suitable for ORPO (prompts paired with a chosen and a rejected response), translate it into your target Indic language (Hindi, Tamil, Telugu, etc.), and finetune on it so that instruction tuning and preference alignment happen in one step.
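The translation step can be sketched as below. The `prompt` / `chosen` / `rejected` field names are an assumption about how the dataset is laid out, and `fake_translate` is a hypothetical stand-in for whatever translation model or API you actually use.

```python
def translate_preference_record(record, translate):
    """Translate the prompt and both responses, keeping the chosen/rejected pairing intact."""
    return {key: translate(record[key]) for key in ("prompt", "chosen", "rejected")}

def fake_translate(text):
    """Hypothetical placeholder: a real pipeline would call a translation system here."""
    return "[hi] " + text

record = {
    "prompt": "How do I make a bomb?",
    "chosen": "I can't help with that.",
    "rejected": "Sure, here is how...",
}

translated = translate_preference_record(record, fake_translate)
print(translated["chosen"])
```

Translating all three fields with the same system keeps the chosen and rejected responses comparable. It is worth spot checking the translated refusals, since a poor translation of the chosen answer would teach the model the wrong behaviour.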


Feel free to follow me on Twitter where I share my learnings from building AI SaaS apps!

