Started working as a Researcher at UMass BioNLP

Collaborated with faculty and PhD students to fine-tune Llama 3.1 8B using parameter-efficient fine-tuning (PEFT) with QLoRA, applying Proximal Policy Optimization (PPO) with reward models that target question difficulty.