Efficient Policy Learning via Knowledge Distillation for Robotic Manipulation

Date
2025
Authors
Severhin, Oleksandr
Kuzmenko, Dmytro
Shvai, Nadiya
Publisher
National University of Kyiv-Mohyla Academy
Abstract
The work addresses the computational intractability of large-scale Reinforcement Learning (RL) models for robotic manipulation. While world-model-based agents such as TD-MPC2 achieve high performance across a range of manipulation tasks, their large parameter count (e.g., 317M) hinders training and deployment on resource-constrained hardware. This research investigates Knowledge Distillation (KD), with the loss function described in [1] and [2], as the primary method for model compression: a lightweight "student" model is trained to mimic the behavior of a large, pre-trained "teacher" model. Unlike in supervised learning, distilling knowledge in RL is uniquely complex, because the objective is to transfer a dynamic, reward-driven policy rather than a simple input-output mapping.
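The distillation objective itself is specified in [1] and [2]; purely as an illustration, the sketch below shows one common form of policy-distillation loss for continuous control: the KL divergence between the teacher's and the student's diagonal-Gaussian action distributions, with the teacher frozen. All identifiers (gaussian_policy_kl, teacher, student, states) are hypothetical and not taken from the paper.

import torch

def gaussian_policy_kl(mu_t, log_std_t, mu_s, log_std_s):
    # KL(teacher || student) for diagonal-Gaussian action distributions,
    # summed over action dimensions and averaged over the batch.
    var_t = (2.0 * log_std_t).exp()
    var_s = (2.0 * log_std_s).exp()
    kl = (log_std_s - log_std_t
          + (var_t + (mu_t - mu_s) ** 2) / (2.0 * var_s)
          - 0.5)
    return kl.sum(dim=-1).mean()

# Hypothetical usage: the teacher is frozen, only the student is updated.
# with torch.no_grad():
#     mu_t, log_std_t = teacher(states)   # states drawn from teacher rollouts
# mu_s, log_std_s = student(states)
# loss = gaussian_policy_kl(mu_t, log_std_t, mu_s, log_std_s)
# loss.backward()

Minimizing this KL pushes the student's action distribution toward the teacher's at every visited state, which is what "mimicking the teacher's policy" amounts to in practice.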
Keywords
model compression, Reinforcement Learning (RL), robotic manipulation, world models / TD-MPC2, conference materials
Citation
Severhin O. Efficient Policy Learning via Knowledge Distillation for Robotic Manipulation / Severhin O., Kuzmenko D., Shvai N. // Theoretical and Applied Aspects of Program Systems Development : proceedings of the 16th International Scientific and Practical Conference, November 23-24, 2025, Kyiv / [general editorship of M. M. Glybovets, T. V. Panchenko et al. ; Faculty of Informatics of the National University of Kyiv-Mohyla Academy et al.]. - Kyiv : NaUKMA, 2025. - P. 64-66.