Abstract:
This paper presents a privacy-focused teaching assistant system based on locally deployed Small Language Models (SLMs) combined with Retrieval-Augmented Generation (RAG) for higher education environments. Existing AI-powered educational tools predominantly rely on cloud-based large language models, which introduces challenges related to data privacy, regulatory compliance, and infrastructure cost. These limitations are particularly significant in university settings, where sensitive institutional data must remain protected. The proposed system is designed as a distributed, hardware-aware architecture that operates entirely within institutional infrastructure. It supports deployment across diverse environments, including CPU-only systems, integrated GPUs, and discrete GPUs, so institutions can use existing hardware without specialized AI infrastructure. The architecture integrates modular services for request routing, authentication, orchestration, and dynamic worker node management, alongside a vector database for semantic retrieval of educational content. Experimental observations demonstrate that retrieval mechanisms are essential for accessing institution-specific knowledge, and they also highlight the importance of careful system design when integrating retrieval with small-scale models. The proposed approach provides a scalable, cost-efficient, and privacy-focused solution for deploying AI assistants in higher education.
CITATION:
IEEE format
. Bogićević, M. Milošević, “A Privacy Focused Distributed RAG Architecture Using Small Language Models for Higher Education,” in Sinteza 2026 - International Scientific Conference on Information Technology, Computer Science, and Data Science, Belgrade, Singidunum University, Serbia, 2026, pp. 254-259. doi:10.15308/Sinteza-2026-254-259
APA format
Bogićević, ., Milošević, M. (2026). A Privacy Focused Distributed RAG Architecture Using Small Language Models for Higher Education. Paper presented at Sinteza 2026 - International Scientific Conference on Information Technology, Computer Science, and Data Science. doi:10.15308/Sinteza-2026-254-259