Abstract: Driven by the privacy concerns and real-time demands of popular large language models (LLMs), LLM inference is shifting toward the edge, where nearby edge clusters are leveraged to provide low ...