
Category:
On-device LLMs, LLM Deployment
Definition
Models running directly on edge devices like laptops, phones, or IoT hardware.
Explanation
On-device LLMs avoid cloud dependency, reduce latency, increase privacy, and enable offline reasoning. To fit within device memory and power budgets, they rely on model compression, quantization, and hardware acceleration (GPU/TPU/NPU). On-device agents can perform summarization, translation, local search, and personal-assistant tasks without sending data to the cloud.
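Quantization is the main lever for shrinking a model's memory footprint. A minimal sketch of symmetric per-tensor int8 quantization (the helper names are illustrative, not a specific library's API): each float32 weight is stored as a small integer plus one shared scale, cutting storage roughly 4x.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: the largest-magnitude
    weight maps to 127, and every weight is stored as an integer
    in [-127, 127] plus one shared float scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for use at inference time."""
    return [v * scale for v in q]

weights = [0.02, -0.5, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# int8 storage is ~4x smaller than float32; `restored` approximates
# `weights` to within one quantization step (the scale).
```

Real runtimes (e.g. llama.cpp, ONNX Runtime) use finer-grained per-channel or per-block scales and 4-bit formats, but the trade-off is the same: less memory per weight at the cost of some precision.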
Technical Architecture
Input → On-device Model → Local Reasoning → Output
Core Components
Quantized model, device accelerator, local memory, offline tools
Use Cases
Mobile AI, private assistants, industrial IoT, field operations
Pitfalls
Limited compute and memory; smaller or heavily quantized models can reduce accuracy
LLM Keywords
On-Device LLM, Edge LLM, Mobile AI Models
Related Concepts
• Model Compression
• Edge AI
• Privacy
Related Frameworks
• Edge AI Inference Stack
