Running Llama 3.2 3B on a budget Android phone
Test run on a Redmi Note 11 (6 GB RAM, Snapdragon 680) using the Local AI Hub Android app. Llama 3.2 3B Q4_K_M loads in about 8 seconds and generates at roughly 8 tokens/second, which is perfectly usable for chat.
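A quick sanity check on why this works at all on a 6 GB phone: a Q4_K_M quantization averages somewhere around 4.85 bits per weight, so the 3B model's weights come out to roughly 2 GB. The figures below are rough estimates of mine, not numbers from the app or from llama.cpp itself:

```python
# Back-of-envelope memory estimate for Llama 3.2 3B in Q4_K_M.
# BITS_PER_WEIGHT and OVERHEAD_GB are assumptions, not measured values.
PARAMS = 3.2e9          # approximate parameter count
BITS_PER_WEIGHT = 4.85  # rough average for Q4_K_M (varies by tensor mix)
OVERHEAD_GB = 0.8       # assumed KV cache + runtime buffers at short context

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
total_gb = weights_gb + OVERHEAD_GB

print(f"weights ~ {weights_gb:.1f} GB, total ~ {total_gb:.1f} GB")
```

Even with generous overhead, the total stays well under 6 GB, leaving room for Android itself, which is why a budget phone can run this comfortably.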