Running Llama 3.2 3B on a budget Android phone
Test run on a Redmi Note 11 (6 GB RAM, Snapdragon 680) using the Local AI Hub Android app. Llama 3.2 3B Q4_K_M loads in about 8 seconds and generates at roughly 8 tokens/second, which is perfectly usable for chat.
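A quick sanity check on why this works at all on a 6 GB phone: a Q4_K_M quantization averages somewhere around 4.85 bits per weight, so the 3B model's weights come out to roughly 2 GB. The figures below are rough estimates of mine, not numbers from the app or from llama.cpp itself:

```python
# Back-of-envelope memory estimate for Llama 3.2 3B in Q4_K_M.
# BITS_PER_WEIGHT and OVERHEAD_GB are assumptions, not measured values.
PARAMS = 3.2e9          # approximate parameter count
BITS_PER_WEIGHT = 4.85  # rough average for Q4_K_M (varies by tensor mix)
OVERHEAD_GB = 0.8       # assumed KV cache + runtime buffers at short context

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
total_gb = weights_gb + OVERHEAD_GB

print(f"weights ~ {weights_gb:.1f} GB, total ~ {total_gb:.1f} GB")
```

Even with generous overhead, the total stays well under 6 GB, leaving room for Android itself, which is why a budget phone can run this comfortably.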