Your phone today is basically a tiny AI workstation. It can run a full language model offline, with no cloud and no data leaving your device. That was the whole point of PocketLLM, an Android app I built that runs Google’s Gemma 3n model locally using MediaPipe Tasks GenAI.
No servers. No API keys (aside from a HuggingFace token for the initial model download). No monthly bills. Just pure on-device intelligence.
Why I Wanted an On-Device LLM
I’ve been experimenting with running AI locally for a while, first on my MacBook, then on Android. A few reasons you might want this:
- Prompts and code stay on your device.
- It works offline (flights, remote areas, anywhere).
- No rate limits, no API costs, no waiting on network calls.
When I learned Gemma 3n could be quantized to ~3GB and run on modern phones, it clicked:
“I can ship a full LLM inside an Android app.”
So I did. And it works.

Open LLMs Make This Possible
Models like:
- Meta Llama
- Google Gemma
- Mistral
…gave us something we didn’t have before: public weights that anyone can run.
Gemma 3n in particular is wild for its size: roughly 3 billion parameters, quantized to ~3GB; it runs smoothly on a flagship and is “usable” on mid-range devices.
Real-World Use Cases
Here’s where on-device AI shines:
- Students → Study help without sending homework to cloud servers
- Developers → Code assistants that keep proprietary code local
- Medical professionals → Draft notes without risking patient data
- Lawyers → Generate documents without leaking client info
- Remote/offline workers → AI that works on a plane or in a tunnel
Basically: if privacy matters or your internet connection sucks, on-device AI wins.
How PocketLLM Is Built
PocketLLM uses MediaPipe Tasks GenAI to load and run Gemma 3n on Android. Architecture-wise, it’s clean and simple:
- Domain layer → business logic
- Data layer → MediaPipe integration + storage
- Presentation layer → Jetpack Compose UI
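To make the split concrete, the domain-layer contract could look roughly like this (a hypothetical sketch; the actual interface in the repo may differ):

// Hypothetical domain-layer contract; ChatRepositoryImpl in the data layer
// (covered below) would implement it on top of MediaPipe.
interface ChatRepository {
    suspend fun loadModel(): Result<Unit>
    suspend fun sendMessage(prompt: String): Result<String>
}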
Add MediaPipe to Your Project
dependencies {
    implementation("com.google.mediapipe:tasks-genai:0.10.27")
}
Requirements:
- Android API 26+
- ARM64 device
- Enough RAM (4GB+ recommended)
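If you want the build to enforce the API level and ABI, here’s a rough build.gradle.kts sketch (illustrative values mirroring the list above; adjust to your project):

android {
    defaultConfig {
        minSdk = 26                   // Android API 26+
        ndk {
            abiFilters += "arm64-v8a" // ship only ARM64 native libs
        }
    }
}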
For secure key/model storage, you’ll want:
implementation("androidx.security:security-crypto:1.1.0-alpha06")
Core MediaPipe Integration
This is where the magic happens, inside ChatRepositoryImpl.
Initialize the model
llmInference = LlmInference.createFromOptions(
    context,
    LlmInference.LlmInferenceOptions.builder()
        .setModelPath(modelPath)
        .setMaxTokens(currentSettings.maxTokens)
        .build()
)
Generate responses
session.addQueryChunk(prompt)
val response = session.generateResponse()
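For context, `session` is an LlmInferenceSession created on top of the llmInference engine from the previous snippet. A minimal sketch (check the exact builder names against the MediaPipe version you’re using):

import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

// One session per conversation, created on top of the engine initialized above.
val session = LlmInferenceSession.createFromOptions(
    llmInference,
    LlmInferenceSession.LlmInferenceSessionOptions.builder().build()
)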
Customizable Model Behavior
Users can tweak how the model responds through settings:
- Temperature (0.0-1.0): Controls randomness. Lower = more focused, higher = more creative.
- Top K & Top P: Fine-tune token sampling.
- Max Tokens: Control response length (128-2048).

These are passed when creating the chat session, giving users control over the model’s personality.
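Concretely, the settings map onto the session options builder. A sketch, assuming a simple `settings` holder on the app side (setTemperature/setTopK/setTopP are MediaPipe builder methods; double-check availability in your version):

// `settings` stands in for however the app stores the user's choices.
val sessionOptions = LlmInferenceSession.LlmInferenceSessionOptions.builder()
    .setTemperature(settings.temperature) // 0.0-1.0: lower = focused, higher = creative
    .setTopK(settings.topK)               // sample only from the K most likely tokens
    .setTopP(settings.topP)               // nucleus-sampling probability cutoff
    .build()

val session = LlmInferenceSession.createFromOptions(llmInference, sessionOptions)

Max Tokens is the odd one out: it’s set on the engine options (the setMaxTokens call in the initialization snippet), not on the session.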
Keep in mind that Gemma 3n itself takes up roughly 3GB of memory.
Model Downloads
WorkManager handles:
- downloading the model from HuggingFace
- progress updates
- error handling
- retries
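A rough sketch of how that work might be enqueued (ModelDownloadWorker is a placeholder name here; the constraint and backoff calls are standard WorkManager API):

import androidx.work.*
import java.util.concurrent.TimeUnit

// ModelDownloadWorker is a placeholder for the worker that streams the model file.
val downloadRequest = OneTimeWorkRequestBuilder<ModelDownloadWorker>()
    .setConstraints(
        Constraints.Builder()
            .setRequiredNetworkType(NetworkType.CONNECTED) // only run while online
            .build()
    )
    .setBackoffCriteria(BackoffPolicy.EXPONENTIAL, 30, TimeUnit.SECONDS) // retry with backoff
    .build()

WorkManager.getInstance(context)
    .enqueueUniqueWork("model-download", ExistingWorkPolicy.KEEP, downloadRequest)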
Credentials get stored with EncryptedSharedPreferences.
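For the HuggingFace token, something along these lines works (standard androidx.security.crypto usage; names like "hf_token" are just illustrative):

import android.content.Context
import androidx.security.crypto.EncryptedSharedPreferences
import androidx.security.crypto.MasterKey

// Returns an encrypted SharedPreferences instance backed by a Keystore master key.
fun securePrefs(context: Context) =
    EncryptedSharedPreferences.create(
        context,
        "secure_prefs", // file name is illustrative
        MasterKey.Builder(context)
            .setKeyScheme(MasterKey.KeyScheme.AES256_GCM)
            .build(),
        EncryptedSharedPreferences.PrefKeyEncryptionScheme.AES256_SIV,
        EncryptedSharedPreferences.PrefValueEncryptionScheme.AES256_GCM
    )

// e.g. storing the HuggingFace token the user entered:
// securePrefs(context).edit().putString("hf_token", token).apply()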
Performance
Cold start: ~2-4 seconds, depending on the phone. After that, responses are snappy.
- Flagships: smooth
- Mid-range: usable
- Low-end: don’t bother, the model is too large.
What I Learned
- On-device LLMs are 100% usable today.
- MediaPipe GenAI has clean APIs and surprisingly good docs.
- Gemma 3n performs well for its size.
The future is clear: smaller models + better quantization = AI that lives on your device, not on the cloud.
Next steps for the project might include:
- streaming responses
- conversation memory
- multiple model support
…but honestly even v1 is already super functional.
Try It Yourself
Full source code is here: https://github.com/jafforgehq/pocketllm
If you care about privacy, offline AI, or you just want to build something cool with MediaPipe, give it a try. It’s clean, fully tested, and easy to extend.
This is the direction AI is moving, and it’s fun to be early.