Free Local LLM in Docker: Build a Customer Feedback Analyser with Ollama and Pydantic
25 February, 2026 AI
A few months ago I was building a side project and added a small feature that called the OpenAI API. Nothing fancy — just summarising some text. I shipped it, forgot about it, and two weeks later opened my billing dashboard to find a £47 charge. For a side project. That nobody uses.
That was the moment I started seriously looking at local models. What I found was that running a capable LLM on your own machine has become genuinely easy. This article walks through the setup and then builds something real with it: a CLI tool that reads customer feedback from a CSV file, clusters it by theme, and generates a structured report.
I dockerise everything — side projects, tools, experiments. It keeps the environment reproducible, avoids "works on my machine" problems, and makes it trivial to hand something off or revisit it six months later. So naturally, the LLM runs in Docker too, with a proper typed response contract.
The Stack
- Docker + Docker Compose — Ollama runs as a service, model pulled on first start
- ollama/ollama — official Docker image, no local install required
- Python 3.14 + Pydantic v2 — defines the response schema as a Python type
- gemma3:4b — Google's model, 4 billion parameters, runs on 3 GB RAM
No GPU is required for models in the 3B-4B range; a mid-range laptop handles them fine. For heavier models (8B+) you will want 16 GB of RAM or a discrete GPU.
Can You Actually Dockerise the Model?
Yes. The ollama/ollama image stores models in /root/.ollama. Mount that as a named Docker volume and the model persists across restarts. A separate init service pulls the model on first run. On all subsequent runs it exits in under a second — the model is already cached.
Model Cheat Sheet
| Model | RAM | Speed | Notes |
|---|---|---|---|
| llama3.2:3b | 2 GB | very fast | solid baseline, widely supported |
| gemma3:4b | 3 GB | very fast | strong structured output, 128K context |
| phi4-mini | 4 GB | fast | strong instruction following |
| qwen3:8b | 6 GB | fast | excellent instruction following, 256K context |
| mistral-nemo:12b | 8 GB | medium | solid mid-range |
| phi4:14b | 10 GB | medium | near-frontier quality, fits in 16 GB RAM |
| gemma3:27b | 18 GB | slow | high quality |
| llama3.3:70b | 48 GB | very slow | near-frontier quality |
The model landscape evolves quickly — check ollama.com/library for the current list. For this project the 3B or 4B model is sufficient — we are asking it to classify and summarise, not reason through complex problems. Reasoning models like Phi 4 Reasoning or DeepSeek-R1 would add chain-of-thought tokens that slow things down with no benefit for structured classification.
The Business Problem
Every product team has the same pain: user feedback piles up across app store reviews, support tickets, NPS surveys, and feedback forms. A product manager at a small company might have 300 new reviews per week. Reading them manually to find patterns takes hours. The patterns are usually the same five themes repeated in different words.
An LLM is well-suited for this: read a batch of reviews, identify recurring themes, assign sentiment, pull representative quotes. The output is a report that takes 30 seconds to generate instead of 3 hours to write manually.
Project Structure
feedback-analyser/
├── docker-compose.yml
├── Dockerfile
├── .env
├── analyser.py
├── requirements.txt
└── reviews.csv
Sample Data
Create reviews.csv with an id column and a text column:
id,text
1,"App crashes every time I try to export to PDF. Been like this for two weeks."
2,"Love the new dashboard design, much cleaner than before"
3,"Why does it take 8 seconds to load my projects? Used to be instant."
4,"Customer support got back to me in 5 minutes, genuinely impressed"
5,"The mobile app is almost unusable, buttons are too small to tap accurately"
6,"Export to PDF still broken. No response from support after 3 days."
7,"Search doesn't find things that definitely exist. Had to scroll manually."
8,"Really enjoying the new collaboration features, team onboarding was smooth"
9,"Pricing went up 40% with no warning. Looking at alternatives."
10,"The keyboard shortcuts are a game changer, saving me so much time"
11,"Dark mode looks great but it doesn't persist between sessions"
12,"Would love a Zapier integration, we have so many manual steps because of this"
13,"Three data exports failed silently this week. No error message, just nothing."
14,"Onboarding flow is confusing, took me 20 minutes to figure out how to invite a team member"
15,"Performance has tanked since the last update. Everything feels sluggish."
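If your feedback export is JSON rather than CSV, converting it is a few lines of standard library. A minimal sketch, assuming a JSON array of objects with "id" and "text" fields (adjust the field names to your export):

```python
import csv
import json


def json_to_csv(json_path: str, csv_path: str) -> None:
    # Assumes a JSON array of objects, each with "id" and "text" keys
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "text"])
        writer.writeheader()
        for rec in records:
            writer.writerow({"id": rec["id"], "text": rec["text"]})
```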
Docker Compose
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 10s
      timeout: 5s
      retries: 10

  model-pull:
    image: ollama/ollama
    depends_on:
      ollama:
        condition: service_healthy
    environment:
      OLLAMA_HOST: http://ollama:11434
    entrypoint: ["ollama", "pull", "${OLLAMA_MODEL:-gemma3:4b}"]
    restart: "no"

  analyser:
    build: .
    profiles: ["run"]
    depends_on:
      model-pull:
        condition: service_completed_successfully
    environment:
      OLLAMA_HOST: http://ollama:11434
      OLLAMA_MODEL: ${OLLAMA_MODEL:-gemma3:4b}
    volumes:
      - ./:/app

volumes:
  ollama-data:
Three services:
- ollama — the server. Stores models in a named volume. A healthcheck ensures the API is ready before anything else proceeds.
- model-pull — an init container. Acts as an Ollama client, connects to the server, and issues ollama pull. Runs once, exits with code 0. The model lands in the shared volume.
- analyser — the Python app, assigned to the run profile. docker compose up will not start it automatically; it only runs when explicitly invoked.
The model name is read from .env, which Docker Compose picks up automatically:
# .env
OLLAMA_MODEL=gemma3:4b
To switch models, change this one line. Both model-pull and analyser will use the updated value.
Dockerfile
FROM python:3.14-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY analyser.py .
ENTRYPOINT ["python", "analyser.py"]
# requirements.txt
ollama>=0.6.1
pydantic>=2.12
Defining the Response Contract with Pydantic
A naive approach is to embed the expected JSON structure as a string inside the prompt. The model can follow it or not, and you only find out at runtime when json.loads() raises an exception.
The better approach: define a Pydantic model and pass its JSON Schema directly to Ollama. The same object that validates the response also defines it — single source of truth.
from typing import Literal
from pydantic import BaseModel, Field
class Theme(BaseModel):
name: str
count: int
sentiment: Literal["negative", "positive", "mixed"]
summary: str = Field(description="One sentence describing the theme")
quotes: list[str] = Field(max_length=2)
class FeedbackReport(BaseModel):
themes: list[Theme] = Field(max_length=5)
most_urgent_issue: str
strongest_praise: str
overall_sentiment: Literal["negative", "positive", "mixed"]
Literal["negative", "positive", "mixed"] tells both the schema validator and the LLM exactly which values are legal. max_length=2 on quotes prevents the model from padding the response with unnecessary examples.
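To see what the decoder will actually be constrained by, dump the generated schema. The Literal becomes a plain JSON Schema enum and the max_length constraint becomes maxItems:

```python
from typing import Literal

from pydantic import BaseModel, Field


class Theme(BaseModel):
    name: str
    count: int
    sentiment: Literal["negative", "positive", "mixed"]
    summary: str = Field(description="One sentence describing the theme")
    quotes: list[str] = Field(max_length=2)


schema = Theme.model_json_schema()

# The Literal is emitted as an enum of the three legal string values
print(schema["properties"]["sentiment"])
# The list constraint is emitted as maxItems on the array property
print(schema["properties"]["quotes"]["maxItems"])
```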
System vs User Messages
Putting everything — role definition, task instructions, and raw data — into a single prompt string works, but conflates two different things. The model's persona and output contract are stable across every run; the reviews change. Splitting them is cleaner, and it lets Ollama reuse the cached prompt prefix: with a stable system prompt, subsequent calls skip re-processing those tokens, similar in spirit to the prompt caching offered by the Claude and OpenAI APIs.
SYSTEM_PROMPT = """You are a product analyst specialising in customer feedback.
Identify recurring themes, assign sentiment, and extract representative quotes.
Respond only with JSON matching the provided schema."""
def build_user_message(reviews: list[str]) -> str:
numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
return f"Analyse these {len(reviews)} customer reviews:\n\n{numbered}"
Passing the Schema as format
format="json" tells the model to produce valid JSON in any shape. format=FeedbackReport.model_json_schema() tells it to produce JSON that matches a specific schema.
Ollama 0.5+ enforces this via constrained decoding — the sampling process is restricted to tokens that keep the output valid at every step. The model cannot produce a sentiment value outside the Literal, cannot omit a required field, and cannot return count as a string.
def analyse(reviews: list[str]) -> FeedbackReport:
response = client.chat(
model=MODEL,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": build_user_message(reviews)},
],
format=FeedbackReport.model_json_schema(),
)
return FeedbackReport.model_validate_json(response.message.content)
model_validate_json() raises a ValidationError with a precise field path if anything is wrong — far more useful than a bare json.loads() exception. To inspect the schema Pydantic generates, run print(FeedbackReport.model_json_schema()).
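With constrained decoding active this path should rarely fire, but it matters when the server or model does not enforce the schema. Here is what the error looks like, using a cut-down version of the Theme model and a deliberately broken payload:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Theme(BaseModel):
    name: str
    count: int
    sentiment: Literal["negative", "positive", "mixed"]


# Two problems: count is not an int, sentiment is not a legal value
bad = '{"name": "Exports", "count": "two", "sentiment": "angry"}'

try:
    Theme.model_validate_json(bad)
except ValidationError as e:
    # Each error carries a location tuple pointing at the offending field
    for err in e.errors():
        print(err["loc"], err["type"])
```

Both failing fields are reported in one pass, each with its exact location, instead of a single opaque parse error.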
Full Script
# analyser.py
import csv
import os
import sys
import textwrap
from pathlib import Path
from typing import Literal
import ollama
from pydantic import BaseModel, Field
MODEL = os.getenv("OLLAMA_MODEL", "gemma3:4b")
client = ollama.Client(host=os.getenv("OLLAMA_HOST", "http://localhost:11434"))
SYSTEM_PROMPT = """You are a product analyst specialising in customer feedback.
Identify recurring themes, assign sentiment, and extract representative quotes.
Respond only with JSON matching the provided schema."""
class Theme(BaseModel):
name: str
count: int
sentiment: Literal["negative", "positive", "mixed"]
summary: str = Field(description="One sentence describing the theme")
quotes: list[str] = Field(max_length=2)
class FeedbackReport(BaseModel):
themes: list[Theme] = Field(max_length=5)
most_urgent_issue: str
strongest_praise: str
overall_sentiment: Literal["negative", "positive", "mixed"]
def load_reviews(path: str) -> list[str]:
    with open(path, newline="", encoding="utf-8") as f:
        return [row["text"] for row in csv.DictReader(f) if row["text"].strip()]
def build_user_message(reviews: list[str]) -> str:
numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
return f"Analyse these {len(reviews)} customer reviews:\n\n{numbered}"
def analyse(reviews: list[str]) -> FeedbackReport:
response = client.chat(
model=MODEL,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": build_user_message(reviews)},
],
format=FeedbackReport.model_json_schema(),
)
return FeedbackReport.model_validate_json(response.message.content)
def print_report(report: FeedbackReport, total: int) -> None:
sentiment_icon = {"positive": "+", "negative": "x", "mixed": "~"}
print(f"\n{'=' * 60}")
print(f" FEEDBACK ANALYSIS REPORT ({total} reviews)")
print(f" Overall: {report.overall_sentiment.upper()}")
print(f"{'=' * 60}\n")
for theme in report.themes:
icon = sentiment_icon.get(theme.sentiment, "?")
print(f"[{icon}] {theme.name} ({theme.count} reviews)")
print(f" {theme.summary}")
for quote in theme.quotes:
wrapped = textwrap.fill(
f'"{quote}"',
width=56,
initial_indent=" > ",
subsequent_indent=" ",
)
print(wrapped)
print()
print(f"{'-' * 60}")
print(f" Most urgent: {report.most_urgent_issue}")
print(f" Top praise: {report.strongest_praise}")
print(f"{'-' * 60}\n")
def main() -> None:
path = sys.argv[1] if len(sys.argv) > 1 else "reviews.csv"
if not Path(path).exists():
print(f"File not found: {path}")
sys.exit(1)
reviews = load_reviews(path)
print(f"Analysing {len(reviews)} reviews with {MODEL}...")
report = analyse(reviews)
print_report(report, len(reviews))
if __name__ == "__main__":
main()
The full project — including docker-compose.yml, Dockerfile, and sample data — is on GitHub.
Running It
# Start the Ollama server and pull the model (runs in the background)
docker compose up -d
# Run the analyser when you are ready
docker compose run --rm analyser reviews.csv
Output looks like this:
➜ docker compose run --rm analyser reviews.csv
✔ Container feedback-analyser-ollama-1 Healthy 0.5s
✔ Container feedback-analyser-model-pull-1 Started 0.1s
Container feedback-analyser-model-pull-1 Waiting
Container feedback-analyser-model-pull-1 Exited
Container feedback-analyser-analyser-run-ab2db35729ca Creating
Container feedback-analyser-analyser-run-ab2db35729ca Created
Analysing 15 reviews with gemma3:4b...
============================================================
FEEDBACK ANALYSIS REPORT (15 reviews)
Overall: NEGATIVE
============================================================
[x] Performance Issues (6 reviews)
Customers are experiencing slow loading times, crashes, and sluggish performance.
> "App crashes every time I try to export to PDF."
> "Why does it take 8 seconds to load my projects?
Used to be instant. Performance has tanked since
the last update. Everything feels sluggish. Three
data exports failed silently this week. No error
message, just nothing."
[x] Export Functionality (2 reviews)
Users are encountering problems with exporting to PDF.
> "App crashes every time I try to export to PDF."
> "Export to PDF still broken. No response from
support after 3 days."
[~] UI/UX Issues (4 reviews)
Customers have concerns about the user interface, including small buttons, confusing onboarding, and persistent dark mode.
> "The mobile app is almost unusable, buttons are
too small to tap accurately."
> "Onboarding flow is confusing, took me 20 minutes
to figure out how to invite a team member. Dark
mode looks great but it doesn't persist between
sessions."
[x] Search Functionality (1 reviews)
The search feature is inaccurate and inefficient.
> "Search doesn't find things that definitely exist.
Had to scroll manually."
[+] Customer Support (1 reviews)
Positive experience with customer support responsiveness.
> "Customer support got back to me in 5 minutes,
genuinely impressed"
------------------------------------------------------------
Most urgent: Export to PDF functionality is consistently failing, compounded by lack of support response.
Top praise: New dashboard design and collaboration features are highly appreciated.
------------------------------------------------------------
In about 25 seconds on a laptop, without a single API call. The Ollama server log shows the exact time spent on inference:
ollama-1 | [GIN] 2026/02/24 - 21:07:28 | 200 | 22.281823492s | 172.22.0.3 | POST "/api/chat"
That is a single HTTP request — 15 reviews in, structured report out. On CPU without a GPU, 20-30 seconds is typical for a 4B model. With a GPU it drops to 1-3 seconds.
Taking It Further
GPU support. Add deploy.resources to the ollama service:
ollama:
image: ollama/ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
Use ollama/ollama:rocm for AMD GPUs.
Batch processing. Run it weekly on a fresh CSV export from your app store, Intercom, or Zendesk. Pipe the output to Slack with a webhook. One cron job, zero ongoing cost.
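The Slack step needs nothing beyond the standard library. A minimal sketch — the webhook URL and message layout are assumptions, not part of the project:

```python
import json
import urllib.request


def slack_payload(overall: str, urgent: str, praise: str, total: int) -> dict:
    # Build a minimal Slack incoming-webhook payload from the report fields
    text = (
        f"*Feedback report* ({total} reviews), overall {overall}\n"
        f"• Most urgent: {urgent}\n"
        f"• Top praise: {praise}"
    )
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict) -> None:
    # One POST to the incoming-webhook URL; Slack responds with "ok"
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```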
Streaming output. For large review sets, use stream=True to print tokens as they arrive instead of waiting for the full response. Note that streaming and structured JSON schema output are mutually exclusive in current Ollama versions — use streaming when you drop the schema constraint.
Embeddings for deduplication. Before sending reviews to the LLM, use ollama.embeddings() to compute vector similarity and deduplicate near-identical reviews. This reduces token count and improves theme quality.
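The deduplication itself is a greedy pass over cosine similarities. A sketch — the embedding call in the comment assumes the ollama client's embed API and the nomic-embed-text model; the 0.9 threshold is a starting point to tune:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity of two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dedupe(reviews: list[str], vectors: list[list[float]],
           threshold: float = 0.9) -> list[str]:
    # Keep a review only if it is not near-identical to one already kept
    kept: list[str] = []
    kept_vecs: list[list[float]] = []
    for text, vec in zip(reviews, vectors):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept


# Vectors would come from the Ollama server, e.g. (assumed API):
# vectors = client.embed(model="nomic-embed-text", input=reviews).embeddings
```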
Configurable model. The model name is already read from the OLLAMA_MODEL environment variable, so no code changes are needed. Run gemma3:4b during development and phi4:14b in production for higher accuracy at no additional cost.
When Not to Use a Local Model
Local LLMs are not a universal replacement for cloud APIs. A 3B model makes mistakes that GPT-4 doesn't. For a weekly internal report, that's fine. For customer-facing responses or anything where accuracy is critical, pay for the API.
The sweet spot for local models: batch processing jobs, internal tooling, development and testing, and anything where you would otherwise skip the feature because of cost. For this feedback analyser, even if the model misclassifies a review or two, the output is still far better than not reading the reviews at all.
The full script is 80 lines. The setup is one command. The monthly cost is zero.