Free Local LLM in Docker: Build a Customer Feedback Analyser with Ollama and Pydantic

25 February, 2026

A few months ago I was building a side project and added a small feature that called the OpenAI API. Nothing fancy — just summarising some text. I shipped it, forgot about it, and two weeks later opened my billing dashboard to find a £47 charge. For a side project. That nobody uses.

That was the moment I started seriously looking at local models. What I found was that running a capable LLM on your own machine has become genuinely easy. This article walks through the setup and then builds something real with it: a CLI tool that reads customer feedback from a CSV file, clusters it by theme, and generates a structured report.

I dockerise everything — side projects, tools, experiments. It keeps the environment reproducible, avoids "works on my machine" problems, and makes it trivial to hand something off or revisit it six months later. So naturally, the LLM runs in Docker too, with a proper typed response contract.

The Stack

  • Docker + Docker Compose — Ollama runs as a service, model pulled on first start
  • ollama/ollama — official Docker image, no local install required
  • Python 3.14 + Pydantic v2 — defines the response schema as a Python type
  • gemma3:4b — Google's model, 4 billion parameters, runs on 3 GB RAM

No GPU required for models in the 3B–4B range. A mid-range laptop handles them fine. For heavier models (8B+) you will want 16 GB RAM or a discrete GPU.

Can You Actually Dockerise the Model?

Yes. The ollama/ollama image stores models in /root/.ollama. Mount that as a named Docker volume and the model persists across restarts. A separate init service pulls the model on first run. On all subsequent runs it exits in under a second — the model is already cached.

Model Cheat Sheet

Model              RAM     Speed      Notes
llama3.2:3b        2 GB    very fast  solid baseline, widely supported
gemma3:4b          3 GB    very fast  strong structured output, 128K context
phi4-mini          4 GB    fast       strong instruction following
qwen3:8b           6 GB    fast       excellent instruction following, 256K context
mistral-nemo:12b   8 GB    medium     solid mid-range
phi4:14b           10 GB   medium     near-frontier quality, fits in 16 GB RAM
gemma3:27b         18 GB   slow       high quality
llama3.3:70b       48 GB   very slow  near-frontier quality

The model landscape evolves quickly — check ollama.com/library for the current list. For this project the 3B or 4B model is sufficient — we are asking it to classify and summarise, not reason through complex problems. Reasoning models like Phi 4 Reasoning or DeepSeek-R1 would add chain-of-thought tokens that slow things down with no benefit for structured classification.
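The RAM column tracks a crude rule of thumb rather than anything official: a Q4-quantised model needs very roughly 0.75 bytes per parameter for its weights, before KV cache and runtime overhead. The 0.75 figure is my own approximation, not an Ollama constant:

```python
def approx_ram_gb(params_billions: float, bytes_per_param: float = 0.75) -> float:
    """Rough weights-only RAM estimate for a Q4-quantised model."""
    return round(params_billions * bytes_per_param, 1)


# Compare against the cheat sheet above.
for name, params in [("gemma3:4b", 4), ("phi4:14b", 14), ("llama3.3:70b", 70)]:
    print(f"{name}: ~{approx_ram_gb(params)} GB")
```

Real usage runs higher with long prompts, since the KV cache grows with context length.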

The Business Problem

Every product team has the same pain: user feedback piles up across app store reviews, support tickets, NPS surveys, and feedback forms. A product manager at a small company might have 300 new reviews per week. Reading them manually to find patterns takes hours. The patterns are usually the same five themes repeated in different words.

An LLM is well-suited for this: read a batch of reviews, identify recurring themes, assign sentiment, pull representative quotes. The output is a report that takes 30 seconds to generate instead of 3 hours to write manually.

Project Structure

feedback-analyser/
├── docker-compose.yml
├── Dockerfile
├── .env
├── analyser.py
├── requirements.txt
└── reviews.csv

Sample Data

Create reviews.csv with the sample data below. If your feedback lives in JSON, convert it to CSV first.

id,text
1,"App crashes every time I try to export to PDF. Been like this for two weeks."
2,"Love the new dashboard design, much cleaner than before"
3,"Why does it take 8 seconds to load my projects? Used to be instant."
4,"Customer support got back to me in 5 minutes, genuinely impressed"
5,"The mobile app is almost unusable, buttons are too small to tap accurately"
6,"Export to PDF still broken. No response from support after 3 days."
7,"Search doesn't find things that definitely exist. Had to scroll manually."
8,"Really enjoying the new collaboration features, team onboarding was smooth"
9,"Pricing went up 40% with no warning. Looking at alternatives."
10,"The keyboard shortcuts are a game changer, saving me so much time"
11,"Dark mode looks great but it doesn't persist between sessions"
12,"Would love a Zapier integration, we have so many manual steps because of this"
13,"Three data exports failed silently this week. No error message, just nothing."
14,"Onboarding flow is confusing, took me 20 minutes to figure out how to invite a team member"
15,"Performance has tanked since the last update. Everything feels sluggish."

Docker Compose

services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 10s
      timeout: 5s
      retries: 10
  model-pull:
    image: ollama/ollama
    depends_on:
      ollama:
        condition: service_healthy
    environment:
      OLLAMA_HOST: http://ollama:11434
    entrypoint: ["ollama", "pull", "${OLLAMA_MODEL:-gemma3:4b}"]
    restart: "no"
  analyser:
    build: .
    profiles: ["run"]
    depends_on:
      model-pull:
        condition: service_completed_successfully
    environment:
      OLLAMA_HOST: http://ollama:11434
      OLLAMA_MODEL: ${OLLAMA_MODEL:-gemma3:4b}
    volumes:
      - ./:/app

volumes:
  ollama-data:

Three services:

  1. ollama — the server. Stores models in a named volume. A healthcheck ensures the API is ready before anything else proceeds.
  2. model-pull — an init container. Acts as an Ollama client, connects to the server, and issues ollama pull. Runs once, exits with code 0. The model lands in the shared volume.
  3. analyser — the Python app, assigned to the run profile. docker compose up will not start it automatically — it only runs when explicitly invoked.

The model name is read from .env, which Docker Compose picks up automatically:

# .env
OLLAMA_MODEL=gemma3:4b

To switch models, change this one line. Both model-pull and analyser will use the updated value.

Dockerfile

FROM python:3.14-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY analyser.py .

ENTRYPOINT ["python", "analyser.py"]

# requirements.txt
ollama>=0.6.1
pydantic>=2.12

Defining the Response Contract with Pydantic

A naive approach is to embed the expected JSON structure as a string inside the prompt. The model can follow it or not, and you only find out at runtime when json.loads() raises an exception.

The better approach: define a Pydantic model and pass its JSON Schema directly to Ollama. The same object that validates the response also defines it — single source of truth.

from typing import Literal

from pydantic import BaseModel, Field


class Theme(BaseModel):
    name: str
    count: int
    sentiment: Literal["negative", "positive", "mixed"]
    summary: str = Field(description="One sentence describing the theme")
    quotes: list[str] = Field(max_length=2)


class FeedbackReport(BaseModel):
    themes: list[Theme] = Field(max_length=5)
    most_urgent_issue: str
    strongest_praise: str
    overall_sentiment: Literal["negative", "positive", "mixed"]

Literal["negative", "positive", "mixed"] tells both the schema validator and the LLM exactly which values are legal. max_length=2 on quotes prevents the model padding the response with unnecessary examples.
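A quick way to convince yourself the contract has teeth is to validate payloads by hand. This sketch redeclares the Theme model so it runs standalone; the payload values are made up:

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Theme(BaseModel):
    name: str
    count: int
    sentiment: Literal["negative", "positive", "mixed"]
    summary: str = Field(description="One sentence describing the theme")
    quotes: list[str] = Field(max_length=2)


# A conforming payload validates cleanly.
ok = Theme.model_validate({
    "name": "Export bugs",
    "count": 2,
    "sentiment": "negative",
    "summary": "PDF export fails for several users.",
    "quotes": ["Export to PDF still broken."],
})

# A value outside the Literal is rejected, with the exact field path in the error.
try:
    Theme.model_validate({**ok.model_dump(), "sentiment": "furious"})
except ValidationError as e:
    print(e.errors()[0]["loc"])  # ('sentiment',)
```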

System vs User Messages

Putting everything — role definition, task instructions, and raw data — into a single prompt string works, but conflates two different things. The model's persona and output contract are stable across every run; the reviews change. Splitting them is cleaner, and it lets Ollama reuse its KV cache for the unchanged prefix on subsequent calls, so repeated runs skip re-processing the system prompt. The hosted Claude and OpenAI APIs offer prompt caching for the same reason.

SYSTEM_PROMPT = """You are a product analyst specialising in customer feedback.
Identify recurring themes, assign sentiment, and extract representative quotes.
Respond only with JSON matching the provided schema."""


def build_user_message(reviews: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
    return f"Analyse these {len(reviews)} customer reviews:\n\n{numbered}"

Passing the Schema as format

format="json" tells the model to produce valid JSON in any shape. format=FeedbackReport.model_json_schema() tells it to produce JSON that matches a specific schema.

Ollama 0.5+ enforces this via constrained decoding — the sampling process is restricted to tokens that keep the output valid at every step. The model cannot produce a sentiment value outside the Literal, cannot omit a required field, and cannot return count as a string.

def analyse(reviews: list[str]) -> FeedbackReport:
    response = client.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": build_user_message(reviews)},
        ],
        format=FeedbackReport.model_json_schema(),
    )
    return FeedbackReport.model_validate_json(response.message.content)

model_validate_json() raises a ValidationError with a precise field path if anything is wrong — far more useful than a bare json.loads() exception. To inspect the schema Pydantic generates, run print(FeedbackReport.model_json_schema()).
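Constrained decoding makes schema violations rare, but not impossible; a response truncated at the context limit, for example, can still fail validation. A thin retry wrapper is cheap insurance. This is a sketch of my own, not an Ollama feature, and with_retries is a hypothetical helper name:

```python
from collections.abc import Callable
from typing import TypeVar

from pydantic import ValidationError

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3) -> T:
    """Re-run a generation call when the response fails schema validation."""
    last_error: ValidationError | None = None
    for _ in range(attempts):
        try:
            return call()
        except ValidationError as e:
            last_error = e  # inspect e.errors() here for the failing field path
    raise last_error


# Usage: report = with_retries(lambda: analyse(reviews))
```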

Full Script

# analyser.py
import csv
import os
import sys
import textwrap
from pathlib import Path
from typing import Literal

import ollama
from pydantic import BaseModel, Field


MODEL = os.getenv("OLLAMA_MODEL", "gemma3:4b")

client = ollama.Client(host=os.getenv("OLLAMA_HOST", "http://localhost:11434"))

SYSTEM_PROMPT = """You are a product analyst specialising in customer feedback.
Identify recurring themes, assign sentiment, and extract representative quotes.
Respond only with JSON matching the provided schema."""


class Theme(BaseModel):
    name: str
    count: int
    sentiment: Literal["negative", "positive", "mixed"]
    summary: str = Field(description="One sentence describing the theme")
    quotes: list[str] = Field(max_length=2)


class FeedbackReport(BaseModel):
    themes: list[Theme] = Field(max_length=5)
    most_urgent_issue: str
    strongest_praise: str
    overall_sentiment: Literal["negative", "positive", "mixed"]


def load_reviews(path: str) -> list[str]:
    with open(path, newline="", encoding="utf-8") as f:
        return [row["text"] for row in csv.DictReader(f) if row["text"].strip()]


def build_user_message(reviews: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
    return f"Analyse these {len(reviews)} customer reviews:\n\n{numbered}"


def analyse(reviews: list[str]) -> FeedbackReport:
    response = client.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": build_user_message(reviews)},
        ],
        format=FeedbackReport.model_json_schema(),
    )
    return FeedbackReport.model_validate_json(response.message.content)


def print_report(report: FeedbackReport, total: int) -> None:
    sentiment_icon = {"positive": "+", "negative": "x", "mixed": "~"}

    print(f"\n{'=' * 60}")
    print(f"  FEEDBACK ANALYSIS REPORT ({total} reviews)")
    print(f"  Overall: {report.overall_sentiment.upper()}")
    print(f"{'=' * 60}\n")

    for theme in report.themes:
        icon = sentiment_icon.get(theme.sentiment, "?")
        print(f"[{icon}] {theme.name} ({theme.count} reviews)")
        print(f"    {theme.summary}")
        for quote in theme.quotes:
            wrapped = textwrap.fill(
                f'"{quote}"',
                width=56,
                initial_indent="    > ",
                subsequent_indent="      ",
            )
            print(wrapped)
        print()

    print(f"{'-' * 60}")
    print(f"  Most urgent: {report.most_urgent_issue}")
    print(f"  Top praise: {report.strongest_praise}")
    print(f"{'-' * 60}\n")


def main() -> None:
    path = sys.argv[1] if len(sys.argv) > 1 else "reviews.csv"

    if not Path(path).exists():
        print(f"File not found: {path}")
        sys.exit(1)

    reviews = load_reviews(path)
    print(f"Analysing {len(reviews)} reviews with {MODEL}...")

    report = analyse(reviews)
    print_report(report, len(reviews))


if __name__ == "__main__":
    main()

The full project — including docker-compose.yml, Dockerfile, and sample data — is on GitHub.

Running It

# Start the Ollama server and pull the model (runs in the background)
docker compose up -d

# Run the analyser when you are ready
docker compose run --rm analyser reviews.csv

Output looks like this:

➜  docker compose run --rm analyser reviews.csv
 ✔ Container feedback-analyser-ollama-1     Healthy     0.5s
 ✔ Container feedback-analyser-model-pull-1 Started     0.1s
Container feedback-analyser-model-pull-1 Waiting
Container feedback-analyser-model-pull-1 Exited
Container feedback-analyser-analyser-run-ab2db35729ca Creating
Container feedback-analyser-analyser-run-ab2db35729ca Created
Analysing 15 reviews with gemma3:4b...

============================================================
  FEEDBACK ANALYSIS REPORT (15 reviews)
  Overall: NEGATIVE
============================================================

[x] Performance Issues (6 reviews)
    Customers are experiencing slow loading times, crashes, and sluggish performance.
    > "App crashes every time I try to export to PDF."
    > "Why does it take 8 seconds to load my projects?
      Used to be instant. Performance has tanked since
      the last update. Everything feels sluggish. Three
      data exports failed silently this week. No error
      message, just nothing."

[x] Export Functionality (2 reviews)
    Users are encountering problems with exporting to PDF.
    > "App crashes every time I try to export to PDF."
    > "Export to PDF still broken. No response from
      support after 3 days."

[~] UI/UX Issues (4 reviews)
    Customers have concerns about the user interface, including small buttons, confusing onboarding, and persistent dark mode.
    > "The mobile app is almost unusable, buttons are
      too small to tap accurately."
    > "Onboarding flow is confusing, took me 20 minutes
      to figure out how to invite a team member. Dark
      mode looks great but it doesn't persist between
      sessions."

[x] Search Functionality (1 reviews)
    The search feature is inaccurate and inefficient.
    > "Search doesn't find things that definitely exist.
      Had to scroll manually."

[+] Customer Support (1 reviews)
    Positive experience with customer support responsiveness.
    > "Customer support got back to me in 5 minutes,
      genuinely impressed"

------------------------------------------------------------
  Most urgent: Export to PDF functionality is consistently failing, compounded by lack of support response.
  Top praise: New dashboard design and collaboration features are highly appreciated.
------------------------------------------------------------

In about 25 seconds on a laptop, without a single API call. The Ollama server log shows the exact time spent on inference:

ollama-1 | [GIN] 2026/02/24 - 21:07:28 | 200 | 22.281823492s | 172.22.0.3 | POST "/api/chat"

That is a single HTTP request — 15 reviews in, structured report out. On CPU without a GPU, 20-30 seconds is typical for a 4B model. With a GPU it drops to 1-3 seconds.

Taking It Further

GPU support. Add deploy.resources to the ollama service:

ollama:
  image: ollama/ollama
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]

Use ollama/ollama:rocm for AMD GPUs.

Batch processing. Run it weekly on a fresh CSV export from your app store, Intercom, or Zendesk. Pipe the output to Slack with a webhook. One cron job, zero ongoing cost.
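For the Slack step, a minimal sketch using only the standard library. The webhook URL is an assumption (it would come from your Slack app settings); incoming webhooks accept a simple {"text": ...} JSON body:

```python
import json
import urllib.request


def to_slack_payload(report) -> dict:
    """Flatten a FeedbackReport into Slack's simple {"text": ...} message shape."""
    lines = [f"*Feedback report* | overall: {report.overall_sentiment}"]
    lines += [f"- {t.name} ({t.count}): {t.summary}" for t in report.themes]
    lines.append(f"Most urgent: {report.most_urgent_issue}")
    return {"text": "\n".join(lines)}


def post_to_slack(report, webhook_url: str) -> None:
    # webhook_url is assumed to come from e.g. a SLACK_WEBHOOK_URL env var
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(to_slack_payload(report)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```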

Streaming output. For large review sets, use stream=True to print tokens as they arrive instead of waiting for the full response. Note that streaming and structured JSON schema output are mutually exclusive in current Ollama versions — use streaming when you drop the schema constraint.

Embeddings for deduplication. Before sending reviews to the LLM, use ollama.embeddings() to compute vector similarity and deduplicate near-identical reviews. This reduces token count and improves theme quality.
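A sketch of the deduplication step in plain Python. The greedy 0.92 threshold is a guess to tune on your own data, and the commented embedding call shape is an assumption against the Ollama Python client:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def deduplicate(
    reviews: list[str], vectors: list[list[float]], threshold: float = 0.92
) -> list[str]:
    """Greedily keep a review only if nothing already kept is near-identical."""
    kept: list[str] = []
    kept_vecs: list[list[float]] = []
    for text, vec in zip(reviews, vectors):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept


# The vectors would come from the embedding endpoint, roughly:
#   vectors = [
#       ollama.embeddings(model="nomic-embed-text", prompt=r)["embedding"]
#       for r in reviews
#   ]
```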

Configurable model. The model name is already read from the OLLAMA_MODEL environment variable, so no code changes are needed. Run gemma3:4b during development and phi4:14b in production for higher accuracy at no additional cost.

When Not to Use a Local Model

Local LLMs are not a universal replacement for cloud APIs. A 3B model makes mistakes that GPT-4 doesn't. For a weekly internal report, that's fine. For customer-facing responses or anything where accuracy is critical, pay for the API.

The sweet spot for local models: batch processing jobs, internal tooling, development and testing, and anything where you would otherwise skip the feature because of cost. For this feedback analyser, even if the model misclassifies a review or two, the output is still far better than not reading the reviews at all.

The full script is about a hundred lines. The setup is one command. The monthly cost is zero.
