
Simple Frontend and Backend for Your Thesis on NLP Servers

This page describes how to deploy a web application on nlp.fi.muni.cz using the standard frontend–backend architecture used in NLP lab student projects.

Reference implementation: gitlab.fi.muni.cz/nlp/frontend_backend

Reference deployment: nlp.fi.muni.cz/projekty/frontend_backend


Overview

The web infrastructure at nlp.fi.muni.cz works like this:

  • Athena is the web server. It serves static files (HTML, CSS, JS) and CGI scripts from your project's public_html/ directory.
  • Aurora is the home server. It stores the /home/<login> directories. Neither Athena nor Aurora can run long-lived processes.
  • Apollo, Epimetheus*, … are compute servers with NVIDIA GPUs. You run your backend there.
  • A CGI proxy script (Python or Bash) on Athena bridges the two: the browser talks to Athena, the CGI forwards the request to the backend and returns the JSON response.
  • To start/stop the backend from the frontend a dedicated SSH key with a forced command is used — no password, no interactive shell.
Browser
  │  HTTP
  ▼
Athena  (static HTML/CSS/JS + CGI)
  │  HTTP  (api.cgi proxies API calls)
  │  SSH   (start_backend.cgi / stop_backend.cgi trigger backend lifecycle)
  ▼
Apollo / Epimetheus*  (FastAPI backend, optionally + Ollama)

Your project lives on the shared NLP disk at:

/nlp/projekty/<your_project>/

This path is accessible from all NLP servers (Athena, Apollo, Epimetheus*, …).


What is CGI?

CGI (Common Gateway Interface) is the oldest and simplest way to run server-side code behind a web server. When Apache receives a request for a .cgi file, it executes that file as a process and sends its stdout as the HTTP response.

Key rules:

  1. The script must print HTTP headers first, followed by a blank line, then the body:
    Content-Type: application/json; charset=utf-8
    
    {"ok": true}
    
  2. The shebang line (#!/path/to/python) determines which interpreter runs it. Use the full absolute path to your venv Python.
  3. The script must be executable: chmod +x script.cgi
  4. CGI environment variables carry request metadata: REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH, CONTENT_TYPE, etc.
  5. Error output goes to the Apache error log:
    ssh athena
    tail -f /var/log/apache2/nlp_error.log
    

References: Wikipedia: CGI, Python cgi module (note: deprecated since Python 3.11 and removed in 3.13)
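The rules above can be illustrated with a minimal, self-contained CGI script. This is a sketch only — the shebang path is a placeholder and must point at your own venv_cgi Python:

```python
#!/nlp/projekty/YOUR_PROJECT/venv_cgi/bin/python
# Minimal CGI script illustrating rules 1-4: read request metadata from
# the environment, then print headers, a blank line, and the body.
import json
import os
import sys

def main():
    # Rule 4: request metadata arrives via environment variables.
    method = os.environ.get("REQUEST_METHOD", "GET")
    body = ""
    if method == "POST":
        length = int(os.environ.get("CONTENT_LENGTH") or 0)
        body = sys.stdin.read(length)

    # Rule 1: headers first, then a blank line, then the response body.
    sys.stdout.write("Content-Type: application/json; charset=utf-8\r\n\r\n")
    sys.stdout.write(json.dumps({"ok": True, "method": method, "echo": body}))

if __name__ == "__main__":
    main()
```

Remember rules 2 and 3: the shebang must be an absolute path, and the file must be executable (chmod +x).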


Project Structure

/nlp/projekty/<your_project>/
├── public_html/            ← served by Athena at https://nlp.fi.muni.cz/projekty/<your_project>/
│   ├── index.html          ← frontend (static HTML)
│   ├── styles.css
│   ├── api.cgi             ← main CGI proxy → backend
│   ├── start_backend.cgi   ← SSH trigger: start backend
│   ├── stop_backend.cgi    ← SSH trigger: stop backend
│   └── test.cgi            ← debug CGI (shows env, echoes body)
│
├── project.conf            ← single config: REMOTE_HOST, REMOTE_USER, PORT
├── backend_trigger         ← SSH private key  (NEVER put inside public_html!)
├── venv_cgi/               ← Python venv for CGI scripts (runs on Athena)
├── venv_be/                ← Python venv for backend (runs on compute server)
├── logs/                   ← CGI and backend logs
│
└── backend/
    ├── fastapi_app/        ← Example 1: simple FastAPI app
    │   └── app.py
    └── ollama_example/     ← Example 2: FastAPI + Ollama LLM
        └── app.py

Allowed Ports

Backend processes on the compute servers (Apollo, Epimetheus*, …) are reachable from Athena only on the port range 6000–6100.

Choose any free port in this range for your backend. Check which ports are already in use on your target server before starting:

ss -tlnp | grep -E ':6(0[0-9][0-9]|100)\b'
# or
netstat -tlnp 2>/dev/null | grep -E ':6(0[0-9][0-9]|100)\b'

Do not hard-code a port used by someone else's project. A safe convention is to derive the port from your numeric UID: if your UID ends in 42, try 6042.
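If you prefer to probe from code, here is a hypothetical Python helper that finds a free port in the allowed range by trying to bind to each one — a successful bind means nothing is currently listening there:

```python
# Hypothetical helper: find a free port in the allowed 6000-6100 range.
import socket

def find_free_port(start=6000, end=6100):
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                s.bind(("0.0.0.0", port))
            except OSError:
                continue  # port already in use, try the next one
            return port
    raise RuntimeError("no free port in range 6000-6100")
```

Run it on the compute server where the backend will live, since ports are per-machine.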


Step-by-Step Setup

1. Clone repo and copy template files

cd /nlp/projekty/<your_project>
git clone git@gitlab.fi.muni.cz:nlp/frontend_backend.git
cp -r frontend_backend/public_html .
cp -r frontend_backend/backend .
cp -r frontend_backend/scripts .
cp frontend_backend/project.conf .
chmod +x public_html/*.cgi public_html/*.sh

2. Replace all placeholders

Replace every occurrence of:

  • YOUR_PROJECT → your project directory name (e.g. my_project),
  • YOUR_SERVER → your chosen compute server (e.g. apollo.fi.muni.cz),
  • YOUR_PORT → your chosen port number in the range 6000–6100, and
  • YOUR_USERNAME → your login (stored by the system in $LOGNAME).

This also fills in project.conf — the single configuration file read by all scripts and api.cgi at runtime. PORT is defined there exactly once, so the CGI proxy and the backend cannot disagree about it.

export YOUR_PROJECT=your_project_name
export YOUR_SERVER=apollo.fi.muni.cz
export YOUR_PORT=6080
grep -rl 'YOUR_\(PROJECT\|SERVER\|PORT\|USERNAME\)' \
    public_html/ backend/ scripts/ project.conf | \
  xargs sed -i -e "s|YOUR_PROJECT|$YOUR_PROJECT|g; s|YOUR_SERVER|$YOUR_SERVER|g" \
      -e "s|YOUR_PORT|$YOUR_PORT|g; s|YOUR_USERNAME|$LOGNAME|g"

3. Create the logs directory

CGI scripts run under Apache's own system user, not your user account. That user has no write access to your project directory, so the logs/ directory must exist and be world-writable before CGI first runs:

mkdir -p /nlp/projekty/$YOUR_PROJECT/logs
chmod 1777 /nlp/projekty/$YOUR_PROJECT/logs

(1777 = writable by anyone, but each file can only be deleted by its owner — same as /tmp.)

4. Create Python venvs

You need two separate virtual environments, and each must be created on the correct server — Python versions differ (Athena: 3.12, Apollo: 3.10). If you create venv_cgi on Apollo, Apache on Athena will run it with a mismatched Python and fail to find the installed packages.

CGI venv — create on Athena:

ssh athena
export YOUR_PROJECT=your_project_name
cd /nlp/projekty/$YOUR_PROJECT
python3 -m venv venv_cgi
venv_cgi/bin/pip install requests

Backend venv — create on the compute server (Apollo / Epimetheus*):

ssh $YOUR_SERVER
export YOUR_PROJECT=your_project_name
cd /nlp/projekty/$YOUR_PROJECT
python3 -m venv venv_be
venv_be/bin/pip install fastapi "uvicorn[standard]" pydantic
# or: pip install -r backend/fastapi_app/requirements.txt

5. Copy scripts to $HOME/bin

$HOME/bin/ is on the shared home disk — copy the scripts once and they are available on all servers.

mkdir -p $HOME/bin
cp scripts/*.sh $HOME/bin/   # find_free_gpu.sh only needed for Ollama variant
chmod +x $HOME/bin/*.sh

After the sed in step 2, start_backend.sh already has PROJECT_DIR set and reads PORT from project.conf at startup — no further editing needed for the simple FastAPI variant.

6. Generate an SSH trigger key

This key lets Athena start/stop the backend on Apollo without a password. Store it outside public_html/.

ssh-keygen -t ed25519 -C "athena FE -> backend trigger" \
    -f /nlp/projekty/$YOUR_PROJECT/backend_trigger
# leave passphrase empty
chmod 600 /nlp/projekty/$YOUR_PROJECT/backend_trigger
chmod 644 /nlp/projekty/$YOUR_PROJECT/backend_trigger.pub

7. Add the key to authorized_keys

Your home directory $HOME is on a shared disk — it is the same on all NLP servers (Athena, Aurora, Apollo, Epimetheus*, …). This means $HOME/.ssh/authorized_keys is shared too: you only need to edit it once and the key works from any server.

echo "command=\"$HOME/bin/backend_control.sh\",no-port-forwarding,no-agent-forwarding,no-pty,no-user-rc $(cat /nlp/projekty/$YOUR_PROJECT/backend_trigger.pub)" \
    >> $HOME/.ssh/authorized_keys

The command= option means SSH will only run backend_control.sh, nothing else, when this key is used. This is the security mechanism — even if an attacker gets the private key, they can only trigger start/stop.

8. Test

# Check CGI environment
curl https://nlp.fi.muni.cz/projekty/$YOUR_PROJECT/test.cgi

# Start backend manually first, then test the proxy:
ssh $YOUR_SERVER
export YOUR_PROJECT=your_project_name
export YOUR_PORT=6080
cd /nlp/projekty/$YOUR_PROJECT
venv_be/bin/uvicorn backend.fastapi_app.app:app --host 0.0.0.0 --port $YOUR_PORT

curl "https://nlp.fi.muni.cz/projekty/$YOUR_PROJECT/api.cgi?action=health"

Debugging

Apache CGI error log

ssh athena
tail -f /var/log/apache2/nlp_error.log

CGI application log

Errors from api.cgi are written to:

/nlp/projekty/$YOUR_PROJECT/logs/cgi.log

Backend log

tail -f $HOME/logs/$YOUR_PROJECT/backend.log

SSH start/stop debugging

The Start/Stop buttons use SSH with a forced command. To test the SSH trigger manually (outside the browser):

ssh -i /nlp/projekty/$YOUR_PROJECT/backend_trigger \
    -o BatchMode=yes \
    $LOGNAME@$YOUR_SERVER start

Expected output: OK: backend started (PID=…)

If it fails:

  • Permission denied (publickey) — check that the key is in $HOME/.ssh/authorized_keys with the correct command= prefix
  • command not found / rc=255 — check that $HOME/bin/backend_control.sh exists and is executable
  • ConnectTimeout — check $YOUR_SERVER hostname

Check the start/stop logs:

tail -f /nlp/projekty/$YOUR_PROJECT/logs/start_backend.log
tail -f /nlp/projekty/$YOUR_PROJECT/logs/stop_backend.log

Common problems

Problem → likely cause:

  • 500 Internal Server Error from a .cgi script → not executable (chmod +x) or wrong shebang path
  • 502 Bad Gateway from api.cgi → backend not running, or wrong BACKEND_BASE in project.conf
  • ConnectionRefusedError on the correct host → backend not running yet, or wrong port — check that project.conf matches on both the CGI and the backend side
  • PermissionError: logs/ → logs/ directory does not exist or has wrong permissions — run mkdir + chmod 1777
  • ModuleNotFoundError: requests → venv_cgi was created on the wrong server (wrong Python version) — recreate it on Athena
  • SSH trigger returns rc=255 → wrong key path, key not in authorized_keys, or backend_control.sh not found
  • No free GPU available → all GPUs on Apollo are in use — wait or try Epimetheus*

Note — kill_backend.sh kills by process name, not by PID.

kill_backend.sh uses pkill -u $USER -f uvicorn and pkill -u $USER -f ollama. This stops all matching processes owned by your login on the server — not just the one this project started.

Consequences:

  • If you run multiple projects under the same account on the same server, stopping one will stop the others too.
  • If the backend is launched via a wrapper that changes the process name, pkill -f uvicorn may not match it.
  • start_backend.sh logs the PID (PID=…) but does not save it to a file.

If you need precise per-project stop logic, extend the scripts to use a PID file:

# in start_backend.sh, after the & :
echo $! > "$LOG_DIR/backend.pid"

# in kill_backend.sh, instead of pkill:
kill "$(cat "$LOG_DIR/backend.pid")" && rm "$LOG_DIR/backend.pid"

Example: Adding a New API Endpoint

1. Add route in the backend (backend/fastapi_app/app.py):

@app.get("/status")
def status():
    return {"status": "running", "version": "1.0"}

2. Add action in the CGI proxy (public_html/api.cgi):

elif action == "status":
    r = requests.get(BACKEND_BASE + "/status", timeout=TIMEOUT)
    r.raise_for_status()
    return write_json({"ok": True, "data": r.json()})

3. Call from JavaScript:

const r    = await fetch('api.cgi?action=status');
const data = await r.json();
console.log(data.data.status);

Ollama Extension (Advanced)

This section describes how to extend the basic setup with LLM inference via Ollama. No existing files need to be modified — everything is already prepared in the template and only needs to be activated.

See also: en/LLMInference

Why FE → CGI → BE → Ollama?

You might wonder why the browser cannot call Ollama directly. The reason is that Ollama only listens on 127.0.0.1 (localhost) on the compute server — it is intentionally not exposed to the network. Only processes running on the same machine (i.e. the FastAPI backend) can reach it.

The full request chain is:

Browser
  │  POST api.cgi?action=generate   (HTTPS, public internet)
  ▼
Athena — api.cgi
  │  POST http://$YOUR_SERVER:$YOUR_PORT/generate   (internal network)
  ▼
FastAPI backend (Apollo)
  │  POST http://127.0.0.1:43444/api/generate   (localhost only, NLP port)
  ▼
Ollama (Apollo)

What is already in the template

All the code is already written — you only need to activate it:

File Purpose
backend/ollama_example/app.py FastAPI app exposing /generate and /healthz (includes Ollama status)
public_html/api.cgi action=generate already implemented — no changes needed
public_html/ollama.html Frontend page: model selector, system prompt, prompt textarea, response output
scripts/start_backend.sh Variant B (FastAPI + Ollama + GPU selection) is already there, commented out

Activation steps

1. Install httpx in the backend venv

ollama_example/app.py uses httpx to call Ollama. Install it on the compute server:

ssh $YOUR_SERVER
/nlp/projekty/$YOUR_PROJECT/venv_be/bin/pip install httpx

2. Check available models

Most models are already pulled on NLP servers — you usually do not need to pull anything. Check what is available first:

/mnt/local/disk2/ollama/bin/ollama list

If your model is missing, pull it (once). Note that pulling a large model takes several minutes:

/mnt/local/disk2/ollama/bin/ollama pull mistral-small3.2:24b

Common models available on NLP servers: mistral-small3.2:24b, qwen2.5vl:72b, llama3.2-vision, granite3.2-vision:2b, and many more — see ollama list for the full current list.

3. Switch start_backend.sh to Variant B

Open $HOME/bin/start_backend.sh and:

  1. Comment out the entire Variant A block (the lines from APP_MODULE= to exit 0).
  2. Uncomment the Variant B block (remove the leading # from each line).

Variant B uses the following NLP-server-specific settings — verify they are correct before saving:

OLLAMA_PORT=43444
OLLAMA_SRV="/mnt/local/disk2/ollama/ollama_server.sh"
APP_MODULE="backend.ollama_example.app:app"

Important notes:

  • Ollama on NLP servers uses port 43444, not the default 11434.
  • Ollama is not in PATH — it must be started via the wrapper script at /mnt/local/disk2/ollama/ollama_server.sh. This path is on the local disk of each compute server (/mnt/local/), not on the shared NLP disk.
  • The wrapper takes two arguments: <gpu_index> and <host> — Variant B passes these automatically.

Variant B will: find a free NVIDIA GPU → start Ollama via the wrapper → wait 5 seconds for initialisation → start the FastAPI backend with OLLAMA_HOST set.
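The GPU-selection step can be sketched in Python. This is a hypothetical re-implementation of what find_free_gpu.sh might do — picking the GPU with the least memory in use according to nvidia-smi (the real script may use different criteria):

```python
# Hypothetical GPU picker: parse nvidia-smi output and return the index
# of the GPU with the least memory currently in use.
import subprocess

def parse_gpu_csv(text):
    """Parse 'index, used-MiB' lines as produced by
    nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits"""
    gpus = []
    for line in text.strip().splitlines():
        idx, used = [field.strip() for field in line.split(",")]
        gpus.append((int(used), int(idx)))
    return gpus

def find_free_gpu():
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    used, idx = min(parse_gpu_csv(out))  # least-loaded GPU wins
    return idx
```

On a busy server "least loaded" is not the same as "free" — check nvidia-smi yourself if the backend fails to allocate memory.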

4. Restart the backend

Use the Start backend button on index.html, or kill and restart manually:

$HOME/bin/kill_backend.sh
$HOME/bin/start_backend.sh

The start may take up to 60 seconds — Ollama needs time to load.

5. Open ollama.html

Navigate to https://nlp.fi.muni.cz/projekty/$YOUR_PROJECT/ollama.html.

The page shows:

  • Ollama status — green if both FastAPI and Ollama are reachable
  • Model selector — choose from the available models
  • System prompt — optional instruction for the model
  • Prompt — your question or input
  • Generate button — sends the request and displays the response

How the generate request flows

  1. Browser POSTs {"prompt": "...", "model": "...", "system": "..."} to api.cgi?action=generate
  2. api.cgi forwards the JSON body to http://$YOUR_SERVER:$YOUR_PORT/generate
  3. FastAPI (ollama_example/app.py) calls http://127.0.0.1:43444/api/generate with "stream": false — waits for the full response
  4. The response travels back up the chain and is displayed in the browser
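Step 3 of the chain can be sketched as follows. This is an illustrative, dependency-free version (stdlib urllib instead of the httpx that ollama_example/app.py uses) of the call to Ollama's /api/generate endpoint with "stream": false:

```python
# Sketch of the backend-side call to Ollama's /api/generate endpoint.
# With "stream": false, Ollama returns one JSON object containing the
# complete generated text in the "response" field.
import json
import urllib.request

def ollama_generate(prompt, model, system="",
                    host="http://127.0.0.1:43444"):
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "system": system,
        "stream": False,          # wait for the full response
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Note the host defaults to the NLP-specific port 43444, not Ollama's default 11434.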

Reverting to Variant A

To go back to the simple FastAPI backend (no GPU, no Ollama), reverse step 3: uncomment Variant A and comment out Variant B in $HOME/bin/start_backend.sh.


Security Checklist

  • backend_trigger (private SSH key) — outside public_html/, chmod 600
  • project.conf — outside public_html/ (contains server/port info)
  • API tokens, passwords, .env files — outside public_html/
  • authorized_keys entry uses command= + no-pty,no-port-forwarding,…
  • CGI never returns raw SSH stdout to the browser
  • Backend validates all inputs before processing

Author

This template was created by Vítězslav Jíra (xjira@fi.muni.cz) as part of a bachelor's thesis project at the NLP Centre, Faculty of Informatics, Masaryk University (2025–2026).