Simple Frontend and Backend for Your Thesis on NLP Servers
This page describes how to deploy a web application on nlp.fi.muni.cz using the standard frontend–backend architecture used in NLP lab student projects.
Reference implementation: gitlab.fi.muni.cz/nlp/frontend_backend
Reference deployment: nlp.fi.muni.cz/projekty/frontend_backend
Overview
The web infrastructure at nlp.fi.muni.cz works like this:
- Athena is the web server. It serves static files (HTML, CSS, JS) and CGI scripts from your project's public_html/ directory.
- Aurora is the home server. It stores /home/<login> directories.
- Neither Athena nor Aurora can run long-lived processes.
- Apollo, Epimetheus*, … are compute servers with NVIDIA GPUs. You run your backend there.
- A CGI proxy script (Python or Bash) on Athena bridges the two: the browser talks to Athena, the CGI forwards the request to the backend and returns the JSON response.
- To start/stop the backend from the frontend a dedicated SSH key with a forced command is used — no password, no interactive shell.
Browser
  │ HTTP
  ▼
Athena (static HTML/CSS/JS + CGI)
  │ HTTP (api.cgi proxies API calls)
  │ SSH (start_backend.cgi / stop_backend.cgi trigger backend lifecycle)
  ▼
Apollo / Epimetheus* (FastAPI backend, optionally + Ollama)
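A minimal sketch of what the CGI proxy does, assuming project.conf uses simple KEY=VALUE lines and the backend exposes a /healthz route (illustrative only; the template's api.cgi is the authoritative version):

#!/nlp/projekty/my_project/venv_cgi/bin/python
# Hypothetical minimal proxy: forward ?action=health to the backend.
import json
import os
import requests

conf = {}
with open("/nlp/projekty/my_project/project.conf") as f:
    for line in f:
        if "=" in line and not line.startswith("#"):
            key, _, value = line.strip().partition("=")
            conf[key] = value

backend = f"http://{conf['REMOTE_HOST']}:{conf['PORT']}"

print("Content-Type: application/json; charset=utf-8")
print()
try:
    if os.environ.get("QUERY_STRING") == "action=health":
        r = requests.get(backend + "/healthz", timeout=5)
        print(json.dumps({"ok": True, "data": r.json()}))
    else:
        print(json.dumps({"ok": False, "error": "unknown action"}))
except requests.RequestException as e:
    print(json.dumps({"ok": False, "error": str(e)}))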
Your project lives on the shared NLP disk at:
/nlp/projekty/<your_project>/
This path is accessible from all NLP servers (Athena, Apollo, Epimetheus*, …).
What is CGI?
CGI (Common Gateway Interface) is the oldest and simplest way to run server-side
code behind a web server. When Apache receives a request for a .cgi file, it
executes that file as a process and sends its stdout as the HTTP response.
Key rules:
- The script must print HTTP headers first, followed by a blank line, then the body:

  Content-Type: application/json; charset=utf-8

  {"ok": true}

- The shebang line (#!/path/to/python) determines which interpreter runs it. Use the full absolute path to your venv Python.
- The script must be executable: chmod +x script.cgi
- CGI environment variables carry request metadata: REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH, CONTENT_TYPE, etc.
- Error output goes to the Apache error log: ssh athena tail -f /var/log/apache2/nlp_error.log
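Putting these rules together, a complete CGI script can be as small as this (a hypothetical example; adjust the shebang to your own venv path):

#!/nlp/projekty/my_project/venv_cgi/bin/python
# Headers first, then a blank line, then the body.
import json
import os

print("Content-Type: application/json; charset=utf-8")
print()  # the blank line that ends the header block
print(json.dumps({
    "ok": True,
    "method": os.environ.get("REQUEST_METHOD"),
    "query": os.environ.get("QUERY_STRING"),
}))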
References: Wikipedia: CGI, Python cgi module
Project Structure
/nlp/projekty/<your_project>/
├── public_html/ ← served by Athena at https://nlp.fi.muni.cz/projekty/<your_project>/
│ ├── index.html ← frontend (static HTML)
│ ├── styles.css
│ ├── api.cgi ← main CGI proxy → backend
│ ├── start_backend.cgi ← SSH trigger: start backend
│ ├── stop_backend.cgi ← SSH trigger: stop backend
│ └── test.cgi ← debug CGI (shows env, echoes body)
│
├── project.conf ← single config: REMOTE_HOST, REMOTE_USER, PORT
├── backend_trigger ← SSH private key (NEVER put inside public_html!)
├── venv_cgi/ ← Python venv for CGI scripts (runs on Athena)
├── venv_be/ ← Python venv for backend (runs on compute server)
├── logs/ ← CGI and backend logs
│
└── backend/
├── fastapi_app/ ← Example 1: simple FastAPI app
│ └── app.py
└── ollama_example/ ← Example 2: FastAPI + Ollama LLM
└── app.py
Allowed Ports
Backend processes on the compute servers (Apollo, Epimetheus*, …) are reachable from Athena only on the port range 6000–6100.
Choose any free port in this range for your backend. Check which ports are already in use on your target server before starting:
ss -tlnp | grep -E ':(60[0-9][0-9]|6100)\>'
# or
netstat -tlnp 2>/dev/null | grep -E ':(60[0-9][0-9]|6100)\>'
Do not hard-code a port used by someone else's project. A safe convention is
to pick a port derived from your numeric UID: if your UID ends in 42, try 6042.
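For example, this one-liner computes such a port from your UID (a sketch of the convention, not an official assignment scheme):

python3 -c 'import os; print(6000 + os.getuid() % 100)'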
Step-by-Step Setup
1. Clone repo and copy template files
cd /nlp/projekty/<your_project>
git clone git@gitlab.fi.muni.cz:nlp/frontend_backend.git
cp -r frontend_backend/public_html .
cp -r frontend_backend/backend .
cp -r frontend_backend/scripts .
cp frontend_backend/project.conf .
chmod +x public_html/*.cgi public_html/*.sh
2. Replace all placeholders
Every occurrence of:
- YOUR_PROJECT → your project directory name (e.g. my_project),
- YOUR_SERVER → your chosen compute server (e.g. apollo.fi.muni.cz),
- YOUR_PORT → your chosen port number in the range 6000–6100, and
- YOUR_USERNAME → your login (stored by the system in $LOGNAME).
This also fills in project.conf — the single configuration file read by
all scripts and api.cgi at runtime. PORT is defined there exactly once, so
there is no risk of mismatch between the CGI proxy and the backend.
export YOUR_PROJECT=your_project_name
export YOUR_SERVER=apollo.fi.muni.cz
export YOUR_PORT=6080
grep -rl 'YOUR_\(PROJECT\|SERVER\|PORT\|USERNAME\)' \
public_html/ backend/ scripts/ project.conf | \
xargs sed -i -e "s|YOUR_PROJECT|$YOUR_PROJECT|g; s|YOUR_SERVER|$YOUR_SERVER|g" \
-e "s|YOUR_PORT|$YOUR_PORT|g; s|YOUR_USERNAME|$LOGNAME|g"
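After the substitution, project.conf holds your concrete values. A hypothetical example (values are illustrative):

REMOTE_HOST=apollo.fi.muni.cz
REMOTE_USER=xnovak42
PORT=6080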
3. Create the logs directory
CGI scripts run under Apache's own system user, not your user account.
That user has no write access to your project directory, so the logs/
directory must exist and be world-writable before CGI first runs:
mkdir -p /nlp/projekty/$YOUR_PROJECT/logs
chmod 1777 /nlp/projekty/$YOUR_PROJECT/logs
(1777 = writable by anyone, but each file can only be deleted by its owner —
same as /tmp.)
4. Create Python venvs
You need two separate virtual environments, and each must be created on the
correct server — Python versions differ (Athena: 3.12, Apollo: 3.10).
If you create venv_cgi on Apollo, Apache on Athena will run it with a
mismatched Python and fail to find the installed packages.
CGI venv — create on Athena:
ssh athena
export YOUR_PROJECT=your_project_name
cd /nlp/projekty/$YOUR_PROJECT
python3 -m venv venv_cgi
venv_cgi/bin/pip install requests
Backend venv — create on the compute server (Apollo / Epimetheus*):
ssh $YOUR_SERVER
export YOUR_PROJECT=your_project_name
cd /nlp/projekty/$YOUR_PROJECT
python3 -m venv venv_be
venv_be/bin/pip install fastapi "uvicorn[standard]" pydantic
# or: pip install -r backend/fastapi_app/requirements.txt
5. Copy scripts to $HOME/bin
$HOME/bin/ is on the shared home disk — copy the scripts once
and they are available on all servers.
mkdir -p $HOME/bin
cp scripts/*.sh $HOME/bin/   # find_free_gpu.sh only needed for Ollama variant
chmod +x $HOME/bin/*.sh
After the sed in step 2, start_backend.sh already has PROJECT_DIR set and
reads PORT from project.conf at startup — no further editing needed for
the simple FastAPI variant.
6. Generate an SSH trigger key
This key lets Athena start/stop the backend on Apollo without a password.
Store it outside public_html/.
ssh-keygen -t ed25519 -C "athena FE -> backend trigger" \
-f /nlp/projekty/$YOUR_PROJECT/backend_trigger
# leave passphrase empty
chmod 600 /nlp/projekty/$YOUR_PROJECT/backend_trigger
chmod 644 /nlp/projekty/$YOUR_PROJECT/backend_trigger.pub
7. Add the key to authorized_keys
Your home directory $HOME is on a shared disk — it is the same
on all NLP servers (Athena, Aurora, Apollo, Epimetheus*, …). This means
$HOME/.ssh/authorized_keys is shared too: you only need to edit it
once and the key works from any server.
echo "command=\"$HOME/bin/backend_control.sh\",no-port-forwarding,no-agent-forwarding,no-pty,no-user-rc $(cat /nlp/projekty/$YOUR_PROJECT/backend_trigger.pub)" \
>> $HOME/.ssh/authorized_keys
The command= option means SSH will only run backend_control.sh, nothing
else, when this key is used. This is the security mechanism — even if an attacker
gets the private key, they can only trigger start/stop.
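For illustration, the start trigger CGI can be as simple as the sketch below (hypothetical paths and login; the template's start_backend.cgi is the authoritative version):

#!/nlp/projekty/my_project/venv_cgi/bin/python
# Hypothetical sketch: trigger the forced command over SSH.
import json
import subprocess

KEY = "/nlp/projekty/my_project/backend_trigger"
TARGET = "xnovak42@apollo.fi.muni.cz"

result = subprocess.run(
    ["ssh", "-i", KEY, "-o", "BatchMode=yes", TARGET, "start"],
    capture_output=True, text=True, timeout=30,
)

print("Content-Type: application/json; charset=utf-8")
print()
# Report only success/failure; never echo raw SSH stdout to the browser
# (see the Security Checklist below).
print(json.dumps({"ok": result.returncode == 0}))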
8. Test
# Check CGI environment
curl https://nlp.fi.muni.cz/projekty/$YOUR_PROJECT/test.cgi

# Start backend manually first:
ssh $YOUR_SERVER
export YOUR_PROJECT=your_project_name
export YOUR_PORT=6080
cd /nlp/projekty/$YOUR_PROJECT
venv_be/bin/uvicorn backend.fastapi_app.app:app --host 0.0.0.0 --port $YOUR_PORT

# Then, in another shell, test the proxy:
curl "https://nlp.fi.muni.cz/projekty/$YOUR_PROJECT/api.cgi?action=health"
Debugging
Apache CGI error log
ssh athena tail -f /var/log/apache2/nlp_error.log
CGI application log
Errors from api.cgi are written to:
/nlp/projekty/$YOUR_PROJECT/logs/cgi.log
Backend log
tail -f $HOME/logs/$YOUR_PROJECT/backend.log
SSH start/stop debugging
The Start/Stop buttons use SSH with a forced command. To test the SSH trigger manually (outside the browser):
ssh -i /nlp/projekty/$YOUR_PROJECT/backend_trigger \
-o BatchMode=yes \
$LOGNAME@$YOUR_SERVER start
Expected output: OK: backend started (PID=…)
If it fails:
- Permission denied (publickey) — check that the key is in $HOME/.ssh/authorized_keys with the correct command= prefix
- command not found / rc=255 — check that $HOME/bin/backend_control.sh exists and is executable
- ConnectTimeout — check the $YOUR_SERVER hostname
Check the start/stop logs:
tail -f /nlp/projekty/$YOUR_PROJECT/logs/start_backend.log
tail -f /nlp/projekty/$YOUR_PROJECT/logs/stop_backend.log
Common problems
| Problem | Likely cause |
|---|---|
| 500 Internal Server Error from .cgi | Script not executable (chmod +x) or wrong shebang path |
| 502 Bad Gateway from api.cgi | Backend not running, or wrong BACKEND_BASE in project.conf |
| ConnectionRefusedError on correct host | Backend not running yet, or wrong port — check that project.conf matches on CGI and backend |
| PermissionError: logs/ | logs/ directory does not exist or has wrong permissions — run mkdir + chmod 1777 |
| ModuleNotFoundError: requests | venv_cgi was created on the wrong server (wrong Python version) — recreate on Athena |
| SSH trigger returns rc=255 | Wrong key path, key not in authorized_keys, or backend_control.sh not found |
| No free GPU available | All GPUs on Apollo are in use — wait or try Epimetheus* |
Note — kill_backend.sh kills by process name, not by PID.
kill_backend.sh uses pkill -u $USER -f uvicorn and pkill -u $USER -f ollama.
This stops all matching processes owned by your login on the server — not just the one this project started.
Consequences:
- If you run multiple projects under the same account on the same server, stopping one will stop the others too.
- If the backend is launched via a wrapper that changes the process name, pkill -f uvicorn may not match it.
- start_backend.sh logs the PID (PID=…) but does not save it to a file.
If you need precise per-project stop logic, extend the scripts to use a PID file:
# in start_backend.sh, after the &:
echo $! > "$LOG_DIR/backend.pid"

# in kill_backend.sh, instead of pkill:
kill "$(cat "$LOG_DIR/backend.pid")" && rm "$LOG_DIR/backend.pid"
Example: Adding a New API Endpoint
1. Add route in the backend (backend/fastapi_app/app.py):
@app.get("/status")
def status():
return {"status": "running", "version": "1.0"}
2. Add action in the CGI proxy (public_html/api.cgi):
elif action == "status":
r = requests.get(BACKEND_BASE + "/status", timeout=TIMEOUT)
r.raise_for_status()
return write_json({"ok": True, "data": r.json()})
3. Call from JavaScript:
const r = await fetch('api.cgi?action=status');
const data = await r.json();
console.log(data.data.status);
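For a POST endpoint the pattern is the same, except the CGI proxy must read the request body from stdin (its length comes from CONTENT_LENGTH). A sketch with a hypothetical /echo route:

# backend (backend/fastapi_app/app.py), hypothetical route:
from pydantic import BaseModel

class EchoRequest(BaseModel):
    text: str

@app.post("/echo")
def echo(req: EchoRequest):
    return {"echo": req.text}

# public_html/api.cgi, forwarding the body read from stdin:
elif action == "echo":
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.read(length)
    r = requests.post(BACKEND_BASE + "/echo", data=body,
                      headers={"Content-Type": "application/json"},
                      timeout=TIMEOUT)
    r.raise_for_status()
    return write_json({"ok": True, "data": r.json()})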
Ollama Extension (Advanced)
This section describes how to extend the basic setup with LLM inference via Ollama. No existing files need to be modified — everything is already prepared in the template and only needs to be activated.
See also: en/LLMInference
Why FE → CGI → BE → Ollama?
You might wonder why the browser cannot call Ollama directly. The reason is that
Ollama only listens on 127.0.0.1 (localhost) on the compute server — it is
intentionally not exposed to the network. Only processes running on the same
machine (i.e. the FastAPI backend) can reach it.
The full request chain is:
Browser
  │ POST api.cgi?action=generate (HTTPS, public internet)
  ▼
Athena — api.cgi
  │ POST http://$YOUR_SERVER:$YOUR_PORT/generate (internal network)
  ▼
FastAPI backend (Apollo)
  │ POST http://127.0.0.1:43444/api/generate (localhost only, NLP port)
  ▼
Ollama (Apollo)
What is already in the template
All the code is already written — you only need to activate it:
| File | Purpose |
|---|---|
| backend/ollama_example/app.py | FastAPI app exposing /generate and /healthz (includes Ollama status) |
| public_html/api.cgi | action=generate already implemented — no changes needed |
| public_html/ollama.html | Frontend page: model selector, system prompt, prompt textarea, response output |
| scripts/start_backend.sh | Variant B (FastAPI + Ollama + GPU selection) is already there, commented out |
Activation steps
1. Install httpx in the backend venv
ollama_example/app.py uses httpx to call Ollama. Install it on the compute
server:
ssh $YOUR_SERVER /nlp/projekty/$YOUR_PROJECT/venv_be/bin/pip install httpx
2. Check available models
Most models are already pulled on NLP servers — you usually do not need to pull anything. Check what is available first:
/mnt/local/disk2/ollama/bin/ollama list
If your model is missing, pull it (once). Note that pulling a large model takes several minutes:
/mnt/local/disk2/ollama/bin/ollama pull mistral-small3.2:24b
Common models available on NLP servers: mistral-small3.2:24b, qwen2.5vl:72b,
llama3.2-vision, granite3.2-vision:2b, and many more — see ollama list
for the full current list.
3. Switch start_backend.sh to Variant B
Open $HOME/bin/start_backend.sh and:
- Comment out the entire Variant A block (lines from APP_MODULE= to exit 0).
- Uncomment the Variant B block (remove the leading # from each line).
Variant B uses the following NLP-server-specific settings — verify they are correct before saving:
OLLAMA_PORT=43444
OLLAMA_SRV="/mnt/local/disk2/ollama/ollama_server.sh"
APP_MODULE="backend.ollama_example.app:app"
Important notes:
- Ollama on NLP servers uses port 43444, not the default 11434.
- Ollama is not in PATH — it must be started via the wrapper script at /mnt/local/disk2/ollama/ollama_server.sh. This path is on the local disk of each compute server (/mnt/local/), not on the shared NLP disk.
- The wrapper takes two arguments: <gpu_index> and <host> — Variant B passes these automatically.
Variant B will: find a free NVIDIA GPU → start Ollama via the wrapper → wait
5 seconds for initialisation → start the FastAPI backend with OLLAMA_HOST set.
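For reference, the GPU selection step can be approximated in a few lines (a sketch only; Variant B actually calls find_free_gpu.sh):

import subprocess
# Ask nvidia-smi for per-GPU memory usage and pick the least-loaded index.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=index,memory.used",
     "--format=csv,noheader,nounits"],
    text=True,
)
gpus = [tuple(map(int, line.split(","))) for line in out.strip().splitlines()]
print(min(gpus, key=lambda g: g[1])[0])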
4. Restart the backend
Use the Start backend button on index.html, or kill and restart manually:
$HOME/bin/kill_backend.sh
$HOME/bin/start_backend.sh
The start may take up to 60 seconds — Ollama needs time to load.
5. Open ollama.html
Navigate to https://nlp.fi.muni.cz/projekty/$YOUR_PROJECT/ollama.html.
The page shows:
- Ollama status — green if both FastAPI and Ollama are reachable
- Model selector — choose from the available models
- System prompt — optional instruction for the model
- Prompt — your question or input
- Generate button — sends the request and displays the response
How the generate request flows
- Browser POSTs {"prompt": "...", "model": "...", "system": "..."} to api.cgi?action=generate
- api.cgi forwards the JSON body to http://$YOUR_SERVER:$YOUR_PORT/generate
- FastAPI (ollama_example/app.py) calls http://127.0.0.1:43444/api/generate with "stream": false — waits for the full response
- The response travels back up the chain and is displayed in the browser
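The core of step 3 looks roughly like this inside ollama_example/app.py (a sketch, assuming the NLP Ollama port and the non-streaming API):

import httpx

OLLAMA = "http://127.0.0.1:43444"

async def ollama_generate(model: str, prompt: str, system: str = "") -> str:
    # "stream": False makes Ollama return one complete JSON object.
    payload = {"model": model, "prompt": prompt,
               "system": system, "stream": False}
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(OLLAMA + "/api/generate", json=payload)
        r.raise_for_status()
        return r.json()["response"]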
Reverting to Variant A
To go back to the simple FastAPI backend (no GPU, no Ollama), reverse step 3:
uncomment Variant A and comment out Variant B in
$HOME/bin/start_backend.sh.
Security Checklist
- backend_trigger (private SSH key) — outside public_html/, chmod 600
- project.conf — outside public_html/ (contains server/port info)
- API tokens, passwords, .env files — outside public_html/
- authorized_keys entry uses command= + no-pty,no-port-forwarding,…
- CGI never returns raw SSH stdout to the browser
- Backend validates all inputs before processing (see the sketch below)
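For the last point, FastAPI's Pydantic models give you much of this validation for free; a sketch with illustrative constraints (Pydantic v2 syntax):

from pydantic import BaseModel, Field

class GenerateRequest(BaseModel):
    # Reject empty or oversized prompts before they reach the model.
    prompt: str = Field(min_length=1, max_length=4000)
    model: str = Field(pattern=r"^[\w.:-]+$")  # e.g. mistral-small3.2:24b
    system: str = Field(default="", max_length=2000)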
Author
This template was created by Vítězslav Jíra (xjira@fi.muni.cz) as part of a bachelor's thesis project at the NLP Centre, Faculty of Informatics, Masaryk University (2025–2026).