LLM on akeso
An OpenAI-compatible API endpoint runs on akeso.fi.muni.cz at the following URL:
https://nlp.fi.muni.cz/llama/
Currently deployed model names:
- gpt-oss-120b
- eurollm-9b-instruct-q6_k_l
- qwen3-30b-a3b-instruct-2507-q6_k_xl
- glm4.6-iq4_k
The URL above serves the API only. To access a specific model through a Web UI, go to https://nlp.fi.muni.cz/llama/upstream/MODEL_NAME/:
- https://nlp.fi.muni.cz/llama/upstream/gpt-oss-120b/
- https://nlp.fi.muni.cz/llama/upstream/eurollm-9b-instruct-q6_k_l/
- https://nlp.fi.muni.cz/llama/upstream/qwen3-30b-a3b-instruct-2507-q6_k_xl/
- https://nlp.fi.muni.cz/llama/upstream/glm4.6-iq4_k/
See the documentation of llama-server for API details: https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#openai-compatible-api-endpoints.
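The list of deployed models can also be retrieved programmatically from the OpenAI-style /v1/models endpoint that llama-server exposes. A minimal sketch using only the standard library; the model_ids helper is illustrative, and test is the placeholder token used throughout this page:

```python
import json
import urllib.request

def model_ids(models_response: dict) -> list:
    """Pull the model names out of an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in models_response["data"]]

if __name__ == "__main__":
    request = urllib.request.Request(
        "https://nlp.fi.muni.cz/llama/v1/models",
        headers={"Authorization": "Bearer test"},  # replace "test" with your token
    )
    with urllib.request.urlopen(request) as response:
        print(model_ids(json.load(response)))
```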
Authorization
Use HTTP Bearer token authorization for access. In the examples below, test stands in for the token; replace it with your own access token.
To obtain an access token, ask your supervisor.
Web frontend
Provide the token in the Settings dialog of the Web UI.
HTTP
Send the following HTTP header along with your request:
Authorization: Bearer test
Examples
cURL
curl -X POST https://nlp.fi.muni.cz/llama/v1/chat/completions \
--compressed -H "Content-Type: application/json" \
-H "Authorization: Bearer test" \
-d '{ "model": "gpt-oss-120b", "messages": [{"role": "system", "content":"You are a helpful assistant."},{"role": "user", "content":"řekni vtip"}]}'
Python
Requests
import requests
payload = {'model': 'gpt-oss-120b',
           'messages': [{'role': 'system', 'content': 'You are a helpful assistant.'},
                        {'role': 'user', 'content': 'řekni vtip'}]}
response = requests.post("https://nlp.fi.muni.cz/llama/v1/chat/completions",
                         headers={'Authorization': 'Bearer test'}, json=payload)
print(response.json()['choices'][0]['message']['content'])
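Streaming responses are also supported: set "stream": true in the payload and the server answers with server-sent events, one data: line per chunk. A sketch of consuming such a stream with requests; the sse_delta helper is illustrative, and test is the placeholder token:

```python
import json

API_URL = "https://nlp.fi.muni.cz/llama/v1/chat/completions"

def sse_delta(line: str) -> str:
    """Extract the text delta from one 'data: {...}' SSE line; '' for anything else."""
    if not line.startswith("data: ") or line == "data: [DONE]":
        return ""
    choice = json.loads(line[len("data: "):])["choices"][0]
    return choice["delta"].get("content") or ""

if __name__ == "__main__":
    import requests  # imported here so the helper above stays dependency-free
    payload = {"model": "gpt-oss-120b", "stream": True,
               "messages": [{"role": "user", "content": "řekni vtip"}]}
    with requests.post(API_URL,
                       headers={"Authorization": "Bearer test"},  # your own token
                       json=payload, stream=True) as response:
        for raw in response.iter_lines(decode_unicode=True):
            print(sse_delta(raw or ""), end="", flush=True)
    print()
```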
OpenAI library
import openai
client = openai.OpenAI(base_url="https://nlp.fi.muni.cz/llama/v1", api_key="test")
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'řekni vtip'}])
print(response.choices[0].message.content)
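The OpenAI library can handle streaming as well by passing stream=True and iterating over the returned chunks. A sketch under the same assumptions as above; join_stream is a hypothetical helper and test a placeholder token:

```python
def join_stream(deltas):
    """Concatenate streamed content pieces, skipping None keep-alive chunks."""
    return "".join(piece or "" for piece in deltas)

if __name__ == "__main__":
    import openai
    client = openai.OpenAI(base_url="https://nlp.fi.muni.cz/llama/v1",
                           api_key="test")  # replace "test" with your token
    stream = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{'role': 'user', 'content': 'řekni vtip'}],
        stream=True)
    print(join_stream(chunk.choices[0].delta.content for chunk in stream))
```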
Notes
- You do not have to provide the system message; a default provided by the model will be used instead.
- Your data is not private: other users can see cached requests using the /slots endpoint.
- Send me (Ondřej Herman) a message at xherman1@fi.muni.cz should you need help or have any thoughts to share.