LLM on akeso
An OpenAI-compatible API endpoint runs on akeso.fi.muni.cz at the following URL:
https://nlp.fi.muni.cz/llama/
Currently deployed model names:
- gpt-oss-120b
- eurollm-9b-instruct-q6_k_l
- qwen3-30b-a3b-instruct-2507-q6_k_xl
- glm4.6-iq4_k
The URL above provides the OpenAI-compatible API endpoint. To access a specific model through a Web UI, go to https://nlp.fi.muni.cz/llama/upstream/MODEL_NAME/:
- https://nlp.fi.muni.cz/llama/upstream/gpt-oss-120b/
- https://nlp.fi.muni.cz/llama/upstream/eurollm-9b-instruct-q6_k_l/
- https://nlp.fi.muni.cz/llama/upstream/qwen3-30b-a3b-instruct-2507-q6_k_xl/
- https://nlp.fi.muni.cz/llama/upstream/glm4.6-iq4_k/
See the documentation of llama-server for API details: https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#openai-compatible-api-endpoints.
Authorization
Use HTTP Bearer token authorization for access. In the examples below, test
stands in for the token; replace it with your own access token.
To obtain an access token, ask your supervisor.
Web frontend
Provide the token in the Settings dialog of the Web UI.
HTTP
Send the following HTTP header along with your request:
Authorization: Bearer test
Examples
cURL
curl -X POST https://nlp.fi.muni.cz/llama/v1/chat/completions \
    --compressed \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer test" \
    -d '{
          "model": "gpt-oss-120b",
          "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "řekni vtip"}
          ]
        }'
Python
Requests
import requests

payload = {
    'model': 'gpt-oss-120b',
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'řekni vtip'},
    ],
}
response = requests.post(
    "https://nlp.fi.muni.cz/llama/v1/chat/completions",
    headers={'Authorization': 'Bearer test'},
    json=payload,
)
print(response.json()['choices'][0]['message']['content'])
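The list of currently deployed models can also be retrieved programmatically. The following is a minimal sketch assuming the proxy forwards llama-server's OpenAI-compatible /v1/models endpoint; if that route is not exposed, use the model names listed at the top of this page.

import requests

# Query the OpenAI-compatible model listing endpoint (assumption: the proxy
# forwards /v1/models). Replace "test" with your own access token.
response = requests.get(
    "https://nlp.fi.muni.cz/llama/v1/models",
    headers={'Authorization': 'Bearer test'},
)
for model in response.json()['data']:
    print(model['id'])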
OpenAI library
import openai

client = openai.OpenAI(
    base_url="https://nlp.fi.muni.cz/llama/",
    api_key="test",
)
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'řekni vtip'},
    ],
)
print(response.choices[0].message.content)
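Streaming also works through the OpenAI client. The sketch below assumes the endpoint honours the standard stream parameter of the chat completions API (llama-server supports it) and prints the reply piece by piece as it arrives.

import openai

client = openai.OpenAI(base_url="https://nlp.fi.muni.cz/llama/", api_key="test")

# Assumption: the endpoint supports streaming chat completions.
stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{'role': 'user', 'content': 'řekni vtip'}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the assistant message.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()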
Notes
- You do not have to provide the system message – a default provided by the model will be used instead.
- The model parameter is currently ignored, as only a single model is configured.
- Your data is not private – other users can see cached requests using the /slots endpoint.
- You can enable prefix caching to speed up processing of long, repetitive prompts by sending "cache_prompt": true in the request (see the sketch after this list).
- Send me (Ondřej Herman) a message at xherman1@fi.muni.cz should you need help.
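As a sketch of the prefix caching note above: "cache_prompt" is a llama-server extension sent alongside the usual OpenAI fields, so a plain requests call only needs one extra key. The model name and token are the same placeholders as in the examples.

import requests

payload = {
    'model': 'gpt-oss-120b',
    # llama-server extension, not part of the OpenAI schema: reuse the cached
    # prompt prefix so repeated long prompts are not re-evaluated from scratch.
    'cache_prompt': True,
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'řekni vtip'},
    ],
}
response = requests.post(
    "https://nlp.fi.muni.cz/llama/v1/chat/completions",
    headers={'Authorization': 'Bearer test'},
    json=payload,
)
print(response.json()['choices'][0]['message']['content'])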