
Concepts

Usage

GERD is primarily a tool for prototyping workflows that involve Large Language Models. It is meant to act as 'glue' between different tools and services and should ease access to them.

In general, only two components are involved in a GERD workflow: a configuration and a service. The configuration can be assembled from different sources and should be usable across different services. The foundation of such a configuration is a YAML file. GERD provides a set of these files, which can be found in the config directory. Configurations in this directory can be accessed by name, as shown in the following example. A simple configuration file may look like this:

model:
  name: "Qwen/Qwen2.5-0.5B-Instruct"
  temperature: 0.1
  top_p: 0.90
  max_new_tokens: 256
  repetition_penalty: 1.1
  prompt_setup:
    - [system, text: "You are a helpful assistant. Start your answers with 'Sure thing, buddy!'."]
  prompt_config:
    text: "Please say the word '{word}' three times."

This configuration can then be used with a ChatService via load_gen_config, as shown in the following example:

"""A 'hello world' example for GERD."""

import logging

from gerd.config import load_gen_config
from gerd.gen.chat_service import ChatService
from gerd.models.model import PromptConfig

# Reduce noise from third-party libraries but keep GERD's logs verbose.
logging.basicConfig(level=logging.WARNING)
logging.getLogger("gerd").setLevel(logging.DEBUG)

# Use a dedicated logger for this script; with the root logger set to
# WARNING, plain logging.info(...) calls would be suppressed.
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

logger.info(
    "Loading chat service..."
    " When this is the first time you run this script, it will download the model."
    " This may take a few minutes."
)

# "hello" refers to a configuration file in GERD's config directory.
chat = ChatService(load_gen_config("hello"))
# The {word} placeholder in the prompt is filled from the given mapping.
res = chat.submit_user_message({"word": "teleportation"})
logger.info(res)

# The prompt template can be swapped at runtime.
chat.set_prompt_config(PromptConfig.model_validate({"text": "{message}"}))
res = chat.submit_user_message({"message": "Hello! What is one plus one?"})
logger.info(res)
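
Given the system prompt defined in the configuration above, the first answer should start with 'Sure thing, buddy!'. After the prompt template is replaced, the second call sends the given message text verbatim.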

Configuration

Parameters in a configuration can be overridden by environment variables. This is useful for secrets and other sensitive information, or when the same configuration is used in different environments. Environment variables may be provided in a .env file in the root directory of the project. You can find sample environment files in the project root directory; their names start with env. followed by the name of the use case.

GenerationConfig parameters are prefixed with gerd_gen_, so to override the temperature of a model, the environment variable gerd_gen_model__temperature needs to be set. Similarly, QAConfig parameters are prefixed with gerd_qa_. The path of the vector store to use can be set with gerd_qa_embedding__db_path. Note that nested configuration levels are separated by __. All environment variables are case-insensitive, so you can also write GERD_GEN_MODEL__TEMPERATURE to keep the common style of environment variables.
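
For example, a .env file that lowers the sampling temperature and changes the vector store location might look like this (the values are illustrative):

gerd_gen_model__temperature=0.2
gerd_qa_embedding__db_path=./data/vectorstore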

Services

GERD can work with models using transformers or llama.cpp bindings via llama-cpp-python. As llama-cpp-python is an optional dependency, you need to specify llama-cpp when you install GERD.

# when installing from source
pip install .[llama-cpp]
# when using uv
uv sync --group llama-cpp

GERD chooses between transformers and llama-cpp based on the configuration. Quantized models, the primary use case for GERD with llama-cpp, usually require a specific GGUF file. Consequently, if model.file is set, GERD assumes that you want to use a quantized model and loads the llama-cpp bindings. If only a Hugging Face model handle (e.g. org/model_name) is provided, GERD uses the transformers library to load the model.
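
For illustration, a model section that selects a quantized model could look like the following; the repository and file names are placeholders rather than references to an actual model:

model:
  name: "org/model_name-GGUF"
  file: "model_name.Q4_K_M.gguf"

If file is omitted and name is set to a plain model handle, the same configuration is loaded via transformers instead.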

Furthermore, GERD supports llama.cpp servers as well as the OpenAI API. If you set an endpoint, the model handle and file parameters are ignored and requests are sent to the remote model instead of loading one locally. Make sure to set the correct endpoint type, because the capabilities provided by the two backends differ.