gerd.models.model

Model configuration for supported model classes.

Classes:

Name Description
ChatMessage

Data structure for chat messages.

ModelConfig

Configuration for large language models.

ModelEndpoint

Configuration for model endpoints where models are hosted remotely.

PromptConfig

Configuration for prompts.

Attributes:

Name Type Description
ChatRole

Currently supported chat roles.

EndpointType

Endpoint type for remote LLM services.

ChatRole module-attribute

ChatRole = Literal['system', 'user', 'assistant']

Currently supported chat roles.

EndpointType module-attribute

EndpointType = Literal['llama.cpp', 'openai']

Endpoint type for remote LLM services.

ChatMessage

Bases: TypedDict

Data structure for chat messages.

Attributes:

Name Type Description
content str

The content of the chat message.

role ChatRole

The role or source of the chat message.

content instance-attribute

content: str

The content of the chat message.

role instance-attribute

role: ChatRole

The role or source of the chat message.
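
Since ChatMessage is a TypedDict, instances are plain dictionaries at runtime; the class only adds static type checking. A minimal sketch:

from gerd.models.model import ChatMessage

# A single message; "role" must be one of the ChatRole literals.
message: ChatMessage = {"role": "user", "content": "Hello!"}

# A conversation is simply a list of such messages.
history: list[ChatMessage] = [
    {"role": "system", "content": "You are a helpful assistant."},
    message,
]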

ModelConfig

Bases: BaseModel

Configuration for large language models.

Most LLM libraries and services share common configuration parameters. Explaining each parameter is out of scope for this documentation; the most essential ones are described in the documentation of the underlying libraries. Default values have been chosen according to the ctransformers library.
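
Since all fields have defaults, a ModelConfig can be created without arguments and selectively overridden. A minimal sketch (the chosen values are illustrative):

from gerd.models.model import ModelConfig

# Only the parameters that differ from the defaults need to be set.
config = ModelConfig(
    name="Qwen/Qwen2.5-0.5B-Instruct",
    temperature=0.7,
    max_new_tokens=512,
    stop=["</s>"],
)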

Attributes:

Name Type Description
batch_size int

The batch size for the generation.

context_length int

The context length for the model. Currently, only LLaMA, MPT, and Falcon models support this setting.

endpoint Optional[ModelEndpoint]

The endpoint of the model when hosted remotely.

extra_kwargs Optional[dict[str, Any]]

Additional keyword arguments for the model library.

file Optional[str]

The path to the model file. For local models only.

gpu_layers int

The number of layers to run on the GPU.

last_n_tokens int

The number of tokens to consider for the repetition penalty.

loras set[Path]

The set of additional LoRA files to load.

max_new_tokens int

The maximum number of new tokens to generate.

name str

The name of the model. Can be a path to a local model or a Hugging Face handle.

prompt_config PromptConfig

The prompt configuration.

prompt_setup List[Tuple[Literal['system', 'user', 'assistant'], PromptConfig]]

A list of predefined prompts for the model.

repetition_penalty float

The repetition penalty.

seed int

The seed for the random number generator.

stop Optional[List[str]]

The stop tokens for the generation.

stream bool

Whether to stream the output.

temperature float

The temperature for the sampling.

threads Optional[int]

The number of threads to use for the generation.

top_k int

The number of tokens to consider for the top-k sampling.

top_p float

The cumulative probability for the top-p sampling.

torch_dtype Optional[str]

The torch data type for the model.

batch_size class-attribute instance-attribute

batch_size: int = 8

The batch size for the generation.

context_length class-attribute instance-attribute

context_length: int = 0

The context length for the model. Currently, only LLaMA, MPT, and Falcon models support this setting.

endpoint class-attribute instance-attribute

endpoint: Optional[ModelEndpoint] = None

The endpoint of the model when hosted remotely.

extra_kwargs class-attribute instance-attribute

extra_kwargs: Optional[dict[str, Any]] = None

Additional keyword arguments for the model library.

The accepted keys and values depend on the model library used.

file class-attribute instance-attribute

file: Optional[str] = None

The path to the model file. For local models only.

gpu_layers class-attribute instance-attribute

gpu_layers: int = 0

The number of layers to run on the GPU.

The actual number is only used by llama.cpp. The other model libraries determine whether to run on the GPU just by checking whether this value is larger than 0.

last_n_tokens class-attribute instance-attribute

last_n_tokens: int = 64

The number of tokens to consider for the repetition penalty.

loras class-attribute instance-attribute

loras: set[Path] = set()

The set of additional LoRA files to load.

max_new_tokens class-attribute instance-attribute

max_new_tokens: int = 256

The maximum number of new tokens to generate.

name class-attribute instance-attribute

name: str = 'Qwen/Qwen2.5-0.5B-Instruct'

The name of the model. Can be a path to a local model or a Hugging Face handle.

prompt_config class-attribute instance-attribute

prompt_config: PromptConfig = PromptConfig()

The prompt configuration.

This is used to process the input passed to the services.

prompt_setup class-attribute instance-attribute

prompt_setup: List[Tuple[Literal['system', 'user', 'assistant'], PromptConfig]] = []

A list of predefined prompts for the model.

When a model context is initialized or reset, this will be used to set up the context.
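
For example, a fixed system prompt can be installed on every context reset; a minimal sketch (the prompt text is illustrative):

from gerd.models.model import ModelConfig, PromptConfig

# Each entry pairs a chat role with the prompt used to set up the context.
config = ModelConfig(
    prompt_setup=[
        ("system", PromptConfig(text="You are a concise assistant.")),
    ],
)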

repetition_penalty class-attribute instance-attribute

repetition_penalty: float = 1.1

The repetition penalty.

seed class-attribute instance-attribute

seed: int = -1

The seed for the random number generator.

stop class-attribute instance-attribute

stop: Optional[List[str]] = None

The stop tokens for the generation.

stream class-attribute instance-attribute

stream: bool = False

Whether to stream the output.

temperature class-attribute instance-attribute

temperature: float = 0.8

The temperature for the sampling.

threads class-attribute instance-attribute

threads: Optional[int] = None

The number of threads to use for the generation.

top_k class-attribute instance-attribute

top_k: int = 40

The number of tokens to consider for the top-k sampling.

top_p class-attribute instance-attribute

top_p: float = 0.95

The cumulative probability for the top-p sampling.

torch_dtype class-attribute instance-attribute

torch_dtype: Optional[str] = None

The torch data type for the model.

ModelEndpoint

Bases: BaseModel

Configuration for model endpoints where models are hosted remotely.

PromptConfig

Bases: BaseModel

Configuration for prompts.

Methods:

Name Description
format

Format the prompt with the given parameters.

model_post_init

Post-initialization hook for pydantic.

Attributes:

Name Type Description
is_template bool

Whether the config uses Jinja2 templates.

parameters list[str]

Retrieves and returns the parameters of the prompt.

path Optional[str]

The path to an external prompt file.

template Optional[Template]

Optional template of the prompt. This should follow the Jinja2 syntax.

text str

The text of the prompt. Can contain placeholders.

is_template class-attribute instance-attribute

is_template: bool = False

Whether the config uses Jinja2 templates.

parameters property

parameters: list[str]

Retrieves and returns the parameters of the prompt.

This happens on-the-fly and is not stored in the model.

Returns:

Type Description
list[str]

The parameters of the prompt.
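
For a plain-text prompt, the parameters correspond to the format placeholders in text. A short sketch of the expected behavior:

from gerd.models.model import PromptConfig

prompt = PromptConfig(text="Summarize {document} in {language}.")
# Expected to yield the placeholder names, e.g. ["document", "language"].
print(prompt.parameters)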

path class-attribute instance-attribute

path: Optional[str] = None

The path to an external prompt file.

This will override the values of text and/or template.

template class-attribute instance-attribute

template: Optional[Template] = Field(exclude=True, default=None)

Optional template of the prompt. This should follow the Jinja2 syntax.

text class-attribute instance-attribute

text: str = '{message}'

The text of the prompt. Can contain placeholders.

format

format(parameters: Mapping[str, str | list[ChatMessage]] | None = None) -> str

Format the prompt with the given parameters.

Parameters:

Name Type Description Default

parameters

Mapping[str, str | list[ChatMessage]] | None

The parameters to format the prompt with.

None

Returns:

Type Description
str

The formatted prompt

Source code in gerd/models/model.py
def format(
    self, parameters: Mapping[str, str | list[ChatMessage]] | None = None
) -> str:
    """Format the prompt with the given parameters.

    Parameters:
        parameters: The parameters to format the prompt with.

    Returns:
        The formatted prompt
    """
    if parameters is None:
        parameters = {}
    return (
        self.template.render(**parameters)
        if self.template
        else (
            self.text.format(**parameters)
            if self.text
            else "".join(str(parameters.values()))
        )
    )
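
A short usage sketch: with a plain-text prompt, format falls back to str.format on text.

from gerd.models.model import PromptConfig

prompt = PromptConfig(text="Translate '{message}' to {language}.")
result = prompt.format({"message": "Hello", "language": "German"})
# result == "Translate 'Hello' to German."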

model_post_init

model_post_init(__context: Any) -> None

Post-initialization hook for pydantic.

When path is set, the text or template is read from the file and the template is created. A path ending with '.jinja2' will be treated as a template. If no path is set, the text parameter is used to initialize the template if is_template is set to True.

Parameters:

Name Type Description Default

__context

Any

The context of the model (not used)

required

Source code in gerd/models/model.py
def model_post_init(self, __context: Any) -> None:  # noqa: ANN401
    """Post-initialization hook for pyandic.

    When path is set, the text or template is read from the file and
    the template is created.
    Path ending with '.jinja2' will be treated as a template.
    If no path is set, the text parameter is used to initialize the template
    if is_template is set to True.
    Parameters:
        __context: The context of the model (not used)
    """
    if self.path:
        # reset self.text when path is set
        self.text = ""
        path = Path(self.path)
        if path.exists():
            with path.open("r", encoding="utf-8") as f:
                self.text = f.read()
                if self.is_template or path.suffix == ".jinja2":
                    self.is_template = True
                    loader = FileSystemLoader(path.parent)
                    env = Environment(
                        loader=loader,
                        autoescape=select_autoescape(
                            disabled_extensions=(".jinja2",),
                            default_for_string=True,
                            default=True,
                        ),
                    )
                    self.template = env.get_template(path.name)
        else:
            msg = f"'{self.path}' does not exist!"
            raise ValueError(msg)
    elif self.text and self.is_template:
        self.template = Environment(autoescape=True).from_string(self.text)
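
Putting this together, a prompt can be driven by an external template file. A sketch assuming a file prompts/chat.jinja2 exists (the path and its contents are hypothetical):

from gerd.models.model import PromptConfig

# Suppose prompts/chat.jinja2 contains: {{ question }} Answer briefly.
prompt = PromptConfig(path="prompts/chat.jinja2")
# The '.jinja2' suffix sets is_template=True and loads a Jinja2 template,
# so format renders the template instead of calling str.format.
print(prompt.format({"question": "What is GERD?"}))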