gerd.models.model

Model configuration for supported model classes.

Classes:

Name Description
ChatMessage

Data structure for chat messages.

ModelConfig

Configuration for large language models.

ModelEndpoint

Configuration for model endpoints where models are hosted remotely.

PromptConfig

Configuration for prompts.

Attributes:

Name Type Description
ChatRole

Currently supported chat roles.

EndpointType

Endpoint type for remote LLM services.

ChatRole module-attribute

ChatRole = Literal['system', 'user', 'assistant']

Currently supported chat roles.

EndpointType module-attribute

EndpointType = Literal['llama.cpp', 'openai']

Endpoint type for remote LLM services.

ChatMessage

Bases: TypedDict

Data structure for chat messages.

Attributes:

Name Type Description
content str

The content of the chat message.

role ChatRole

The role or source of the chat message.

content instance-attribute

content: str

The content of the chat message.

role instance-attribute

role: ChatRole

The role or source of the chat message.
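
Since ChatMessage is a TypedDict, instances are plain dictionaries at runtime; the class only adds static type checking. A minimal sketch:

from gerd.models.model import ChatMessage

# A single message; "role" must be one of the ChatRole literals.
message: ChatMessage = {"role": "user", "content": "Hello!"}

# A conversation is simply a list of such messages.
history: list[ChatMessage] = [
    {"role": "system", "content": "You are a helpful assistant."},
    message,
]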

ModelConfig

Bases: BaseModel

Configuration for large language models.

Most LLM libraries and services share common configuration parameters. Explaining each parameter is out of scope for this documentation; the most essential ones are described in the documentation of the underlying libraries. Default values have been chosen according to the ctransformers library.
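
Since all fields have defaults, a ModelConfig can be created without arguments and selectively overridden. A minimal sketch (the chosen values are illustrative):

from gerd.models.model import ModelConfig

# Only the parameters that differ from the defaults need to be set.
config = ModelConfig(
    name="Qwen/Qwen2.5-0.5B-Instruct",
    temperature=0.7,
    max_new_tokens=512,
    stop=["</s>"],
)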

Attributes:

Name Type Description
batch_size int

The batch size for the generation.

context_length int

The context length for the model. Currently, only LLaMA, MPT, and Falcon models support this setting.

endpoint Optional[ModelEndpoint]

The endpoint of the model when hosted remotely.

extra_kwargs Optional[dict[str, Any]]

Additional keyword arguments for the model library.

file Optional[str]

The path to the model file. For local models only.

gpu_layers int

The number of layers to run on the GPU.

last_n_tokens int

The number of tokens to consider for the repetition penalty.

loras set[Path]

The set of additional LoRA files to load.

max_new_tokens int

The maximum number of new tokens to generate.

name str

The name of the model. Can be a path to a local model or a Hugging Face handle.

prompt_config PromptConfig

The prompt configuration.

prompt_setup List[Tuple[Literal['system', 'user', 'assistant'], PromptConfig]]

A list of predefined prompts for the model.

repetition_penalty float

The repetition penalty.

seed int

The seed for the random number generator.

stop Optional[List[str]]

The stop tokens for the generation.

stream bool

Whether to stream the output.

temperature float

The temperature for the sampling.

threads Optional[int]

The number of threads to use for the generation.

top_k int

The number of tokens to consider for the top-k sampling.

top_p float

The cumulative probability for the top-p sampling.

torch_dtype Optional[str]

The torch data type for the model.

batch_size class-attribute instance-attribute

batch_size: int = 8

The batch size for the generation.

context_length class-attribute instance-attribute

context_length: int = 0

The context length for the model. Currently, only LLaMA, MPT, and Falcon models support this setting.

endpoint class-attribute instance-attribute

endpoint: Optional[ModelEndpoint] = None

The endpoint of the model when hosted remotely.

extra_kwargs class-attribute instance-attribute

extra_kwargs: Optional[dict[str, Any]] = None

Additional keyword arguments for the model library.

The accepted keys and values depend on the model library used.

file class-attribute instance-attribute

file: Optional[str] = None

The path to the model file. For local models only.

gpu_layers class-attribute instance-attribute

gpu_layers: int = 0

The number of layers to run on the GPU.

The actual number is only used by llama.cpp. The other model libraries determine whether to run on the GPU just by checking whether this value is larger than 0.

last_n_tokens class-attribute instance-attribute

last_n_tokens: int = 64

The number of tokens to consider for the repetition penalty.

loras class-attribute instance-attribute

loras: set[Path] = set()

The set of additional LoRA files to load.

max_new_tokens class-attribute instance-attribute

max_new_tokens: int = 256

The maximum number of new tokens to generate.

name class-attribute instance-attribute

name: str = 'Qwen/Qwen2.5-0.5B-Instruct'

The name of the model. Can be a path to a local model or a Hugging Face handle.

prompt_config class-attribute instance-attribute

prompt_config: PromptConfig = PromptConfig()

The prompt configuration.

This is used to process the input passed to the services.

prompt_setup class-attribute instance-attribute

prompt_setup: List[Tuple[Literal['system', 'user', 'assistant'], PromptConfig]] = []

A list of predefined prompts for the model.

When a model context is initialized or reset, this will be used to set up the context.
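
For example, a fixed system prompt can be installed on every context reset; a minimal sketch (the prompt text is illustrative):

from gerd.models.model import ModelConfig, PromptConfig

# Each entry pairs a chat role with the prompt used to set up the context.
config = ModelConfig(
    prompt_setup=[
        ("system", PromptConfig(text="You are a concise assistant.")),
    ],
)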

repetition_penalty class-attribute instance-attribute

repetition_penalty: float = 1.1

The repetition penalty.

seed class-attribute instance-attribute

seed: int = -1

The seed for the random number generator.

stop class-attribute instance-attribute

stop: Optional[List[str]] = None

The stop tokens for the generation.

stream class-attribute instance-attribute

stream: bool = False

Whether to stream the output.

temperature class-attribute instance-attribute

temperature: float = 0.8

The temperature for the sampling.

threads class-attribute instance-attribute

threads: Optional[int] = None

The number of threads to use for the generation.

top_k class-attribute instance-attribute

top_k: int = 40

The number of tokens to consider for the top-k sampling.

top_p class-attribute instance-attribute

top_p: float = 0.95

The cumulative probability for the top-p sampling.

torch_dtype class-attribute instance-attribute

torch_dtype: Optional[str] = None

The torch data type for the model.

ModelEndpoint

Bases: BaseModel

Configuration for model endpoints where models are hosted remotely.

PromptConfig

Bases: BaseModel

Configuration for prompts.

Methods:

Name Description
format

Format the prompt with the given parameters.

model_post_init

Post-initialization hook for pydantic.

Attributes:

Name Type Description
is_template bool

Whether the config uses Jinja2 templates.

parameters list[str]

Retrieves and returns the parameters of the prompt.

path Optional[str]

The path to an external prompt file.

template Optional[Template]

Optional template of the prompt. This should follow the Jinja2 syntax.

text str

The text of the prompt. Can contain placeholders.

is_template class-attribute instance-attribute

is_template: bool = False

Whether the config uses Jinja2 templates.

parameters property

parameters: list[str]

Retrieves and returns the parameters of the prompt.

This happens on-the-fly and is not stored in the model.

Returns:

Type Description
list[str]

The parameters of the prompt.
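
For a plain-text prompt, the parameters correspond to the format placeholders in text. A short sketch of the expected behavior:

from gerd.models.model import PromptConfig

prompt = PromptConfig(text="Summarize {document} in {language}.")
# Expected to yield the placeholder names, e.g. ["document", "language"].
print(prompt.parameters)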

path class-attribute instance-attribute

path: Optional[str] = None

The path to an external prompt file.

This will override the values of text and/or template.

template class-attribute instance-attribute

template: Optional[Template] = Field(exclude=True, default=None)

Optional template of the prompt. This should follow the Jinja2 syntax.

text class-attribute instance-attribute

text: str = '{message}'

The text of the prompt. Can contain placeholders.

format

format(parameters: Mapping[str, str | list[ChatMessage]] | None = None) -> str

Format the prompt with the given parameters.

Parameters:

Name Type Description Default

parameters

Mapping[str, str | list[ChatMessage]] | None

The parameters to format the prompt with.

None

Returns:

Type Description
str

The formatted prompt

Source code in gerd/models/model.py
def format(
    self, parameters: Mapping[str, str | list[ChatMessage]] | None = None
) -> str:
    """Format the prompt with the given parameters.

    Parameters:
        parameters: The parameters to format the prompt with.

    Returns:
        The formatted prompt
    """
    if parameters is None:
        parameters = {}
    return (
        self.template.render(**parameters)
        if self.template
        else (
            self.text.format(**parameters)
            if self.text
            else "".join(str(parameters.values()))
        )
    )
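
A short usage sketch: with a plain-text prompt, format falls back to str.format on text.

from gerd.models.model import PromptConfig

prompt = PromptConfig(text="Translate '{message}' to {language}.")
result = prompt.format({"message": "Hello", "language": "German"})
# result == "Translate 'Hello' to German."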

model_post_init

model_post_init(__context: Any) -> None

Post-initialization hook for pydantic.

When path is set, the text or template is read from the file and the template is created. A path ending with '.jinja2' will be treated as a template. If no path is set, the text parameter is used to initialize the template if is_template is set to True.

Parameters:

Name Type Description Default

__context

Any

The context of the model (not used)

required

Source code in gerd/models/model.py
def model_post_init(self, __context: Any) -> None:  # noqa: ANN401
    """Post-initialization hook for pyandic.

    When path is set, the text or template is read from the file and
    the template is created.
    Path ending with '.jinja2' will be treated as a template.
    If no path is set, the text parameter is used to initialize the template
    if is_template is set to True.
    Parameters:
        __context: The context of the model (not used)
    """
    if self.path:
        # reset self.text when path is set
        self.text = ""
        path = Path(self.path)
        if path.exists():
            with path.open("r", encoding="utf-8") as f:
                self.text = f.read()
                if self.is_template or path.suffix == ".jinja2":
                    self.is_template = True
                    loader = FileSystemLoader(path.parent)
                    env = Environment(
                        loader=loader,
                        autoescape=select_autoescape(
                            disabled_extensions=(".jinja2",),
                            default_for_string=True,
                            default=True,
                        ),
                    )
                    self.template = env.get_template(path.name)
        else:
            msg = f"'{self.path}' does not exist!"
            raise ValueError(msg)
    elif self.text and self.is_template:
        self.template = Environment(autoescape=True).from_string(self.text)
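
Putting this together, a prompt can be driven by an external template file. A sketch assuming a file prompts/chat.jinja2 exists (the path and its contents are hypothetical):

from gerd.models.model import PromptConfig

# Suppose prompts/chat.jinja2 contains: {{ question }} Answer briefly.
prompt = PromptConfig(path="prompts/chat.jinja2")
# The '.jinja2' suffix sets is_template=True and loads a Jinja2 template,
# so format renders the template instead of calling str.format.
print(prompt.format({"question": "What is GERD?"}))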