Transformer Forecaster Configuration

class convokit.forecaster.TransformerForecasterConfig.TransformerForecasterConfig(output_dir: str, per_device_batch_size: int = 4, gradient_accumulation_steps: int = 1, num_train_epochs: int = 4, learning_rate: float = 0.0001, random_seed: int = 1, device: str = 'cuda', context_mode: str = 'normal')

Configuration class for defining training arguments used during fine-tuning of a TransformerDecoderModel or TransformerEncoderModel.

This class encapsulates all relevant hyperparameters and system settings required for training, evaluation, and reproducibility. Each field is accompanied by descriptive metadata to aid in configuration parsing and command-line interfacing (e.g., via argparse or transformers.HfArgumentParser).

Attributes:

output_dir (str): Path to the directory where outputs such as predictions, model checkpoints, and training logs will be saved.

per_device_batch_size (int): Number of samples processed per device (e.g., GPU) in a single batch. Default is 4.

gradient_accumulation_steps (int): Number of batches over which gradients are accumulated before each optimizer update. Useful for simulating larger batch sizes when memory is limited. Default is 1.
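The interaction between per_device_batch_size and gradient_accumulation_steps can be sketched as simple arithmetic. The helper name effective_batch_size below is hypothetical and not part of ConvoKit, and the sketch assumes training on a single device.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int) -> int:
    """Samples contributing to each optimizer update on one device."""
    return per_device_batch_size * gradient_accumulation_steps


# With the documented defaults, each update sees 4 samples.
default_size = effective_batch_size(4, 1)

# Accumulating over 8 batches of 4 behaves like a batch of 32.
accumulated_size = effective_batch_size(4, 8)

print(default_size, accumulated_size)
```

Raising gradient_accumulation_steps instead of per_device_batch_size keeps peak memory roughly tied to the smaller per-device batch, at the cost of fewer optimizer updates per epoch.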

num_train_epochs (int): Total number of epochs for model training. Default is 4.

learning_rate (float): Initial learning rate for the optimizer. Default is 1e-4.

random_seed (int): Seed value to ensure reproducible training behavior. Default is 1.

device (str): Device identifier on which the model will be trained and evaluated. Typically “cuda”, “cuda:0”, or “cpu”. Default is “cuda”.
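Because the default device is “cuda”, a machine without a GPU needs an explicit override. The following is a minimal sketch of one way to choose the value, assuming PyTorch is the underlying framework (an assumption, not stated on this page). The helper pick_device is hypothetical and not part of ConvoKit.

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return `preferred`, falling back to "cpu" when CUDA is unusable."""
    try:
        import torch  # assumption: PyTorch is the training backend
    except ImportError:
        # Without PyTorch we cannot probe for a GPU, so stay on the CPU.
        return "cpu"
    if preferred.startswith("cuda") and not torch.cuda.is_available():
        return "cpu"
    return preferred


device = pick_device("cuda")
print(device)
```

The returned string can then be passed as the device argument when building the configuration.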

context_mode (str): Specifies how the input context is constructed. “normal” uses the full conversational context (previous utterances); “no-context” uses only the current utterance. Default is “normal”.
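The attributes above can be illustrated with a small stand-in. The sketch below is not ConvoKit's implementation: ForecasterConfigSketch and build_parser are hypothetical names, the dataclass layout and the help strings are assumptions, and only the field names, types, and defaults come from the signature documented here. It shows how per-field metadata could drive an argparse command-line interface.

```python
import argparse
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class ForecasterConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    output_dir: str = field(metadata={"help": "Directory for outputs."})
    per_device_batch_size: int = field(
        default=4, metadata={"help": "Samples per device per batch."})
    gradient_accumulation_steps: int = field(
        default=1, metadata={"help": "Batches accumulated per update."})
    num_train_epochs: int = field(
        default=4, metadata={"help": "Number of training epochs."})
    learning_rate: float = field(
        default=1e-4, metadata={"help": "Initial learning rate."})
    random_seed: int = field(
        default=1, metadata={"help": "Seed for reproducibility."})
    device: str = field(
        default="cuda", metadata={"help": "Training device."})
    context_mode: str = field(
        default="normal", metadata={"help": "normal or no-context."})


def build_parser() -> argparse.ArgumentParser:
    """Create one command-line option per dataclass field."""
    parser = argparse.ArgumentParser()
    for f in fields(ForecasterConfigSketch):
        options = {"type": f.type, "help": f.metadata.get("help")}
        if f.default is MISSING:
            options["required"] = True  # fields without a default
        else:
            options["default"] = f.default
        parser.add_argument(f"--{f.name}", **options)
    return parser


# Parse an explicit argument list instead of sys.argv for the example.
args = build_parser().parse_args(
    ["--output_dir", "out/run1", "--context_mode", "no-context"])
config = ForecasterConfigSketch(**vars(args))
print(config)
```

With ConvoKit installed, the real class would be imported from convokit.forecaster.TransformerForecasterConfig and constructed with the same keyword arguments; only output_dir has no default and must always be supplied.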