max_sequence_length limit. It is currently supported for the FineTuning and VSLFineTuning modes, and applies both to prompt/completion datasets and to multi-turn datasets structured as user/assistant interactions.
Truncation supports two modes:

- `keep_start`: Truncates from the end of the sequence, retaining the beginning.
- `keep_end`: Truncates from the start of the sequence, retaining the end.
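The two modes can be illustrated with a minimal sketch; the `truncate` helper below is illustrative, not the tool's actual implementation:

```python
def truncate(tokens, max_len, mode="keep_start"):
    """Truncate a token list to max_len using one of the two modes."""
    if len(tokens) <= max_len:
        return tokens
    if mode == "keep_start":
        return tokens[:max_len]   # drop tokens from the end, keep the beginning
    if mode == "keep_end":
        return tokens[-max_len:]  # drop tokens from the start, keep the end
    raise ValueError(f"unknown mode: {mode}")

tokens = list(range(10))                  # stand-in for token IDs
print(truncate(tokens, 4, "keep_start"))  # [0, 1, 2, 3]
print(truncate(tokens, 4, "keep_end"))    # [6, 7, 8, 9]
```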
Truncation is applied first to the first part of the sequence (prompts or user inputs). If the max_sequence_length cannot be achieved by truncating this first part alone, the second part (responses or assistant outputs) is truncated as well. Finally, if the tokens available for truncation are insufficient to meet the max_sequence_length requirement, the sequence is skipped.
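The two-stage procedure (truncate the first part, fall back to the second, otherwise skip) can be sketched as follows. The `prompt`/`completion` names and the `min_part_len` floor are illustrative assumptions, not the tool's documented behavior:

```python
def fit_to_msl(prompt, completion, msl, min_part_len=1):
    """Sketch of two-stage truncation: shorten the prompt first, then the
    completion; skip the sequence if neither yields enough room.
    min_part_len (tokens each part must keep) is an assumption for illustration."""
    excess = len(prompt) + len(completion) - msl
    if excess <= 0:
        return prompt, completion
    # Stage 1: truncate the first part (keep_start shown: drop from the end).
    take = min(excess, len(prompt) - min_part_len)
    if take > 0:
        prompt = prompt[:len(prompt) - take]
        excess -= take
    # Stage 2: truncate the second part (default keep_start: drop from the end).
    take = min(excess, len(completion) - min_part_len)
    if take > 0:
        completion = completion[:len(completion) - take]
        excess -= take
    if excess > 0:
        return None  # not enough truncatable tokens: skip this sequence
    return prompt, completion

# 8 + 5 tokens into an MSL of 10: three tokens are cut from the prompt's end.
print(fit_to_msl(list(range(8)), list(range(5)), msl=10))
```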
The `keep_start` and `keep_end` modes apply only when truncating the first part of the sequence (i.e., prompts or user inputs). If truncation extends to the second part (responses or assistant outputs), the mode defaults to `keep_start`, i.e., tokens are removed from the end.

Specifying Truncation in the Config File
The `truncate_to_msl` section in the configuration file specifies the parameters for sequence truncation, ensuring the total sequence length remains within the `max_sequence_length` (MSL).
Available options are:

- `truncation_mode`: Specifies the truncation strategy.
  - `keep_start`: Removes tokens from the end of the sequence, preserving the start.
  - `keep_end`: Removes tokens from the start of the sequence, preserving the end.
- `max_turn_length`: Sets the maximum allowed length for any single turn (segment) in the sequence.
  - Turns exceeding this limit are truncated according to the specified `truncation_mode`.
For example, with `truncation_mode: keep_start` and `max_turn_length: 512`:

- Truncation will prioritize removing tokens from the end (`keep_start` mode).
- Each turn in the sequence will be restricted to a maximum of 512 tokens after truncation.
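A `truncate_to_msl` section for that behavior might look like the following sketch; the exact nesting of this section within the larger config file is an assumption about the schema:

```yaml
truncate_to_msl:
  truncation_mode: keep_start   # drop tokens from the end of over-long sequences
  max_turn_length: 512          # cap any single turn at 512 tokens
```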

