Generation Configuration

Use the generation_config parameter to configure how the LLM generates its response. These settings control output length, randomness, and sampling strategy.

Parameters

Parameter	Type	Required	Description	Default	Range
max_new_tokens	Integer	No	Maximum number of tokens the model will generate, excluding input tokens	1024	-
do_sample	Boolean	No	Enables sampling. If False, the model uses greedy decoding (always picks the most likely token). If True, it samples from the probability distribution, producing more diverse outputs	False	True/False
temperature	Float	No	Controls randomness: lower values make output more deterministic, higher values make it more varied. Only applies when `do_sample=True`	0.3	0.1-2.0
top_p	Float	No	Nucleus sampling threshold. The model samples from the smallest set of most likely tokens whose cumulative probability reaches this value	0.1	0.1-1.0
top_k	Integer	No	Limits sampling to the k most likely tokens at each step	20	1-100
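
As an illustration of how the parameters in the table fit together, the following is a minimal Python sketch that builds a generation_config as a plain dictionary and checks it against the documented defaults and ranges. The names `DEFAULTS`, `RANGES`, and `build_generation_config` are hypothetical and not part of the API; how the resulting dictionary is passed to a request is not covered here and may differ in practice.

```python
# Defaults and ranges copied from the parameter table above.
DEFAULTS = {
    "max_new_tokens": 1024,
    "do_sample": False,
    "temperature": 0.3,
    "top_p": 0.1,
    "top_k": 20,
}
RANGES = {
    "temperature": (0.1, 2.0),
    "top_p": (0.1, 1.0),
    "top_k": (1, 100),
}


def build_generation_config(**overrides):
    """Hypothetical helper: merge overrides into the defaults and range-check them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    for name, (low, high) in RANGES.items():
        if not low <= config[name] <= high:
            raise ValueError(f"{name}={config[name]} is outside {low}-{high}")
    return config


# Sampling must be enabled for temperature to have any effect (see the table).
generation_config = build_generation_config(do_sample=True, temperature=0.7, top_p=0.9)
```

Parameters that are not overridden keep their table defaults, so `generation_config` here still carries `top_k` set to 20 and `max_new_tokens` set to 1024.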

Sampling Strategy

top_k: Keeps a fixed number of the most likely candidate tokens at each step
top_p: Keeps a variable number of candidates, adding the most likely tokens until their cumulative probability reaches the threshold
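
The difference between the two strategies can be sketched in a few lines of Python. This is an illustrative toy over a hand-written probability list, not the service's actual implementation; `top_k_filter` and `top_p_filter` are hypothetical names, and each returns the indices of the tokens that remain eligible for sampling.

```python
def top_k_filter(probs, k):
    """Keep the indices of the k most likely tokens."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


# Toy next-token distribution over a four-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]

fixed = top_k_filter(probs, 2)      # always two candidates, whatever the distribution
nucleus = top_p_filter(probs, 0.9)  # as many candidates as needed to cover 0.9
```

With this distribution, `top_k_filter` keeps two tokens regardless of how the probability mass is spread, while `top_p_filter` keeps three because the first two cover only 0.8 of the mass. With a low threshold such as the default `top_p` of 0.1, the nucleus collapses to the single most likely token.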
