
Generation Configuration

LLM behavioral control parameters

Default settings for generation_config, which control the LLM’s behavior when generating responses.

| Parameter | Type | Required | Description | Default | Range |
|---|---|---|---|---|---|
| max_new_tokens | Integer | No | Maximum number of tokens (words or word pieces) the model will add to its output, not counting the tokens of the user's input prompt. | 1024 | – |
| do_sample | Boolean | No | If False (default), the model always selects the most likely next token (greedy decoding), producing more consistent and predictable results. If True, the model samples tokens according to their probabilities, leading to more diverse outputs. | False | True/False |
| temperature | Float | No | Controls randomness; lower = more deterministic, higher = more creative. Only takes effect when do_sample=True. | 0.3 | 0.1–2.0 |
| top_p | Float | No | Limits which candidate tokens the model may choose from. Instead of considering the full vocabulary, the model keeps only the most likely tokens until their cumulative probability reaches top_p. A large top_p (close to 1.0) lets the model pick from many tokens; a small top_p restricts it to the top few. | 0.1 | 0.1–1.0 |
| top_k | Integer | No | Limits the model to the k most probable next tokens; the rest of the vocabulary is ignored. | 20 | – |
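As a minimal sketch, the defaults above can be collected into a generation_config dict. The exact consuming API is an assumption; many LLM serving stacks (e.g. Hugging Face transformers) accept keys by these names.

```python
# Example generation_config using the defaults from the table above.
generation_config = {
    "max_new_tokens": 1024,  # cap on generated tokens; prompt tokens excluded
    "do_sample": False,      # greedy decoding by default
    "temperature": 0.3,      # only takes effect when do_sample=True
    "top_p": 0.1,            # nucleus-sampling cumulative-probability threshold
    "top_k": 20,             # keep only the 20 most likely tokens
}
```

Overriding any key (for example, setting do_sample=True with a higher temperature) trades determinism for diversity.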

Tip:

  • top_k = keeps a fixed number of the most likely tokens
  • top_p = keeps tokens until their combined probability reaches a threshold
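The difference can be sketched in plain Python over a toy next-token distribution (the tokens and probabilities below are invented for illustration):

```python
# Toy next-token distribution, most likely first.
probs = {"the": 0.50, "a": 0.25, "an": 0.15, "this": 0.07, "that": 0.03}

def top_k_filter(probs, k):
    """Keep the k most probable tokens (fixed count)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return kept

print(top_k_filter(probs, 2))   # fixed count of 2 tokens
print(top_p_filter(probs, 0.8)) # tokens whose probabilities sum past 0.8
```

With this distribution, top_k=2 keeps exactly {"the", "a"}, while top_p=0.8 keeps {"the", "a", "an"} because 0.50 + 0.25 alone falls short of the 0.8 threshold.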