
Generation Configuration

LLM behavioral control parameters

Default settings for generation_config, which control the LLM’s behavior when generating responses.

| Parameter | Type | Required | Description | Default | Range |
|---|---|---|---|---|---|
| max_new_tokens | Integer | No | Maximum number of tokens (words or word pieces) the model will add to its output, not counting the tokens of the user's input prompt. | 1024 | – |
| do_sample | Boolean | No | If False (default), the model always selects the most likely next token (greedy decoding), producing more consistent and predictable results. If True, the model samples tokens according to their probabilities, leading to more diverse outputs. | False | True/False |
| temperature | Float | No | Controls randomness; lower = more deterministic, higher = more creative. Only takes effect when do_sample=True. | 0.3 | 0.1–2.0 |
| top_p | Float | No | Limits which candidate tokens the model may choose from. Instead of considering the full vocabulary, the model keeps only the most likely tokens until their cumulative probability reaches top_p. A large top_p (close to 1.0) lets the model pick from many tokens; a small top_p restricts it to the top few. | 0.1 | 0.1–1.0 |
| top_k | Integer | No | Limits the model to the k most probable next tokens; the rest of the vocabulary is ignored. | 20 | – |
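As a minimal sketch, the defaults above can be collected into a generation_config dict. The exact consuming API is an assumption; many LLM serving stacks (e.g. Hugging Face transformers) accept keys by these names.

```python
# Example generation_config using the defaults from the table above.
generation_config = {
    "max_new_tokens": 1024,  # cap on generated tokens; prompt tokens excluded
    "do_sample": False,      # greedy decoding by default
    "temperature": 0.3,      # only takes effect when do_sample=True
    "top_p": 0.1,            # nucleus-sampling cumulative-probability threshold
    "top_k": 20,             # keep only the 20 most likely tokens
}
```

Overriding any key (for example, setting do_sample=True with a higher temperature) trades determinism for diversity.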

Tip:

  • top_k = keeps a fixed number of the most likely tokens
  • top_p = keeps tokens until their combined probability reaches a threshold
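The difference can be sketched in plain Python over a toy next-token distribution (the tokens and probabilities below are invented for illustration):

```python
# Toy next-token distribution, most likely first.
probs = {"the": 0.50, "a": 0.25, "an": 0.15, "this": 0.07, "that": 0.03}

def top_k_filter(probs, k):
    """Keep the k most probable tokens (fixed count)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return kept

print(top_k_filter(probs, 2))   # fixed count of 2 tokens
print(top_p_filter(probs, 0.8)) # tokens whose probabilities sum past 0.8
```

With this distribution, top_k=2 keeps exactly {"the", "a"}, while top_p=0.8 keeps {"the", "a", "an"} because 0.50 + 0.25 alone falls short of the 0.8 threshold.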