Generation Configuration
LLM behavioral control parameters
Default settings for generation_config, the parameters that control the LLM's behavior when generating responses.
| Parameter | Type | Required | Description | Default | Range |
|---|---|---|---|---|---|
| max_new_tokens | Integer | No | The maximum number of tokens (word or subword pieces) the model may add to its output, not counting the tokens from the user's input prompt. | 1024 | |
| do_sample | Boolean | No | If False (default), the model always selects the most likely next token (greedy decoding), producing consistent, reproducible results. If True, the model samples tokens according to their probabilities, producing more diverse output. | False | True/False |
| temperature | Float | No | Controls randomness: lower values are more deterministic, higher values are more creative. Only applies when do_sample=True. | 0.3 | 0.1-2.0 |
| top_p | Float | No | Limits the pool of candidate next tokens by cumulative probability (nucleus sampling): instead of considering all tokens, the model keeps only the most likely ones until their combined probability reaches top_p. A large top_p (close to 1.0) lets the model pick from many tokens; a small top_p restricts it to the top few. | 0.1 | 0.1-1.0 |
| top_k | Integer | No | Limits the pool of candidate next tokens to a fixed count: instead of considering the entire vocabulary, the model keeps only the k most probable tokens and ignores the rest. | 20 | |
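For illustration, here is a minimal sketch of how these defaults might be expressed in code, assuming a Hugging Face transformers-style setup (the GenerationConfig class and parameter names below match the table, but your own runtime may accept the same settings in a different form):

```python
from transformers import GenerationConfig

# Sketch of the defaults from the table above, assuming a Hugging Face-style stack.
generation_config = GenerationConfig(
    max_new_tokens=1024,  # cap on newly generated tokens (prompt excluded)
    do_sample=False,      # greedy decoding by default
    temperature=0.3,      # only used when do_sample=True
    top_p=0.1,            # nucleus sampling threshold (used when sampling)
    top_k=20,             # fixed-size candidate pool (used when sampling)
)
```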
Tip (see the sketch after this list):
- top_k keeps a fixed number of the most likely tokens
- top_p keeps tokens until their combined probability reaches a threshold
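To make the difference concrete, the snippet below applies both filters to a toy next-token distribution. The token names and probabilities are invented purely for illustration; real models filter logits over the full vocabulary in the same spirit.

```python
# Toy next-token distribution (values are invented for illustration).
probs = {"the": 0.40, "a": 0.25, "cat": 0.15, "dog": 0.10, "sat": 0.06, "ran": 0.04}

def apply_top_k(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def apply_top_p(probs, p):
    """Keep the most probable tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return kept

print(apply_top_k(probs, 3))   # {'the': 0.40, 'a': 0.25, 'cat': 0.15} (fixed count)
print(apply_top_p(probs, 0.5)) # {'the': 0.40, 'a': 0.25} (cumulative probability >= 0.5)
```

Note how top_k always keeps exactly k tokens, while the number top_p keeps varies with how concentrated the distribution is.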