Generation Configuration
Use the generation_config parameter to configure how the LLM generates responses. These settings control randomness, output length, and sampling strategy.
Parameters
| Parameter | Type | Required | Description | Default | Range |
|---|---|---|---|---|---|
| max_new_tokens | Integer | No | Maximum number of tokens the model will generate, excluding input tokens | 1024 | - |
| do_sample | Boolean | No | Enables sampling mode. If False, uses greedy decoding (always picks the most likely token). If True, samples from the probability distribution for more diverse outputs | False | True/False |
| temperature | Float | No | Controls randomness: lower values = more deterministic, higher values = more creative. Only applies when do_sample=True | 0.3 | 0.1-2.0 |
| top_p | Float | No | Nucleus sampling threshold. The model samples from the smallest set of tokens whose cumulative probability exceeds this value | 0.1 | 0.1-1.0 |
| top_k | Integer | No | Limits sampling to the k most likely tokens at each step | 20 | 1-100 |
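
As a minimal sketch of how these settings might be assembled, the Python snippet below merges caller overrides into the defaults from the table and rejects values outside the documented ranges. The helper name `build_generation_config` and the choice of a plain dictionary are assumptions made for illustration; this page does not specify how generation_config is passed to the API.

```python
# Defaults and allowed ranges copied from the parameter table above.
DEFAULTS = {
    "max_new_tokens": 1024,
    "do_sample": False,
    "temperature": 0.3,
    "top_p": 0.1,
    "top_k": 20,
}
RANGES = {
    "temperature": (0.1, 2.0),
    "top_p": (0.1, 1.0),
    "top_k": (1, 100),
}


def build_generation_config(**overrides):
    """Hypothetical helper: merge overrides into the defaults and check ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    for name, (low, high) in RANGES.items():
        if not low <= config[name] <= high:
            raise ValueError(f"{name}={config[name]} is outside {low}-{high}")
    return config


# Enable sampling with a higher temperature and a wider nucleus.
generation_config = build_generation_config(
    do_sample=True, temperature=0.7, top_p=0.9
)
```

Because temperature only applies when do_sample=True, overriding it without also enabling sampling would have no effect on the output.
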
Sampling Strategy
- top_k: Keeps a fixed number of the most likely candidate tokens at each step
- top_p: Dynamically selects the most likely tokens until their cumulative probability reaches the threshold
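
The difference between the two strategies can be illustrated with a small, self-contained Python sketch over a toy distribution. The function names and probability values here are made up for illustration, and this is not necessarily how the service implements filtering internally.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # assumption: stop once the threshold is reached
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Toy next-token distribution (made-up values).
probs = {"cat": 0.5, "dog": 0.25, "bird": 0.125, "fish": 0.125}

print(sorted(top_k_filter(probs, k=2)))    # ['cat', 'dog']
print(sorted(top_p_filter(probs, p=0.8)))  # ['bird', 'cat', 'dog']
```

With top_k the candidate set always has the same size, while with top_p it shrinks when the model is confident and grows when the distribution is flat.
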