Generation Configuration

Use the generation_config parameter to configure how the LLM generates its response. Its settings control randomness, output length, and the sampling strategy.

Parameters

| Parameter | Type | Required | Description | Default | Range |
|---|---|---|---|---|---|
| max_new_tokens | Integer | No | Maximum number of tokens the model will generate, excluding input tokens | 1024 | - |
| do_sample | Boolean | No | Enables sampling mode. If False, uses greedy decoding (most likely token). If True, samples from the probability distribution for diverse outputs | False | True/False |
| temperature | Float | No | Controls randomness: lower values = more deterministic, higher values = more creative. Only applies when do_sample=True | 0.3 | 0.1-2.0 |
| top_p | Float | No | Nucleus sampling threshold. Model selects from the smallest set of tokens whose cumulative probability exceeds this value | 0.1 | 0.1-1.0 |
| top_k | Integer | No | Limits sampling to the k most likely tokens at each step | 20 | 1-100 |
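
As an illustration of how these parameters fit together, the sketch below builds a generation_config dictionary from the documented defaults and checks overrides against the documented ranges. The helper name build_generation_config and the constant names are hypothetical and not part of the API; how the resulting dictionary is passed to the service is not shown here.

```python
# Defaults copied from the parameter table.
DEFAULT_GENERATION_CONFIG = {
    "max_new_tokens": 1024,
    "do_sample": False,
    "temperature": 0.3,
    "top_p": 0.1,
    "top_k": 20,
}

# Documented inclusive ranges; max_new_tokens has no documented range.
RANGES = {
    "temperature": (0.1, 2.0),
    "top_p": (0.1, 1.0),
    "top_k": (1, 100),
}


def build_generation_config(**overrides):
    """Hypothetical helper: merge overrides into the defaults and check ranges."""
    unknown = set(overrides) - set(DEFAULT_GENERATION_CONFIG)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    config = {**DEFAULT_GENERATION_CONFIG, **overrides}
    for name, (low, high) in RANGES.items():
        if not low <= config[name] <= high:
            raise ValueError(f"{name}={config[name]} is outside {low}-{high}")
    return config


# Enable sampling with a higher temperature and a wider nucleus.
generation_config = build_generation_config(
    do_sample=True, temperature=0.7, top_p=0.9
)
```

Parameters that are not overridden keep their documented defaults, so in this example top_k stays at 20 and max_new_tokens at 1024.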

Sampling Strategy

  • top_k: Restricts sampling to a fixed number of the most likely candidate tokens
  • top_p: Builds the candidate set dynamically, adding tokens in order of likelihood until their cumulative probability reaches the threshold
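
The difference between the two strategies can be sketched on a toy distribution. This is a minimal illustration, not the service's implementation: the function names and the probability values are made up, and the top_p cutoff here stops as soon as the cumulative probability reaches the threshold.

```python
def top_k_candidates(probs, k):
    """Keep the k most likely tokens (a fixed-size candidate set)."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [token for token, _ in ranked[:k]]


def top_p_candidates(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept


# Toy next-token distribution (made-up values for illustration).
probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}

print(top_k_candidates(probs, 2))    # ['cat', 'dog']
print(top_p_candidates(probs, 0.9))  # ['cat', 'dog', 'bird']
```

With top_k the candidate set always has the same size, whereas with top_p it shrinks when the model is confident and grows when the probability mass is spread over many tokens.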
