Key parameters for LLMs
I recently did a workshop about building agents. During the workshop I discussed the key parameters for LLMs, so I thought it’d be useful to write a short post about it.
These are the parameters that you will usually use when building LLM-based products:
- Model
- Messages/prompts
- Temperature
- Seed
- Top-P Sampling
- Logprobs
- Logit biases
- Max completion tokens
- Response format
- Streaming
- Tools
In the next sections, I’ll go over each of these parameters in more detail and provide some suggestions about how to use them.
Each provider has a slightly different name for the same parameter, but the concept is the same. You’ll have to check the documentation of the provider you’re using to see the exact name of the parameter.
Model
When choosing a model, consider the following factors:
- Complexity of task: Am I solving a problem that requires reasoning capabilities? Or is it a simple task?
- Speed: How important is it that the model replies quickly? Is this something that I can run in the background?
- Cost: How much do I want to spend on this task?
- Provider: Which providers do I have access to? Do I need to self-host?
Right now, my go-to models are Gemini 2.5 Pro or Claude 4 for complex tasks. For simpler tasks, I use Gemini 2.5 Flash or OpenAI’s gpt-4.1 family.
The best way to pick a model is to start with the most capable models and then scale down to the simplest model that is still capable of solving the task. Otherwise, you could end up spending a lot of time trying to solve an issue that you simply cannot solve reliably with smaller models.
Messages/prompts
The messages/prompts you send determine the context and instructions the LLM will follow. I wrote a guide on prompt engineering that covers the basics of how to write good prompts.
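As a minimal sketch, here’s how the messages parameter typically looks with the OpenAI Python SDK (the model name and prompts are placeholders; other providers use slightly different parameter names):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",  # placeholder; pick the model that fits your task
    messages=[
        {"role": "system", "content": "You are a helpful assistant that answers concisely."},
        {"role": "user", "content": "Summarize the key parameters for LLMs in one sentence."},
    ],
)
print(response.choices[0].message.content)
```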
Temperature
The temperature parameter controls the randomness of the model’s output. A temperature of 0 will make the model more deterministic, while a temperature of 1 will make the model more random.
I wrote a post about how temperature affects the output of LLMs. For tasks where consistency is important, use a temperature of 0. For tasks where creativity is important, use a temperature above 0. I’ve found that anything above 1.3-1.4 is too random.
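Here’s a minimal sketch, again assuming the OpenAI Python SDK and placeholder prompts:

```python
from openai import OpenAI

client = OpenAI()

# Consistency-focused task: keep temperature at 0
extraction = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Extract the invoice number from this email: ..."}],
    temperature=0,
)

# Creativity-focused task: raise the temperature (above ~1.3-1.4 gets too random)
brainstorm = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Suggest five names for a coffee shop."}],
    temperature=1.0,
)
```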
Seed
The seed parameter is used to initialize the random number generator that is then used to sample the next token. If you want to maximize reproducibility, set a seed value.
This is only available in OpenAI, Gemini, and open-weight models. Check my post for more details.
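With the OpenAI Python SDK, it’s one extra argument (a sketch with placeholder values):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Write a tagline for a bookstore."}],
    temperature=0,
    seed=42,  # same seed + same parameters -> (mostly) reproducible outputs
)
# The backend configuration can still change between calls, which breaks reproducibility
print(response.system_fingerprint)
```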
Top-P Sampling
Top-p sampling limits which tokens can be selected from the vocabulary by restricting sampling to the smallest group of tokens whose combined probability is ≥ P. For example, top_p = 0.9 picks the next token from the smallest group of tokens that together cover at least 90% of the probability mass.
I rarely use this parameter.
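If you do want to set it, it’s a single argument (a sketch with the OpenAI Python SDK):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Suggest three names for a cat."}],
    top_p=0.9,  # sample only from the smallest set of tokens covering at least 90% probability
)
```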
Logit biases
Logits are the raw scores that the model assigns to each token. You can use biases to change the odds of a token being selected. Positive biases increase the odds of the token being selected, while negative biases do the opposite.
This is often used for document classification tasks or LLM-based rerankers.
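For example, here’s a sketch of a yes/no classifier that nudges the model toward the two label tokens. It assumes the OpenAI Python SDK, the tiktoken tokenizer, and that each label is a single token:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")  # assumption: the tokenizer matching the target model

# Token IDs for the two labels we want to favor
yes_id = enc.encode("yes")[0]
no_id = enc.encode("no")[0]

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Is this review positive? Answer yes or no:\n..."}],
    logit_bias={str(yes_id): 5, str(no_id): 5},  # biases range from -100 to 100
    max_tokens=1,  # we only need the label
)
print(response.choices[0].message.content)
```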
Logprobs
Logprobs are the logarithmic probabilities of the tokens. They are defined as:
\[ \mathrm{logprob}(w_i) = \ln P(w_i) = \ln\left(\frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}\right) \]
Where:
- \(w_i\) is the \(i\)-th token in the vocabulary.
- \(P(w_i)\) is the probability of the \(i\)-th token.
- \(z_i\) is the logit of the \(i\)-th token.
- \(n\) is the number of tokens in the vocabulary.
You can use this parameter with OpenAI models. Gemini only allows a single request with logprobs per day (yes, I’m not kidding 😅). Anthropic doesn’t provide them.
Open-weight models don’t provide logprobs directly. They provide logits instead, which you can use to calculate the probabilities of the tokens.
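With the OpenAI Python SDK, you request logprobs per call and can convert them back to probabilities (a sketch, reusing the classifier-style prompt from the previous example):

```python
import math
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Is this review positive? Answer yes or no:\n..."}],
    logprobs=True,
    top_logprobs=5,  # also return the 5 most likely alternatives per position
    max_tokens=1,
)

first_token = response.choices[0].logprobs.content[0]
for candidate in first_token.top_logprobs:
    print(candidate.token, math.exp(candidate.logprob))  # exp(logprob) recovers P(w_i)
```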
Max completion tokens
This parameter limits the number of tokens that the model can generate. This is useful for controlling costs and the length of the output.
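A sketch with the OpenAI Python SDK (note that a hard cap can cut the output off mid-sentence):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Summarize this report in two sentences: ..."}],
    max_completion_tokens=100,  # hard cap on generated tokens
)
# finish_reason == "length" means the cap was hit before the model finished
print(response.choices[0].finish_reason)
```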
Response format
You can use this parameter to specify the format of the response. Anthropic, OpenAI, and Gemini support structured outputs in the form of JSON schemas. I’ve written multiple posts on this topic:
- Structured outputs can hurt the performance of LLMs
- The good, the bad, and the ugly of Gemini’s structured outputs
- Structured outputs: don’t put the cart before the horse
Open-weight models allow for more flexible structured output formats. For example, using outlines lets you define custom regex patterns to extract the data you need.
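As an example, here’s a sketch using the OpenAI Python SDK’s parse helper with a Pydantic model (the Invoice schema is hypothetical):

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()


class Invoice(BaseModel):  # hypothetical schema for illustration
    invoice_number: str
    total: float


response = client.beta.chat.completions.parse(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Extract the invoice data from this email: ..."}],
    response_format=Invoice,  # the SDK turns the Pydantic model into a JSON schema
)
print(response.choices[0].message.parsed)
```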
Streaming
This parameter is used to stream the response from the model. This improves the user experience as it allows you to see the output as it’s being generated.
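A sketch with the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Write a short poem about the sea."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)  # print tokens as they arrive
```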
Tools
Tools extend the capabilities of the model by letting it call external functions and APIs. This is critical for building agents.
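Here’s a sketch of a tool definition and call with the OpenAI Python SDK (get_weather is a hypothetical tool):

```python
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}],
    tools=tools,
)

# If the model decides to use the tool, it returns the function name and JSON arguments
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, json.loads(tool_call.function.arguments))
```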
Conclusion
This was a very short post about the key parameters for LLMs. I hope you found it useful.
Let me know if you have any questions or feedback.
Citation
@online{castillo2025,
  author = {Castillo, Dylan},
  title = {Key Parameters for {LLMs}},
  date = {2025-06-29},
  url = {https://dylancastillo.co/posts/key-parameters-llms.html},
  langid = {en}
}