Key parameters for LLMs

Published: June 29, 2025
Modified: June 29, 2025

I recently did a workshop about building agents. During the workshop I discussed the key parameters for LLMs, so I thought it’d be useful to write a short post about it.

These are the parameters that you will usually use when building LLM-based products:

  • Model
  • Messages/prompts
  • Temperature
  • Seed
  • Top-p sampling
  • Logit biases
  • Logprobs
  • Max completion tokens
  • Response format
  • Streaming
  • Tools

In the next sections, I’ll go over each of these parameters in more detail and provide some suggestions about how to use them.

Note

Each provider has a slightly different name for the same parameter, but the concept is the same. You’ll have to check the documentation of the provider you’re using to see the exact name of the parameter.

Model

When choosing a model, consider the following factors:

  • Complexity of task: Am I solving a problem that requires reasoning capabilities? Or is it a simple task?
  • Speed: How important is it that the model replies quickly? Is this something that I can run in the background?
  • Cost: How much do I want to spend on this task?
  • Provider: Which providers do I have access to? Do I need to self-host?

Right now, my go-to models for complex tasks are Gemini 2.5 Pro and Claude 4. For simpler tasks, I use Gemini 2.5 Flash or OpenAI’s gpt-4.1 family.

The best way to pick a model is to start with the most capable models and then scale down to the simplest model that is still capable of solving the task. Otherwise, you could end up spending a lot of time trying to solve an issue that you simply cannot solve reliably with smaller models.

Messages/prompts

The messages/prompts you send to the LLM determine the context and instructions it will follow. I wrote a guide on prompt engineering that covers the basics of how to write good prompts.
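As a minimal sketch, this is what a typical messages list looks like with the OpenAI Python SDK (other providers use a similar role-based structure; the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()

messages = [
    # The system message sets the overall behavior; user messages carry the task.
    {"role": "system", "content": "You are a helpful assistant that answers concisely."},
    {"role": "user", "content": "Summarize the main idea of nucleus sampling in one sentence."},
]

response = client.chat.completions.create(model="gpt-4.1-mini", messages=messages)
print(response.choices[0].message.content)
```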

Temperature

The temperature parameter controls the randomness of the model’s output. A temperature of 0 makes the model (nearly) deterministic, while higher values make the output more random.

I wrote a post about how temperature affects the output of LLMs. For tasks where consistency is important, use a temperature of 0. For tasks where creativity is important, use a temperature above 0. I’ve found that anything above 1.3–1.4 is too random.
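For example, with the OpenAI SDK (the parameter name varies by provider) you can run the same prompt at a few temperatures and compare; a sketch:

```python
from openai import OpenAI

client = OpenAI()

for temperature in (0, 0.7, 1.2):
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
        temperature=temperature,  # 0 = near-deterministic, higher = more random
    )
    print(temperature, "->", response.choices[0].message.content)
```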

Seed

The seed parameter is used to initialize the random number generator that is then used to sample the next token. If you want to maximize reproducibility, set a seed value.

This is only available in OpenAI, Gemini, and open-weight models. Check my post for more details.
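With OpenAI’s SDK, it looks like this (reproducibility is best-effort; the system_fingerprint field tells you whether the backend changed between calls):

```python
from openai import OpenAI

client = OpenAI()

# Two calls with the same seed, prompt, and parameters should return
# (mostly) the same output, as long as system_fingerprint doesn't change.
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact about octopuses."}],
    seed=42,
    temperature=1,
)
print(response.system_fingerprint)
print(response.choices[0].message.content)
```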

Top-p sampling

Top-p (nucleus) sampling limits which tokens the model can sample from: it keeps only the smallest set of tokens whose combined probability is at least P. For example, with top-p = 0.9, the next token is picked from the smallest group of tokens that together cover at least 90% of the probability mass.
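To make the mechanics concrete, here’s a small NumPy sketch of the filtering step (an illustration, not any provider’s actual implementation):

```python
import numpy as np

def top_p_filter(probs: np.ndarray, p: float = 0.9) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    order = np.argsort(probs)[::-1]              # sort tokens by probability, descending
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix covering >= p
    filtered = np.zeros_like(probs)
    filtered[order[:cutoff]] = probs[order[:cutoff]]
    return filtered / filtered.sum()             # renormalize, then sample from this

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_p_filter(probs, p=0.9))  # keeps the top 3 tokens (0.5 + 0.3 + 0.15 >= 0.9)
```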

I rarely use this parameter.

Logit biases

Logits are the raw scores that the model assigns to each token. You can use biases to change the odds of specific tokens being selected: positive biases increase a token’s odds, while negative biases decrease them.

This is often used for document classification tasks or LLM-based rerankers.
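As a sketch of the classification use case with OpenAI’s logit_bias (a map from token IDs to a bias between -100 and 100), assuming “yes” and “no” each encode to a single token:

```python
import tiktoken
from openai import OpenAI

# Look up token IDs; o200k_base is the tokenizer used by recent OpenAI models.
enc = tiktoken.get_encoding("o200k_base")
yes_id, no_id = enc.encode("yes")[0], enc.encode("no")[0]

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Is this review positive? 'Great product!' Answer yes or no."}],
    logit_bias={str(yes_id): 100, str(no_id): 100},  # +100 effectively restricts output to these tokens
    max_tokens=1,  # a single-token answer makes this a cheap binary classifier
)
print(response.choices[0].message.content)
```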

Logprobs

Logprobs are the logarithmic probabilities of the tokens. They are defined as:

\[\operatorname{logprob}(w_i) = \ln P(w_i) = \ln\left(\frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}\right)\]

Where:

  • \(w_i\) is the \(i\)-th token in the vocabulary.
  • \(P(w_i)\) is the probability of the \(i\)-th token.
  • \(z_i\) is the logit of the \(i\)-th token.
  • \(n\) is the number of tokens in the vocabulary.

You can use this parameter with OpenAI models. Gemini allows a single request with logprobs per day (yes, I’m not kidding 😅). Anthropic doesn’t provide them.

Open-weight models don’t provide logprobs directly. They provide logits instead, which you can use to calculate the probabilities of the tokens.
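Computing logprobs from raw logits is just a (numerically stable) log-softmax, i.e., the formula above; a sketch:

```python
import numpy as np

def logprobs_from_logits(logits: np.ndarray) -> np.ndarray:
    """logprob(w_i) = z_i - log(sum_j e^{z_j}), shifted for numerical stability."""
    shifted = logits - logits.max()  # subtracting the max avoids overflow in exp
    return shifted - np.log(np.exp(shifted).sum())

logits = np.array([2.0, 1.0, 0.1])
logprobs = logprobs_from_logits(logits)
print(logprobs)                # log-probabilities, all <= 0
print(np.exp(logprobs).sum())  # probabilities sum to 1.0
```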

Max completion tokens

This parameter limits the number of tokens that the model can generate. This is useful for controlling costs and the length of the output.
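In OpenAI’s SDK the parameter is called max_completion_tokens (formerly max_tokens); roughly:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Explain how transformers work."}],
    max_completion_tokens=100,  # hard cap on generated tokens; output may get cut off
)
# finish_reason is "length" when the cap was hit before the model finished.
print(response.choices[0].finish_reason)
print(response.choices[0].message.content)
```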

Response format

You can use this parameter to specify the format of the response. Anthropic, OpenAI, and Gemini support structured outputs in the form of JSON schemas. I’ve written multiple posts on this topic.

Open-weight models allow for more flexible structured output formats. For example, using outlines lets you define custom regex patterns to extract the data you need.
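As an illustration, here’s what structured outputs look like with OpenAI’s Python SDK and a Pydantic model (the parse helper converts the model into a JSON schema and validates the response against it; the model name and schema are just examples):

```python
from openai import OpenAI
from pydantic import BaseModel

class Review(BaseModel):
    sentiment: str  # e.g., "positive" or "negative"
    summary: str

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Classify this review: 'Great product, fast shipping!'"}],
    response_format=Review,  # the SDK turns this into a JSON schema for the model
)
review = completion.choices[0].message.parsed  # a validated Review instance
print(review.sentiment, "-", review.summary)
```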

Streaming

This parameter streams the response as it’s generated instead of returning it all at once. This improves the user experience, since users start seeing output immediately.
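A minimal streaming loop with the OpenAI SDK looks like this (other providers expose a similar chunk iterator):

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    stream=True,  # returns an iterator of chunks instead of a single response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g., the final one) carry no text
        print(delta, end="", flush=True)
```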

Tools

Tools let you extend the capabilities of the model by giving it access to external functions and APIs. This is critical for building agents.
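A sketch of a tool definition in OpenAI’s function-calling format (get_weather is a made-up example function that you’d implement in your own code); instead of text, the model can return a structured call for you to execute:

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function implemented in your code
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}],
    tools=tools,
)
# If the model decides to use the tool, this holds the call(s) with arguments.
print(response.choices[0].message.tool_calls)
```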

Conclusion

This was a very short post about the key parameters for LLMs. I hope you found it useful.

Let me know if you have any questions or feedback.

Citation

BibTeX citation:
@online{castillo2025,
  author = {Castillo, Dylan},
  title = {Key Parameters for {LLMs}},
  date = {2025-06-29},
  url = {https://dylancastillo.co/posts/key-parameters-llms.html},
  langid = {en}
}
For attribution, please cite this work as:
Castillo, Dylan. 2025. “Key Parameters for LLMs.” June 29, 2025. https://dylancastillo.co/posts/key-parameters-llms.html.