Evaluator-optimizer workflow with Pydantic AI

Tags: til, llm, pydantic-ai, workflows

Published: July 9, 2025

Modified: July 11, 2025

I’m doing a deep dive into Pydantic AI, so I’ve been re-implementing typical patterns for building agentic systems.

In this post, I’ll explore how to build an evaluator-optimizer workflow. I won’t cover the basics of agentic workflows, so if you’re not familiar with the concept, I recommend reading this post first.

What is evaluator-optimizer?

Evaluator-optimizer is a pattern that pairs an LLM generator with an LLM evaluator. The generator produces a solution and the evaluator checks whether it’s good enough. If it isn’t, the generator receives feedback and produces a new solution. This repeats until the solution passes (or a retry limit is reached).

It looks like this:

flowchart LR
    In([In]) --> Gen["Generator (LLM)"]
    Gen -- "Solution" --> Eval["Evaluator (LLM)"]
    Eval -- "Accepted" --> Out([Out])
    Eval -- "Rejected + Feedback" --> Gen
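Stripped of the LLM calls, the control flow is just a bounded retry loop. Here’s a minimal sketch with hypothetical `generate`, `evaluate`, and `fix` stubs standing in for the agents:

```python
def evaluator_optimizer(task, generate, evaluate, fix, max_rounds=3):
    """Generic evaluator-optimizer loop with pluggable steps."""
    solution = generate(task)
    for _ in range(max_rounds):
        accepted, feedback = evaluate(solution)
        if accepted:
            return solution
        # Rejected: feed the evaluator's feedback back into the optimizer
        solution = fix(solution, feedback)
    # Give up after max_rounds and return the latest candidate
    return solution


# Toy stubs: the evaluator only accepts uppercase solutions
result = evaluator_optimizer(
    "hello",
    generate=lambda task: task,
    evaluate=lambda s: (s.isupper(), "use uppercase"),
    fix=lambda s, feedback: s.upper(),
)
print(result)  # HELLO
```

In the real workflow, `generate`, `evaluate`, and `fix` become LLM calls, but the loop stays the same.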

Examples:

  • Content generation that must follow certain guidelines, such as writing in a particular style.
  • Iteratively improving search results.

Let’s see how this looks in practice.

Setup

I will implement a simple workflow:

  1. Generate a candidate article
  2. Evaluate if the article is good enough
  3. If it’s not, provide feedback and generate a new article
  4. Repeat until the article is good enough

Before we start: because Pydantic AI uses asyncio under the hood, you need to apply nest_asyncio to use it in a notebook:

import nest_asyncio

nest_asyncio.apply()

Then, you need to import the required libraries. I’m using Logfire to monitor the workflow.

import os

import logfire
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from pydantic_ai import Agent

load_dotenv()
True

Pydantic AI is compatible with OpenTelemetry (OTel), so it’s easy to use with Logfire or any other OTel-compatible observability tool (e.g., Langfuse).

To enable tracking, create a project in Logfire, generate a Write token and add it to the .env file. Then, you just need to run:

logfire.configure(
    token=os.getenv('LOGFIRE_TOKEN'),
)
logfire.instrument_pydantic_ai()

The first time you run this, it will ask you to create a project in Logfire and will generate a logfire_credentials.json file in your working directory. Subsequent runs automatically use the credentials from that file.

Evaluator-optimizer workflow

The workflow is composed of two steps:

  • Text generator: Generates a candidate article.
  • Evaluator: Evaluates if the article is good enough.

I’ll split the text generation into two agents: a generator and a fixer. The generator produces a candidate article, and the fixer revises the article when given feedback.

generator = Agent(
    'openai:gpt-4.1-mini',
    system_prompt=(
        "You are an expert writer. Provided with a topic, you will generate an engaging article with less than 500 words"
    ),
)

fixer = Agent(
    'openai:gpt-4.1-mini',
    system_prompt=(
        "You are an expert writer. Provided with a text and feedback, you will improve the text."
    ),
)

Next, I’ll create the evaluator agent. It takes a text and evaluates whether it’s good enough, producing an Evaluation object as its output.

class Evaluation(BaseModel):
    explanation: str = Field(
        description="Explain why the text does or does not meet the evaluation criteria"
    )
    feedback: str = Field(
        description="Feedback for the writer on how to improve the text"
    )
    is_correct: bool = Field(
        description="Whether the text meets the evaluation criteria"
    )

evaluator = Agent(
    'openai:gpt-4.1-mini',
    system_prompt=(
        "You are an expert evaluator. Provided with a text, you will evaluate if it's written in British English and if it's appropriate for a young audience. The text must always use British spelling and grammar. Make sure the text doesn't include any em dashes."
    ),
    output_type=Evaluation,
)

Finally, you can encapsulate all the logic in a single function:

@logfire.instrument("Run workflow")
def run_workflow(topic: str) -> str:
    # Generate an initial candidate and evaluate it
    text = generator.run_sync(f"Generate an article about '{topic}'")
    evaluation = evaluator.run_sync(f"Evaluate the following text: {text.output}")
    # Retry up to 3 times, feeding the evaluator's feedback back to the fixer
    for _ in range(3):
        if evaluation.output.is_correct:
            return text.output
        text = fixer.run_sync(
            f"Fix the text: {text.output} with the following feedback: {evaluation.output.feedback}"
        )
        evaluation = evaluator.run_sync(f"Evaluate the following text: {text.output}")
    # Give up after 3 attempts and return the latest candidate
    return text.output

output = run_workflow("Consumption of hard drugs")
11:28:25.995 Run workflow
11:28:25.995   generator run
11:28:25.996     chat gpt-4.1-mini
11:28:36.293   evaluator run
11:28:36.294     chat gpt-4.1-mini

And here’s the output:

print(output)
**The Complex Reality of Hard Drug Consumption**

Hard drugs — substances such as heroin, cocaine, methamphetamines, and crack — have long been a subject of concern worldwide due to their profound impact on individuals and society. The consumption of these drugs is not merely a matter of personal choice but a complex issue influenced by social, economic, psychological, and cultural factors.

**Understanding Hard Drugs and Their Effects**

Hard drugs are characterized by their high potential for addiction and severe physical and psychological effects. Unlike softer substances such as marijuana or alcohol (when consumed responsibly), hard drugs often disrupt brain function dramatically, leading to addiction, mental health disorders, and significant physical health problems. Users may experience paranoia, hallucinations, heart issues, and even fatal overdoses.

The allure of hard drugs often stems from their ability to produce intense euphoria or numb emotional pain temporarily. However, this fleeting escape comes at a steep cost. Dependence quickly sets in, making cessation incredibly difficult and often trapping users in a cycle of abuse.

**Social and Economic Implications**

The ramifications of hard drug consumption ripple beyond the individual. Families endure emotional and financial strain, communities face increased crime rates and reduced public safety, and healthcare systems are burdened with treating overdoses and long-term complications. Moreover, productivity declines as addiction interferes with employment, contributing to broader economic challenges.

Many users come from marginalized backgrounds, where poverty, trauma, and lack of education or opportunity make drugs seem like a refuge or an escape. This correlation highlights that addressing drug consumption isn't only a matter of law enforcement but of social equity and support.

**Challenges in Addressing Hard Drug Use**

Efforts to reduce hard drug consumption have varied widely, from strict punitive measures to harm reduction strategies. While criminalization seeks to deter use, it often leads to overcrowded prisons and can exacerbate social stigma, making it harder for users to seek help. Conversely, approaches like supervised consumption sites, needle exchange programs, and accessible addiction treatment aim to minimize harm and promote recovery.

Prevention and education are critical components. Informing communities about the risks of hard drugs and providing mental health support can reduce initial experimentation and help those at risk before addiction takes hold.

**Moving Towards Compassionate Solutions**

Ultimately, the consumption of hard drugs is a multifaceted issue requiring balanced and compassionate responses. Policymakers, healthcare providers, and communities must work together to create environments that prioritize treatment over punishment, recognize addiction as a health issue, and promote social support.

By understanding the complex realities behind hard drug use, society can better address its consequences and help those affected find a path to recovery and hope.

That’s all!

You can access this notebook here.

If you have any questions or feedback, please let me know in the comments below.

Citation

BibTeX citation:
@online{castillo2025,
  author = {Castillo, Dylan},
  title = {Evaluator-Optimizer Workflow with {Pydantic} {AI}},
  date = {2025-07-09},
  url = {https://dylancastillo.co/til/evaluator-optimizer-pydantic-ai.html},
  langid = {en}
}
For attribution, please cite this work as:
Castillo, Dylan. 2025. “Evaluator-Optimizer Workflow with Pydantic AI.” July 9, 2025. https://dylancastillo.co/til/evaluator-optimizer-pydantic-ai.html.