What We're Building
An LLM by itself can only generate text. An LLM with tools can take actions.
Today we'll build a research agent that can:
- Search the web for information
- Read web pages
- Synthesize findings into a report
By the end, you'll understand the core loop behind every agent framework.
The Agent Loop
Every agent follows the same pattern:
while not done:
1. Observe: What's the current state?
2. Think: What should I do next?
3. Act: Use a tool to take action
4. Reflect: Did that help? Should I continue?
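Here is that loop as a tiny runnable toy, with a scripted stand-in for the model and a fake tool. Every name here (fake_llm, fake_tool, scripted_replies) is a placeholder; the real pieces come in Steps 1 through 4.

# Toy version of the loop, just to show the shape; nothing here is the real agent
scripted_replies = iter(["search: AI agents in production", "DONE"])

def fake_llm(history):                      # stand-in for a real LLM call
    return next(scripted_replies)

def fake_tool(action):                      # stand-in for a real tool
    return f"results for {action!r}"

history = ["task: research AI agents"]
while True:
    decision = fake_llm(history)            # observe + think
    if decision == "DONE":                  # reflect: are we finished?
        break
    history.append(fake_tool(decision))     # act, then feed the result back in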
Let's build this step by step.
Step 1: Define Your Tools
Tools are functions the LLM can call. Each tool needs:
- A name
- A description (for the LLM)
- Input parameters
- The actual implementation
from typing import Callable

class Tool:
    def __init__(self, name: str, description: str, function: Callable):
        self.name = name
        self.description = description
        self.function = function

    def run(self, **kwargs):
        return self.function(**kwargs)

    def to_schema(self):
        """Format for the LLM to understand."""
        return {
            "name": self.name,
            "description": self.description,
        }
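To see how the pieces fit, here's a quick throwaway example (the add tool exists only for illustration):

# A throwaway tool, just to exercise the class
def add(a: int, b: int) -> int:
    return a + b

calculator = Tool("add", "Add two numbers", add)
print(calculator.run(a=2, b=3))   # 5
print(calculator.to_schema())     # {'name': 'add', 'description': 'Add two numbers'}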
Now let's create our research tools:
import requests

def web_search(query: str) -> str:
    """Search the web and return results."""
    # Using a search API (you'd use Google, Bing, or similar).
    # The endpoint and API_KEY are placeholders for your provider's values.
    response = requests.get(
        "https://api.search.com/search",
        params={"q": query},
        headers={"Authorization": f"Bearer {API_KEY}"}
    )
    results = response.json()
    # Format results for the LLM (the exact response shape depends on your
    # provider; here it's assumed to be a list of result objects)
    formatted = []
    for r in results[:5]:
        formatted.append(f"Title: {r['title']}\nURL: {r['url']}\nSnippet: {r['snippet']}")
    return "\n\n".join(formatted)
def read_webpage(url: str) -> str:
    """Fetch and extract text from a webpage."""
    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract main content (simplified; in production you'd isolate the
    # article body rather than grabbing all visible text)
    text = soup.get_text(separator='\n', strip=True)
    # Truncate to avoid token limits
    return text[:5000]
def write_report(content: str, filename: str) -> str:
    """Save a report to a file."""
    with open(filename, 'w') as f:
        f.write(content)
    return f"Report saved to {filename}"

# Create tool instances
tools = [
    Tool("web_search", "Search the web for information on a topic", web_search),
    Tool("read_webpage", "Read the full content of a webpage URL", read_webpage),
    Tool("write_report", "Save the final research report to a file", write_report),
]
Step 2: Create the Agent Prompt
The system prompt teaches the LLM how to use tools:
def build_system_prompt(tools: list[Tool]) -> str:
    tool_descriptions = "\n".join([
        f"- {t.name}: {t.description}"
        for t in tools
    ])
    return f"""You are a research agent. Your job is to research topics and produce reports.
You have access to these tools:
{tool_descriptions}
To use a tool, respond in this exact format:
THOUGHT: [Your reasoning about what to do next]
ACTION: [tool_name]
INPUT: [tool input as JSON]
When you have enough information to complete the task, respond:
THOUGHT: [Your reasoning]
FINAL_ANSWER: [Your complete response]
Research systematically:
1. Start with a broad search
2. Read 2-3 promising sources
3. Synthesize into a report
4. Save the report
Be thorough but efficient. Don't search endlessly."""
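Before wiring this into the agent, it's worth printing the assembled prompt once as a sanity check; with the three tools defined above, something like this should pass:

# Sanity check: the assembled prompt should mention every tool by name
prompt = build_system_prompt(tools)
print(prompt.splitlines()[0])  # You are a research agent. Your job is to research topics and produce reports.
assert all(name in prompt for name in ["web_search", "read_webpage", "write_report"])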
Step 3: Parse LLM Responses
We need to extract the action from the LLM's response:
import json
import re

class AgentResponse:
    def __init__(self, thought: str, action: str = None, input: dict = None, final_answer: str = None):
        self.thought = thought
        self.action = action
        self.input = input
        self.final_answer = final_answer
        self.is_final = final_answer is not None

def parse_response(text: str) -> AgentResponse:
    """Parse the LLM's response to extract thought, action, and input."""
    thought_match = re.search(r'THOUGHT:\s*(.+?)(?=ACTION:|FINAL_ANSWER:|$)', text, re.DOTALL)
    thought = thought_match.group(1).strip() if thought_match else ""

    # Check for final answer
    final_match = re.search(r'FINAL_ANSWER:\s*(.+)', text, re.DOTALL)
    if final_match:
        return AgentResponse(thought=thought, final_answer=final_match.group(1).strip())

    # Extract action and input
    action_match = re.search(r'ACTION:\s*(\w+)', text)
    input_match = re.search(r'INPUT:\s*(\{.+?\})', text, re.DOTALL)
    action = action_match.group(1) if action_match else None
    input_data = json.loads(input_match.group(1)) if input_match else {}
    return AgentResponse(thought=thought, action=action, input=input_data)
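A quick test with a hand-written response confirms the parser pulls out each field:

# Hand-written example response, just to exercise the parser
sample = """THOUGHT: I should start with a broad search.
ACTION: web_search
INPUT: {"query": "AI agents in production"}"""

parsed = parse_response(sample)
print(parsed.thought)   # I should start with a broad search.
print(parsed.action)    # web_search
print(parsed.input)     # {'query': 'AI agents in production'}
print(parsed.is_final)  # False

Note that json.loads will raise if the model emits malformed JSON; the error handling in Step 5 is one way to absorb that.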
Step 4: The Agent Loop
Now we connect everything:
from openai import OpenAI

class ResearchAgent:
    def __init__(self, tools: list[Tool], max_steps: int = 10):
        self.tools = {t.name: t for t in tools}
        self.max_steps = max_steps
        self.client = OpenAI()
        self.conversation = []

    def run(self, task: str) -> str:
        """Execute the agent loop until completion."""
        # Initialize conversation
        system_prompt = build_system_prompt(list(self.tools.values()))
        self.conversation = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Research task: {task}"}
        ]

        for step in range(self.max_steps):
            print(f"\n--- Step {step + 1} ---")

            # Get LLM response
            response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=self.conversation,
                temperature=0.7,
            )
            llm_output = response.choices[0].message.content
            print(f"LLM: {llm_output[:200]}...")

            # Parse response
            parsed = parse_response(llm_output)
            print(f"Thought: {parsed.thought[:100]}...")

            # Check if done
            if parsed.is_final:
                print("Agent complete!")
                return parsed.final_answer

            # Execute tool
            if parsed.action and parsed.action in self.tools:
                print(f"Executing: {parsed.action}({parsed.input})")
                try:
                    result = self.tools[parsed.action].run(**parsed.input)
                    print(f"Result: {result[:200]}...")
                except Exception as e:
                    result = f"Error: {str(e)}"
                # Add to conversation
                self.conversation.append({"role": "assistant", "content": llm_output})
                self.conversation.append({"role": "user", "content": f"Tool result:\n{result}"})
            else:
                print(f"Unknown action: {parsed.action}")
                break

        return "Max steps reached without completion."
# Usage
agent = ResearchAgent(tools)
result = agent.run("Research the latest developments in AI video generation and produce a summary report")
print(result)
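The important mechanism here is that every tool result goes back into the conversation as a user message, so the model sees its own actions and their outcomes. After one step, self.conversation has roughly this shape (contents abridged and purely illustrative):

# Roughly what self.conversation looks like after the first tool call (abridged)
[
    {"role": "system", "content": "You are a research agent. ..."},
    {"role": "user", "content": "Research task: Research the latest developments in ..."},
    {"role": "assistant", "content": "THOUGHT: ...\nACTION: web_search\nINPUT: {\"query\": \"...\"}"},
    {"role": "user", "content": "Tool result:\nTitle: ...\nURL: ...\nSnippet: ..."},
]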
Step 5: Add Error Handling
Real agents need to handle failures:
import time

class RobustResearchAgent(ResearchAgent):
    def run(self, task: str) -> str:
        # ... same setup as ResearchAgent.run ...
        # (_get_llm_response and _add_to_conversation are small helpers assumed
        # to be factored out of the base class loop above)
        for step in range(self.max_steps):
            try:
                response = self._get_llm_response()
                parsed = parse_response(response)
                if parsed.is_final:
                    return parsed.final_answer
                # Execute with retry
                result = self._execute_with_retry(parsed.action, parsed.input)
                self._add_to_conversation(response, result)
            except Exception as e:
                # Let the agent know something went wrong
                self.conversation.append({
                    "role": "user",
                    "content": f"Error occurred: {str(e)}. Please try a different approach."
                })
                continue

        # Graceful degradation
        return self._generate_partial_answer()

    def _execute_with_retry(self, action: str, input: dict, retries: int = 2) -> str:
        for attempt in range(retries):
            try:
                return self.tools[action].run(**input)
            except Exception:
                if attempt == retries - 1:
                    raise
                time.sleep(1)  # Brief pause before retry

    def _generate_partial_answer(self) -> str:
        """Generate best answer possible with information gathered so far."""
        self.conversation.append({
            "role": "user",
            "content": "Time's up. Provide your best answer with the information you've gathered so far."
        })
        response = self._get_llm_response()
        return response
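The fixed one-second pause works, but a common refinement (not part of the agent above) is exponential backoff with a little jitter, so repeated failures against a rate-limited API wait progressively longer. A sketch:

import random
import time

def execute_with_backoff(tool: Tool, input: dict, retries: int = 3) -> str:
    """Retry variant that waits 1s, 2s, 4s... plus jitter between attempts."""
    for attempt in range(retries):
        try:
            return tool.run(**input)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())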
Step 6: Observability
Log everything for debugging:
import logging
import time
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent")

class ObservableAgent(ResearchAgent):
    def run(self, task: str) -> str:
        run_id = datetime.now().isoformat()
        logger.info(f"[{run_id}] Starting research task: {task}")
        for step in range(self.max_steps):
            logger.info(f"[{run_id}] Step {step + 1}")

            # Log LLM call (conversation bookkeeping omitted for brevity;
            # _get_llm_response and _execute_tool are the same assumed helpers as above)
            start = time.time()
            response = self._get_llm_response()
            logger.info(f"[{run_id}] LLM response time: {time.time() - start:.2f}s")

            parsed = parse_response(response)
            logger.info(f"[{run_id}] Thought: {parsed.thought[:100]}")
            logger.info(f"[{run_id}] Action: {parsed.action}")

            if parsed.is_final:
                logger.info(f"[{run_id}] Completed in {step + 1} steps")
                return parsed.final_answer

            # Log tool execution
            start = time.time()
            result = self._execute_tool(parsed.action, parsed.input)
            logger.info(f"[{run_id}] Tool {parsed.action} took {time.time() - start:.2f}s")
            logger.info(f"[{run_id}] Result preview: {result[:100]}")
Running the Agent
# Create and run
agent = ObservableAgent(tools)
report = agent.run(
    "Research how companies are using AI agents in production in 2024. "
    "Focus on practical applications, not research papers. "
    "Save your findings to 'ai-agents-2024.md'."
)
print("Final report:")
print(report)
Example output:
--- Step 1 ---
Thought: I'll start with a broad search about AI agents in production...
Action: web_search
Result: [5 relevant articles listed]
--- Step 2 ---
Thought: The TechCrunch article looks promising. Let me read it...
Action: read_webpage
Result: [Article content]
--- Step 3 ---
Thought: Good info. Let me check the other source about enterprise agents...
Action: read_webpage
Result: [Article content]
--- Step 4 ---
Thought: I have enough information. Let me synthesize and save...
Action: write_report
Result: Report saved to ai-agents-2024.md
--- Step 5 ---
FINAL_ANSWER: I've researched AI agents in production and saved a report to ai-agents-2024.md. Key findings: [summary]
Taking It Further
This is a minimal agent. Production agents need:
- Better tool schemas — OpenAI function calling or similar (sketched after this list)
- Memory — Persistent storage across sessions
- Planning — Multi-step plan before execution
- Reflection — Self-critique and improvement
- Guardrails — Limits on what tools can do
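As a sketch of the first item, here is roughly what web_search looks like as an OpenAI-style function-calling tool definition. The structure follows the function-calling format, but treat it as a starting point rather than a drop-in replacement for the prompt-based approach above:

# web_search expressed as a function-calling tool definition (sketch)
web_search_schema = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for information on a topic",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"},
            },
            "required": ["query"],
        },
    },
}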
But the core loop stays the same: Observe → Think → Act → Reflect.
Conclusion
Building an agent is simpler than frameworks make it seem. The key components are:
- Tools the agent can use
- A prompt that teaches tool use
- A parser for agent responses
- A loop that executes until done
- Error handling for robustness
Start simple. Add complexity only when needed.
What will your first agent do?