Let's make our Agent

An overview on Agents and how they work


What is an Agent?

An agent is something that can automatically perform tasks, reason, and generate results.

We have seen that LLMs are like brains that can think, reason, and answer questions. But a brain alone isn't very practical. What would you do with just that? Create a chatbot and chat all day?

Here comes the concept of AI Agents, which is much broader than just LLMs! The main idea is to give the LLM some actions it can take, like giving hands and legs to the AI so it can carry out its tasks.

So, one common definition of AI Agents is:
“An AI agent is a software system designed to interact with its environment, gather information, and perform tasks autonomously to achieve predetermined goals set by humans or other systems.”

How does it work?

Ok, we are done with the definition. Now for the mechanism. According to some resources, AI Agents have five core components:

  1. Perception System: Agents receive input from users or sensors. (Generally, the user query)

  2. Reasoning Engine: The LLM that processes information and makes decisions. (The AI models)

  3. Tool Use: The ability to call external functions. (We will get back to it.)

  4. Decision Framework: A structured workflow: plan → action → observe → output

    1. Plan: Decides what to do based on the query

    2. Action: Calls appropriate functions with specific parameters

    3. Observe: Processes the results from function calls

    4. Output: Provides final responses to users

  5. Memory: The agent maintains conversation history to track context.
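To make the five components concrete before touching a real model, here is a minimal sketch of the loop. It is only an illustration under assumptions: fake_model, fake_weather, SCRIPTED_STEPS, TOOLS, and run_loop are hypothetical names, and the "model" simply replays a fixed script instead of reasoning.

```python
import json
from typing import Optional

# Hypothetical stand-in for the LLM: one scripted JSON step per call.
SCRIPTED_STEPS = [
    {"step": "plan", "content": "Look up the weather for the city."},
    {"step": "action", "content": "Call the tool.",
     "function": "fake_weather", "input": "Dhaka"},
    {"step": "observe", "content": "The tool returned a result."},
    {"step": "output", "content": "It is sunny in Dhaka."},
]


def fake_model(prompt: str, turn: int) -> str:
    """Reasoning engine stub: ignores the prompt, returns the next scripted step."""
    return json.dumps(SCRIPTED_STEPS[turn])


def fake_weather(city: str) -> str:
    """Tool stub: returns a canned string instead of calling a real API."""
    return f"sunny in {city}"


TOOLS = {"fake_weather": fake_weather}


def run_loop(user_query: str, max_steps: int = 10) -> Optional[str]:
    memory = []  # Memory: every step and observation seen so far
    for turn in range(max_steps):
        # Perception + memory: the query plus everything recorded so far
        prompt = user_query + "\n" + "\n".join(memory)
        step = json.loads(fake_model(prompt, turn))  # Reasoning engine
        memory.append(json.dumps(step))
        if step["step"] == "action":  # Tool use
            result = TOOLS[step["function"]](step["input"])
            memory.append(f"Observation: {result}")
        elif step["step"] == "output":  # Final answer ends the loop
            return step["content"]
    return None
```

Calling run_loop("Weather in Dhaka?") walks through plan, action, observe, and output in order, and returns the content of the output step.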

This is the basic workflow of an agent. Now, let’s make a simple weather agent of our own :)

Our First AI Agent:

I am gonna explain the code step by step. Don’t forget to read the comments.

Just some general imports:

import os
import json
import requests
from google import genai
from dotenv import load_dotenv

load_dotenv()
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
client = genai.Client(api_key=GEMINI_API_KEY)
# Install the packages/dependencies first: pip install google-genai python-dotenv requests
  1. Perception system: Input from the user.

     user_query = input('> ')
    
  2. Reasoning Engine: I am using the Gemini API here. (It’s kind of free :) )

     system_prompt = f"""
         You're a helpful AI assistant who is specialized in resolving user query.
         You work on plan, action, observe, output mode.
    
         Available tools: {list(available_functions.keys())}
         Tool descriptions:
         - get_weather(city: str): Returns weather information for a given city
    
         IMPORTANT RULES:
         - Return ONLY ONE step per response, not multiple steps
         - Start with "plan" step first
         - Wait for next input before proceeding to next step
         - When step is "action", you MUST specify function and input
    
         Output JSON Format (return only ONE):
         {{
             "step": "plan|action|observe|output",
             "content": "description of what you're doing",
             "function": "function name (only for action step)",
             "input": "function parameter (only for action step)"
         }}
    
         User Query: """
    

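    This is the system prompt. It tells the model to work one step at a time and to reply in a fixed JSON format, so our code can decide what to run at each step. In short, the model reasons through the whole process based on the user query. Note that the prompt reads available_functions, which is defined in the next step, so in the actual file that dictionary has to come first.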

  3. Tool Use: I am using a weather API, and the LLM will call the API when needed.

     ## This is the main part for the function call
     def get_weather(city: str):
         # wttr.in answers with "<condition>:<temperature>" for the format %C:%t
         try:
             response = requests.get(f"https://wttr.in/{city}?format=%C:%t", timeout=10)
         except requests.RequestException as error:
             return f"API failed to get weather data: {error}"

         if response.status_code == 200:
             situation, _, temp = response.text.strip().partition(':')
             return f"weather situation: {situation} and temperature: {temp}"
         else:
             # Return the failure as text so the LLM can observe it instead of None
             return "API failed to get weather data"

     ## This is the dictionary that lists the functions. Notice that system_prompt lists the available functions from it
     available_functions = {
         "get_weather": get_weather
     }
    
  4. Decision Framework + Memorising:

    Observe the code closely.

     conversation_history = ""  ## The agent's memory, empty at the start
     step_count = 0
     max_steps = 100

     while step_count < max_steps:
         ## Memorising the previous steps by appending them to the prompt
         full_prompt = system_prompt + user_query + "\n" + conversation_history

         response = client.models.generate_content(
             model="gemini-2.0-flash-001",
             contents=full_prompt
         )

         print(f"AI Response: {response.text}\n")

         parsed = parse_response(response.text) ## A helper for parsing; it is defined in the full code below.
         if not parsed:
             print("Failed to parse response")
             break

         ## Here, the main game begins. First, the loop reads the step name and its content.
         ## Based on those, it decides what to do.
         ## If the step is "action", it calls the requested function.
         ## If the step is "output", it prints the answer and stops.
         ## "plan" and "observe" need no handling here; they are only printed and stored in the history.

         step = parsed.get("step")
         content = parsed.get("content")

         if step == "action":
             function_name = parsed.get("function")
             function_input = parsed.get("input")

             if function_name in available_functions:
                 result = available_functions[function_name](function_input)
                 observation = f"Function {function_name} returned: {result}"
                 print(f"Function Call: {function_name} ('{function_input}')")
                 print(f"Result: {result}\n")

                 conversation_history += f"\nObservation: {observation}"

             else:
                 print(f"Function {function_name} not available")
                 break

         elif step == "output":
             print("=== FINAL ANSWER ===")
             print(content)
             break

         conversation_history += f"\nstep: {response.text}"
         step_count += 1

     if step_count >= max_steps:
         print("Maximum steps reached")
    

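    As the comments explain, the loop implements the decision framework: each pass sends the query plus the history to the model, runs a tool when the step is an action, and stops when the step is an output.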
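The loop trusts whatever step the model sends back. As a small sketch of how that trust could be checked against the JSON format promised in the system prompt, here is a validator. validate_step and ALLOWED_STEPS are hypothetical names and are not part of the agent code in this post; the rules are simply the ones the prompt states (a known step name, and a function plus an input for every action step).

```python
ALLOWED_STEPS = {"plan", "action", "observe", "output"}


def validate_step(parsed: dict) -> list:
    """Return a list of problems with a parsed step; an empty list means it looks fine."""
    problems = []
    step = parsed.get("step")
    if step not in ALLOWED_STEPS:
        problems.append(f"unknown step: {step!r}")
    if step == "action":
        # The prompt says an action step MUST name a function and an input
        for key in ("function", "input"):
            if not parsed.get(key):
                problems.append(f"action step is missing '{key}'")
    return problems
```

One possible use is to call it right after parsing and feed any problems back into the history, so the model gets a chance to correct itself instead of the loop breaking.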

Yeah, this is the main workflow of an agent. Now, this is the whole code:

import os
import json
import requests
from google import genai
from dotenv import load_dotenv

load_dotenv()
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
client = genai.Client(api_key=GEMINI_API_KEY)

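def get_weather(city: str):
    # wttr.in answers with "<condition>:<temperature>" for the format %C:%t
    try:
        response = requests.get(f"https://wttr.in/{city}?format=%C:%t", timeout=10)
    except requests.RequestException as error:
        return f"API failed to get weather data: {error}"

    if response.status_code == 200:
        situation, _, temp = response.text.strip().partition(':')
        return f"weather situation: {situation} and temperature: {temp}"
    else:
        # Return the failure as text so the LLM can observe it instead of None
        return "API failed to get weather data"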

available_functions = {
    "get_weather": get_weather
}


system_prompt = f"""
    You're a helpful AI assistant who is specialized in resolving user query.
    You work on plan, action, observe, output mode.

    Available tools: {list(available_functions.keys())}
    Tool descriptions:
    - get_weather(city: str): Returns weather information for a given city

    IMPORTANT RULES:
    - Return ONLY ONE step per response, not multiple steps
    - Start with "plan" step first
    - Wait for next input before proceeding to next step
    - When step is "action", you MUST specify function and input

    Output JSON Format (return only ONE):
    {{
        "step": "plan|action|observe|output",
        "content": "description of what you're doing",
        "function": "function name (only for action step)",
        "input": "function parameter (only for action step)"
    }}

    User Query: """


def parse_response(response_text):
    """Extract JSON from the response"""
    try:
        text = response_text.replace("```json", "").replace('```', "")
        lines = text.strip().split('\n')

        for line in lines:
            line = line.strip()
            if line.startswith('{') and line.endswith('}'):
                try:
                    return json.loads(line)
                except json.JSONDecodeError:
                    continue
        start = text.find('{')
        if start != -1:
            brace_count = 0
            for i, char in enumerate(text[start:], start):
                if char == '{':
                    brace_count += 1
                elif char == '}':
                    brace_count -= 1
                    if brace_count == 0:
                        json_str = text[start: i+1]
                        return json.loads(json_str)
    except Exception as e:
        print(f"Parse error: {e}")

    return None

def run_agent(user_query):
    conversation_history = ""
    step_count = 0
    max_steps = 100

    print(f"User Query: {user_query}\n")

    while step_count < max_steps:
        full_prompt = system_prompt + user_query + "\n" + conversation_history

        response = client.models.generate_content(
            model="gemini-2.0-flash-001",
            contents=full_prompt
        )

        print(f"AI Response: {response.text}\n")

        parsed = parse_response(response.text)
        if not parsed:
            print("Failed to parse response")
            break

        step = parsed.get("step")
        content = parsed.get("content")

        if step == "action":
            function_name = parsed.get("function")
            function_input = parsed.get("input")

            if function_name in available_functions:
                result = available_functions[function_name](function_input)
                observation = f"Function {function_name} returned: {result}"
                print(f"Function Call: {function_name} ('{function_input}')")
                print(f"Result: {result}\n")

                conversation_history += f"\nObservation: {observation}"

            else:
                print(f"Function {function_name} not available")
                break

        elif step == "output":
            print("=== FINAL ANSWER ===")
            print(content)
            break

        conversation_history += f"\nstep: {response.text}"
        step_count += 1

    if step_count >= max_steps:
        print("Maximum steps reached")

if __name__ == "__main__":
    user_query = input('> ')
    run_agent(user_query)
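The get_weather tool relies on wttr.in answering with condition:temperature. That assumption can be checked offline with a small parser like the sketch below. parse_wttr_line is a hypothetical helper, not part of the code above, and it assumes that a malformed or error reply does not contain a colon-separated pair.

```python
def parse_wttr_line(text: str):
    """Split 'condition:temperature' into a tuple, or return None if malformed."""
    condition, separator, temp = text.strip().partition(":")
    if not separator or not condition.strip() or not temp.strip():
        return None
    return condition.strip(), temp.strip()
```

With the values from the sample run, parse_wttr_line("Overcast:+31°C") gives the pair ("Overcast", "+31°C"), while a reply without a colon gives None, which the tool could turn into a readable error for the model.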

Sample Input:

> What is the weather in Satkhira?

Sample Output:

User Query: What is the weather in Satkhira?

AI Response: ```json
{
        "step": "plan",
        "content": "I need to get the weather information for Satkhira. I will use the get_weather tool to get the weather information.",   
        "function": null,
        "input": null
}
```


AI Response: ```json
{
        "step": "action",
        "content": "Get weather information for Satkhira",
        "function": "get_weather",
        "input": "Satkhira"
}
```

Function Call: get_weather ('Satkhira')
Result: weather situation: Overcast and temperature: +31°C

AI Response: step: ```json
{
        "step": "observe",
        "content": "The weather in Satkhira is Overcast and the temperature is +31°C.",
        "function": null,
        "input": null
}
```

AI Response: ```json
{
        "step": "output",
        "content": "The weather in Satkhira is Overcast and the temperature is +31°C.",
        "function": null,
        "input": null
}
```

=== FINAL ANSWER ===
The weather in Satkhira is Overcast and the temperature is +31°C.

The agent extracts the city name from the query, passes it to the get_weather function, reads the result returned by the API, and shows it as the final answer.
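Notice that the model wraps its JSON in a code fence, and once even puts "step:" in front of it. Here is a rough sketch of one way to pull the JSON object out of such a reply, using the standard library's json.JSONDecoder.raw_decode. extract_json is a hypothetical name and a simpler alternative to the parse_response function above, not a copy of it. It assumes the first "{" in the reply starts the JSON object.

```python
import json


def extract_json(response_text: str):
    """Return the first JSON object found in the text, or None if there is none."""
    start = response_text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value starting at `start` and ignores the rest
        parsed, _end = json.JSONDecoder().raw_decode(response_text, start)
    except json.JSONDecodeError:
        return None
    return parsed


FENCE = "`" * 3  # three backticks, built this way so this example needs no nested fence
sample = (
    "step: " + FENCE + "json\n"
    '{"step": "observe", "content": "Overcast, +31°C", "function": null, "input": null}\n'
    + FENCE
)
parsed = extract_json(sample)
```

Because the decoder understands JSON strings, a closing brace inside the content text does not cut the object short, which plain brace counting would do.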

Conclusion:

Seems pretty normal, right? This is just one example of how Agents work. Try giving more complex queries, like finding the average temperature of three districts or extracting the temperatures of several districts and displaying them in a table. Then you'll realize how much more powerful an agent is compared to a single API call. Just imagine the possibilities if the agent could query a database or the entire Internet.
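For the average-temperature idea, one option is to give the agent a second tool. The sketch below is only an assumption about how that could look: average_temperature is a hypothetical function, and it assumes the readings arrive as one comma-separated string of whole-degree values such as "+31°C", since the agent passes a single input to each tool.

```python
import re


def average_temperature(readings: str) -> str:
    """Average comma-separated readings such as '+31°C, +29°C, +30°C'."""
    # Pick out every signed whole number; the unit text is ignored
    values = [int(match) for match in re.findall(r"[+-]?\d+", readings)]
    if not values:
        return "no temperatures found"
    return f"average temperature: {sum(values) / len(values):.1f}°C"
```

To try it, the function would need an entry in available_functions and a line in the tool descriptions of the system prompt, so the model knows it exists.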

Tour with GenAI

Part 11 of 12

This series explores how LLMs like ChatGPT go beyond chat, diving into automation, from sending requests to getting intelligent responses. Learn how real-world LLM-powered systems are built behind the scenes.
