Lecture 4: Search in Artificial Intelligence
Introduction
Course Overview and Topic: Search
Welcome to the fourth lecture of our Artificial Intelligence course. Today, we embark on a detailed exploration of search, a core concept in the field of AI. Search is not just a single lecture topic; it is a fundamental problem-solving technique that will be central to our discussions in this and subsequent lectures. Before we dive into the intricacies of search algorithms and strategies, we will briefly address a few administrative items and recap the essential concepts covered in our previous sessions to ensure everyone is well-prepared for the material ahead.
Course Administration
Action Item: Course Questionnaire
For those who have not yet done so, please complete the course questionnaire. Your responses are invaluable in helping me understand your backgrounds, expectations, and learning preferences. This is particularly important for students who are attending remotely or watching recordings, as your feedback ensures that the course remains relevant and beneficial for everyone. Your input helps shape the course to better meet your needs.
Resource: Guidelines for Effective Scientific Writing
I would like to bring to your attention a resource that many find helpful: guidelines on how to write effectively in scientific contexts, with a particular focus on avoiding common pitfalls in thesis writing and general scientific documents. These guidelines offer advice applicable to all forms of academic communication, from formal theses to informal emails and messages on platforms like Teams. Please take some time to review these materials. Adhering to these suggestions will significantly improve the clarity and effectiveness of your written communication throughout this course and beyond.
Recap of Foundational Concepts
As we transition into the more technical aspects of AI, it’s crucial to ensure we are all on the same page regarding the foundational concepts we’ve discussed so far. We are still in the initial, introductory phase of the course, building the necessary groundwork for more advanced topics.
Delving into the Definition of Artificial Intelligence
In the first two lectures, we dedicated significant time to dissecting and defining Artificial Intelligence. We approached this task methodically, aiming to arrive at a comprehensive and nuanced understanding of what AI truly encompasses. This involved exploring various perspectives and interpretations to establish a robust working definition for our course.
Historical Journey Through Artificial Intelligence
We also undertook a historical survey of Artificial Intelligence, tracing its evolution from its conceptual birth in the 1950s to its current state of rapid advancement. We highlighted key milestones, breakthroughs, and periods of both excitement and stagnation. Understanding this historical context is essential for appreciating the current challenges and future directions of AI research and development.
Agents: Intelligent and Rational Entities
Our ongoing discussion centers on the concept of an agent. We are working towards a precise characterization of agents, particularly focusing on intelligent and rational agents. We are exploring what it means for an agent to be intelligent—capable of learning and problem-solving—and rational—acting in a way that is expected to achieve the best outcome, or, when there is uncertainty, the best expected outcome. This characterization is fundamental as it sets the stage for understanding how we design and evaluate AI systems.
Task Environments: Defining the Agent’s Arena
We introduced the notion of a task environment, which is critical for situating an agent and understanding its goals and constraints. The task environment encompasses everything external to the agent that is relevant to its performance. It defines not only the task that the agent is designed to perform but also the environment in which this task is carried out. We began to examine the properties that characterize task environments, a discussion we will continue in this lecture to fully appreciate how these properties influence the design of intelligent agents. Understanding the task environment is paramount because it dictates the challenges an agent must overcome and the types of capabilities it must possess to act effectively.
Task Environment Properties
We are currently examining the properties of task environments, which are crucial for understanding the challenges and constraints an agent faces and which we must account for when designing it. As discussed in the previous lecture, the nature of the task environment profoundly influences the agent’s design and its approach to problem-solving. We have identified several key dimensions that help characterize these environments, allowing us to better understand and categorize the challenges they present.
Dimensions of Task Environments
The task environment can be characterized along several dimensions, each representing a spectrum of possibilities. These dimensions are not mutually exclusive but rather provide a multi-faceted view of the environment, significantly impacting the design and capabilities required for intelligent agents operating within them. Let’s explore each dimension in detail:
Observable vs. Partially Observable Environments
The first dimension is observability. A task environment is considered observable (or fully observable) if the agent’s sensors provide it with access to the complete state of the environment at any given point in time. In such environments, the agent has all the necessary information to make optimal decisions based on the current state. Conversely, an environment is partially observable if the agent can only perceive a portion of the state, or if some crucial aspects are hidden from its sensors.
Example 1 (Observable vs. Partially Observable Environments). Consider two scenarios:
Observable Environment: A chess game where the agent (a chess-playing program) has access to the entire board configuration. It knows the position of all pieces, and thus the environment is fully observable.
Partially Observable Environment: Playing poker. An agent playing poker cannot see the cards held by other players. It must make decisions based on its own cards, the community cards, and the betting behavior of opponents, which only provide partial information about the actual state of the game. Similarly, in a game of solitaire, as mentioned, face-down cards exemplify partial observability, even though the rules of the game are completely known. Another example is a self-driving car in foggy conditions. Sensors might be obscured, leading to incomplete information about surrounding objects and road conditions.
The degree of observability significantly affects the complexity of the agent’s task. In partially observable environments, agents often need to maintain an internal state to keep track of aspects of the environment that are not directly perceived, inferring the hidden parts of the state from past perceptions and a model of the world.
Single Agent vs. Multi-Agent Environments
The second dimension concerns the number of agents in the environment. A task environment can be a single-agent environment, where only one agent operates and acts upon the environment. In this case, the agent’s primary concern is its own performance, without needing to explicitly consider the actions or goals of other agents. On the other hand, a multi-agent environment involves more than one agent. These agents can be cooperative, competitive, or somewhere in between. In multi-agent settings, agents must consider the actions of others, which can significantly complicate the task.
Example 2 (Single Agent vs. Multi-Agent Environments).
Single Agent Environment: Vacuum cleaning in a house. A single vacuum cleaner robot operates in the environment to clean the floors. It acts alone, and its success is measured by how well it cleans, without direct interaction or competition with other agents.
Multi-Agent Environment: A soccer game. Multiple agents (players) interact in the same environment, with each team competing against the other. Agents must coordinate with teammates and compete with opponents to achieve their goals. Another example is a traffic intersection controlled by multiple self-driving cars. Each car (agent) needs to navigate safely and efficiently, considering the actions of other cars to avoid collisions and optimize traffic flow.
As mentioned in the transcript, medical diagnosis is typically viewed as a single-agent environment, with a doctor (the agent) diagnosing a patient (part of the environment). However, this perspective can shift. If we consider the patient’s emotional state, their adherence to treatment plans, or interactions with other healthcare providers (nurses, specialists), the scenario can evolve into a multi-agent system. In such cases, the patient, other medical staff, or even administrative systems could be considered agents with their own objectives and performance measures, influencing the primary agent’s (doctor’s) task. The choice of framing an environment as single or multi-agent often depends on the level of abstraction and the specific aspects of the problem being modeled.
Deterministic vs. Non-Deterministic Environments
The determinism of an environment refers to the predictability of state transitions. In a deterministic environment, the outcome of an action is uniquely determined by the current state and the action itself: for every state-action pair there is exactly one resulting state. In contrast, a non-deterministic environment is one where an action’s outcome is not fully predictable; an action taken in a particular state might result in one of several possible states. When probabilities are associated with these outcomes, the environment is commonly called stochastic.
Example 3 (Deterministic vs. Non-Deterministic Environments).
Deterministic Environment: A vacuum cleaning robot in a simple grid world where the action "move forward" always results in moving one cell forward in the intended direction, provided there is no obstacle. The outcome is predictable and consistent.
Non-Deterministic Environment: Driving a car in winter conditions. Applying the brakes (action) might lead to different outcomes depending on the road conditions (icy, snowy, wet). The car might stop as expected, skid, or even lose control. Similarly, in robotics, attempting to pick up an object might succeed or fail due to uncertainties in sensor readings, motor control, or the object’s position and stability.
Dealing with non-deterministic environments requires agents to be more robust and to plan for contingencies. They may need to use probabilistic models to predict outcomes and choose actions that maximize expected utility across possible scenarios.
Episodic vs. Sequential Environments
This dimension distinguishes between environments based on the dependency between experiences. In an episodic environment, the agent’s experience is divided into independent episodes. Each episode consists of the agent perceiving the environment and then performing a single action. Crucially, the next episode is entirely independent of the actions taken in previous episodes; the agent’s task in each episode is isolated. In a sequential environment, however, the current decision affects not only the immediate outcome but also future decisions. Actions taken in the present can have long-term consequences, and the history of actions matters.
Example 4 (Episodic vs. Sequential Environments).
Episodic Environment: Classifying images of objects. Each image classification task is an episode. The agent receives an image, classifies it, and gets feedback. The correctness of classifying one image does not affect the next image classification task.
Sequential Environment: Playing a game of chess. Each move in chess is an action, and the sequence of moves is critical. A move made early in the game can significantly impact the possibilities and outcomes in later stages. Similarly, in long-term investment management, each investment decision is an action that has consequences over time, affecting future investment opportunities and overall portfolio performance.
Agents in sequential environments must consider the future implications of their current actions, often requiring planning and long-term strategies. The concept of memory and the ability to learn from past experiences become crucial in sequential tasks.
Static vs. Dynamic Environments
The static vs. dynamic dimension describes whether the environment can change while the agent is deliberating or acting. A static environment is unchanging except by the agent’s actions. If the agent is considering what action to take, the environment remains constant during this period of deliberation. In contrast, a dynamic environment can change independently of the agent’s actions. Time is a significant factor in dynamic environments, as the environment can evolve even as the agent is deciding on its next step.
A further refinement is the concept of a semi-dynamic environment. This is a type of dynamic environment where the environment itself does not change with time, but the performance score of the agent does.
Example 5 (Static, Dynamic, and Semi-Dynamic Environments).
Static Environment: Solving a crossword puzzle. While you are thinking about which word to fill in, the crossword puzzle itself does not change. The only changes are due to your actions of filling in words.
Dynamic Environment: Playing a real-time strategy game. While a player is planning their next move, the game environment continues to evolve—opponents make moves, resources change, and new events occur. A self-driving car in city traffic is also in a dynamic environment; other vehicles, pedestrians, and traffic signals are constantly changing, requiring the agent to react in real-time.
Semi-Dynamic Environment: Playing a turn-based game with a clock, like chess with time limits. The environment (the chessboard) is static while the agent deliberates, but the agent’s performance score (related to winning within the time limit) is dynamic as time is elapsing, regardless of the agent’s actions.
Dynamic environments often impose real-time constraints on agents, requiring them to make decisions quickly and react to changes promptly. In static environments, agents can afford to spend more time deliberating and planning.
Discrete vs. Continuous Environments
The discrete vs. continuous dimension refers to the nature of the state space and the action space. An environment is discrete if it has a countable number of distinct states and actions. The states and actions can be enumerated. A continuous environment, on the other hand, involves states and actions that are continuous, meaning they can take on values from a continuous range (e.g., real numbers).
Example 6 (Discrete vs. Continuous Environments).
Discrete Environment: A chessboard. There is a finite number of board configurations (states) and a finite set of moves (actions) at each state. Many board games, navigation in a grid, and automated manufacturing tasks are often modeled as discrete environments.
Continuous Environment: Driving a car in the real world. The car’s position, speed, and steering angle are continuous variables. Actions like steering and acceleration can also be applied continuously. Controlling a chemical plant where temperature, pressure, and flow rates are continuously adjustable and measurable is another example of a continuous environment.
Discrete environments are generally simpler for agents to handle because the number of possibilities is limited, which allows for techniques like exhaustive search or discrete optimization. Continuous environments require more sophisticated mathematical tools and computational methods to manage the infinite possibilities in state and action spaces. Often, agents in continuous environments behave more like controllers, as mentioned in the transcript, drawing from principles of cybernetics to manage dynamic systems.
Known vs. Unknown Environments
The final dimension is about the agent’s knowledge of the environment. A known environment is one where the agent (or the system designer) has complete knowledge of the environment’s rules, physics, and dynamics. The agent knows how the environment works and the consequences of its actions. In contrast, in an unknown environment, the agent lacks complete knowledge about one or more aspects of the environment. This could include the transition model (how actions affect the environment), the reward function, or even the set of possible states and actions.
Example 7 (Known vs. Unknown Environments).
Known Environment: Solving a well-defined maze. The agent has a map of the maze, knows the effects of actions (move forward, turn left, turn right), and knows the goal location. The environment’s rules are fully specified and known.
Unknown Environment: Exploring a new planet with a robot. The robot might not know the terrain properties, the effects of its movements on the surface, or the locations of resources or hazards. It must explore and learn about the environment as it interacts with it. Playing a new video game where the player initially does not know what each button does is another example. The player must experiment to discover the game’s mechanics and rules.
As highlighted in the transcript, this dimension is somewhat different because it is not an intrinsic property of the environment itself but rather a property of the relationship between the agent (or designer) and the environment. An environment might be inherently deterministic and fully observable, but if the agent does not have complete knowledge of its workings, it is considered unknown from the agent’s perspective.
Furthermore, the performance measure itself can also be unknown. For example, in designing a recommendation system, the ideal performance measure might depend on subjective user preferences that are not initially known to the system designer. Understanding what truly makes a user "content" with recommendations can be complex and might require iterative refinement and learning from user interactions. This is why, in practice, evaluating recommendation systems is not straightforward, as different users have varying and sometimes evolving preferences.
In unknown environments, agents often need to engage in exploration to learn about the environment. This learning process might involve trial and error, experimentation, and building models of the environment based on interactions. As mentioned, a child touching a candle flame, despite the immediate negative outcome, learns valuable information about the world, improving future performance by avoiding similar actions. This exploration-exploitation trade-off is a key challenge in designing agents for unknown environments.
Impact of Task Environment Properties on Agent Design
The properties of the task environment have profound implications for the design of intelligent agents. Understanding these properties helps in selecting appropriate agent architectures, algorithms, and learning strategies. Here are some key consequences:
Partial Observability Implies Memory and Inference: In partially observable environments, agents cannot rely solely on current perceptions. They must maintain an internal state that incorporates a history of perceptions to infer the unobserved parts of the current state. This often leads to the design of model-based agents that can predict future states based on past experiences. For instance, a vacuum cleaner agent, knowing it has already cleaned a certain area, can infer that the area is likely still clean, even if it cannot directly sense cleanliness at every moment.
Non-Determinism Requires Contingency Planning: In stochastic or non-deterministic environments, agents must be prepared to handle multiple possible outcomes of their actions. This necessitates planning for contingencies and using probabilistic reasoning. Agents might need to evaluate actions based on expected outcomes and probabilities, rather than certain outcomes.
Multi-Agent Settings Demand Strategic Interactions: When multiple agents are present, especially in competitive environments, agents must adopt strategies that consider the potential actions of other agents. This might involve game theory concepts, randomized strategies to avoid predictability, and negotiation or cooperation strategies in collaborative settings. The behavior of other agents becomes a critical factor in an agent’s decision-making process.
Dynamic Environments Impose Time Constraints: Dynamic environments often require agents to act quickly and efficiently. Deliberation time becomes a critical resource. Agents might need to use reactive strategies or algorithms that can provide timely responses. In contrast, static environments allow for more extensive deliberation and planning.
Continuous Environments Favor Control-Theoretic Approaches: As noted, continuous environments often lead to agent designs that resemble controllers. Techniques from cybernetics and control theory, which deal with continuous-time systems and feedback loops, become relevant. Agents might employ continuous actions and monitor continuous state variables, adjusting their actions in real-time to maintain desired states or trajectories.
Unknown Environments Necessitate Learning and Exploration: When the environment is unknown, agents must be capable of learning. This involves exploration to gather information about the environment, model learning to build representations of the environment’s dynamics, and reinforcement learning to improve performance based on feedback. The agent must balance exploration (trying new actions to learn) with exploitation (using known actions to perform well).
Unknown Performance Measures Require Human Input or Iterative Refinement: If the performance measure is not well-defined or depends on subjective factors (like user satisfaction in a recommendation system), agent design might require incorporating human feedback, using datasets labeled by humans, or employing iterative refinement processes. Techniques like reinforcement learning with human-in-the-loop feedback or learning from user data become essential to align the agent’s objectives with the desired performance.
In summary, the characteristics of the task environment are not just descriptive labels; they are fundamental determinants of the challenges and opportunities an agent faces. By carefully analyzing these properties, we can design more effective and intelligent agents tailored to specific tasks and environments. The dimensions discussed provide a framework for understanding the complexities of different AI problems and for guiding the development of appropriate solutions. As we move forward, we will see how these environmental properties directly influence the choice of search algorithms and agent architectures we employ.
Types of Agents
Having discussed the various dimensions of task environments, we now turn our attention to the agents themselves. Agents can be categorized based on their architecture, complexity, and capabilities. We will explore four fundamental types of agents, each building upon the last in terms of sophistication: simple reflex agents, model-based reflex agents, goal-based agents, and utility-based agents. We will also discuss learning agents, which represent a further evolution in agent design, incorporating the ability to improve performance over time. It is important to note that for any given problem, the simplest agent type capable of achieving satisfactory performance should be preferred, adhering to principles of efficiency and parsimony in design.
Simple Reflex Agents
Reactive Behavior and Condition-Action Rules
Simple reflex agents are the most basic form of intelligent agents. These agents operate on a purely reactive principle, making decisions based solely on the current percept, disregarding any history of past perceptions. Their action selection is governed by a set of predefined condition-action rules. These rules, also known as situation-action rules, production rules, or IF-THEN rules, directly map a perceived state to an action.
Condition-action rules are fundamental to simple reflex agents. They embody a direct mapping from a perceived condition to a specific action. The general form is:
IF condition THEN action
These rules are typically implemented as a set of IF-THEN statements or in a rule-based system. The agent checks if the current percept matches the condition part of a rule, and if it does, it executes the corresponding action.
Illustrative Example: The Vacuum Cleaner Agent
Consider our recurring example of a vacuum cleaner agent operating in an environment with two locations, A and B. The agent’s task is to clean up dirt. A simple reflex agent for this task might employ the following rules:
Rule 1: IF Location A is dirty THEN Suck.
Rule 2: IF Location B is dirty THEN Suck.
Rule 3: IF Current location is A AND Location B is clean THEN Move to B.
Rule 4: IF Current location is B AND Location A is clean THEN Move to A.
In operation, the agent continuously senses its environment. If it perceives dirt in its current location, it immediately performs the ‘Suck’ action. If there is no dirt in the current location, it checks its location and the cleanliness of the other location to decide whether to move. This agent reacts directly to its immediate surroundings based on the rules programmed into it.
Algorithm for a Simple Reflex Agent
A generalized algorithm for a simple reflex agent can be described as follows:
Algorithm 1 (Simple Reflex Agent). Input: Percept \(p\)
Persistent: Rules: a set of condition-action rules
\(state \leftarrow InterpretInput(p)\)
\(rule \leftarrow RuleMatch(state, rules)\)
\(action \leftarrow RuleAction[rule]\)
return \(action\)
Complexity Analysis: The complexity of a simple reflex agent is primarily determined by the number of rules and the complexity of the RuleMatch function. In the simplest case, assuming a linear scan through the rules, the time complexity for action selection is \(O(R \times C)\), where \(R\) is the number of rules and \(C\) is the average complexity of evaluating a condition. However, if rule matching is optimized (e.g., using indexing), the complexity can be reduced. The space complexity, needed to store the rules, is \(O(R \times S)\), where \(S\) is the average size of a rule.
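To make the algorithm concrete, here is a minimal Python sketch of a simple reflex agent for the two-location vacuum world, encoding Rules 1-4 above as condition-action pairs. The percept format (current location plus the dirt status of both squares) and the reading of Rules 1 and 2 as applying to the agent’s current location are illustrative assumptions, not part of the formal definition.

```python
# A minimal sketch of a simple reflex agent for the two-location vacuum world.
# Assumption: the percept is a tuple (current_location, dirt_at_A, dirt_at_B),
# i.e. the environment is fully observable, so Rules 3 and 4 can test the
# cleanliness of the other location.

def interpret_input(percept):
    """InterpretInput: here the percept already amounts to a state description."""
    location, dirt_a, dirt_b = percept
    return {"location": location, "dirt_A": dirt_a, "dirt_B": dirt_b}

# Condition-action rules (Rules 1-4 above), each as a (condition, action) pair.
RULES = [
    (lambda s: s["location"] == "A" and s["dirt_A"], "Suck"),        # Rule 1
    (lambda s: s["location"] == "B" and s["dirt_B"], "Suck"),        # Rule 2
    (lambda s: s["location"] == "A" and not s["dirt_B"], "MoveToB"), # Rule 3
    (lambda s: s["location"] == "B" and not s["dirt_A"], "MoveToA"), # Rule 4
]

def simple_reflex_agent(percept):
    """One decision step: interpret the percept, match a rule, return its action."""
    state = interpret_input(percept)
    for condition, action in RULES:          # RuleMatch: linear scan, O(R * C)
        if condition(state):
            return action
    return "NoOp"                            # no rule fired

print(simple_reflex_agent(("A", True, True)))    # Suck
print(simple_reflex_agent(("A", False, False)))  # MoveToB
```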
Limitations in Partially Observable Environments
Despite their simplicity and effectiveness in certain scenarios, simple reflex agents suffer from significant limitations, particularly in partially observable environments. Because they make decisions based only on the current percept, they are fundamentally unable to account for aspects of the environment they cannot currently sense. They lack any memory or internal state to keep track of the history of the environment.
In scenarios where the agent needs to consider past events to make informed decisions, or when the full state of the environment is not directly accessible, simple reflex agents will perform sub-optimally. For instance, if our vacuum cleaner agent cannot sense whether it has already cleaned a certain area, it might revisit and clean the same area repeatedly, even if other areas remain dirty. This lack of historical awareness and inability to infer unperceived aspects of the environment severely restricts their applicability in more complex and realistic settings.
Model-Based Reflex Agents
Incorporating Internal State and World Model
To address the shortcomings of simple reflex agents in partially observable environments, model-based reflex agents are introduced. These agents enhance the reactive approach by maintaining an internal state and a world model. The internal state is a representation of the current state of the environment, which is not fully observable from the current percept alone. It is built and updated over time based on the history of perceptions. The world model is knowledge about how the environment evolves, independent of the agent’s actions, and how the agent’s actions affect the environment.
Internal State: Agent’s memory of past perceptions, used to represent aspects of the current state that are not directly perceivable.
World Model: Knowledge about how the environment works, including:
Transition Model: How the environment evolves and changes over time, including the effects of the agent’s actions.
Sensor Model: How the agent’s sensors perceive the environment, including their limitations and potential inaccuracies.
Leveraging Transition and Sensor Models
The transition model allows the agent to predict future states of the environment. Given the current state and an action, the transition model forecasts the resulting state. This predictive capability is crucial for planning and decision-making, especially in sequential environments. The sensor model, on the other hand, provides information about the reliability and accuracy of the agent’s sensors. It helps the agent understand what aspects of the world are being perceived and how accurately.
State Update Mechanism
A critical aspect of model-based agents is their ability to update their internal state. This is typically done in each step by:
Perceiving the environment through sensors.
Updating the internal state based on the new percept, the previous state, the transition model, and the sensor model. This update process often involves inference to estimate the current state, especially when the environment is only partially observable.
Selecting an action based on the updated internal state and condition-action rules, similar to simple reflex agents, but now operating on a more informed state representation.
Executing the action in the environment.
This cycle allows the agent to maintain a dynamic representation of the world, even when direct sensory information is incomplete. By incorporating a model of how the world changes and how its sensors work, the agent can make more informed decisions than a simple reflex agent.
Algorithm for a Model-Based Reflex Agent
The algorithm for a model-based reflex agent expands upon the simple reflex agent by including state maintenance and model utilization:
Algorithm 2 (Model-Based Reflex Agent). Input: Percept \(p\)
Persistent:
State: current internal state description
Rules: a set of condition-action rules
Transition Model: model of how actions affect the world
Sensor Model: model of sensor capabilities
Last Action: the last action executed (optional, for state update)
\(state \leftarrow UpdateState(state, lastAction, p, transitionModel, sensorModel)\)
\(rule \leftarrow RuleMatch(state, rules)\)
\(action \leftarrow RuleAction[rule]\)
\(lastAction \leftarrow action\)
return \(action\)
Complexity Analysis: In addition to the complexities of rule matching similar to simple reflex agents, model-based agents introduce complexity in state updating. The UpdateState function’s complexity depends on the nature of the state representation and the models. In the worst case, updating the state might involve complex inference or probabilistic calculations, potentially increasing the time complexity significantly. The space complexity increases to accommodate the state, transition model, and sensor model, which can be substantial depending on the environment’s complexity and the chosen representation.
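As a contrast to the simple reflex sketch above, the following minimal Python sketch implements the perceive-update-act cycle of a model-based reflex agent for the vacuum world. Here the percept is assumed to reveal only the current location and its dirt status, so the internal state must remember what is known about the other square; the state representation and update logic are illustrative choices, not a prescribed design.

```python
# A minimal sketch of a model-based reflex agent for the vacuum world. Unlike
# the fully observable setting above, the percept here reveals only the agent's
# current location and whether that square is dirty; the internal state
# remembers what was last known about the other square.

class ModelBasedVacuumAgent:
    def __init__(self):
        # Internal state: last known dirt status of each square (None = unknown).
        self.state = {"A": None, "B": None, "location": None}
        self.last_action = None

    def update_state(self, percept):
        """UpdateState: fold the new percept into the internal state.
        A richer agent would also consult a transition model here, e.g. to
        predict its location after a move if percepts were noisy or delayed."""
        location, dirty = percept
        self.state["location"] = location
        self.state[location] = dirty

    def choose_action(self):
        """Condition-action rules operating on the (partly inferred) state."""
        here = self.state["location"]
        other = "B" if here == "A" else "A"
        if self.state[here]:                   # current square is dirty
            return "Suck"
        if self.state[other] in (True, None):  # other square dirty or still unknown
            return "MoveTo" + other
        return "NoOp"                          # both squares known to be clean

    def __call__(self, percept):
        self.update_state(percept)
        action = self.choose_action()
        self.last_action = action              # remembered for future updates
        return action

agent = ModelBasedVacuumAgent()
print(agent(("A", True)))   # Suck
print(agent(("A", False)))  # MoveToB  (B's status is still unknown)
print(agent(("B", False)))  # NoOp     (both squares now known clean)
```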
Goal-Based Agents
Explicit Goal Representation and Achievement
Goal-based agents represent a further step up in agent sophistication. While model-based reflex agents can reason about the state of the world, goal-based agents additionally consider explicit goals that they are trying to achieve. A goal is a desirable state or situation that the agent aims to reach. The agent’s decision-making process is then guided by these goals. Instead of just reacting to the current state, goal-based agents consider what actions will lead them closer to their goals.
Goal-based agents are characterized by their explicit representation of goals. These goals provide a direction for the agent’s actions. The agent’s primary objective is to select actions that will lead to the achievement of these goals. This involves:
Goal Formulation: Defining what states or conditions constitute a goal.
Goal Achievement: Planning and executing sequences of actions to reach a goal state.
Planning Action Sequences for Goal Attainment
Goal-based agents are particularly useful when achieving a goal requires a sequence of actions, rather than a single action. They engage in planning, which involves searching through possible action sequences to find a path that leads from the current state to a goal state. This planning process often utilizes the agent’s world model to predict the outcomes of different action sequences and evaluate their effectiveness in reaching the goal.
For example, consider an agent tasked with navigating from one location to another in a city. A goal-based agent would not just react to immediate surroundings but would plan a route, considering a sequence of turns and movements to reach the destination. This might involve using search algorithms to explore different paths and choose the most efficient one.
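As a concrete illustration of such route planning, here is a minimal sketch in which a goal-based agent searches a small, invented city map for a sequence of moves leading to its goal. Breadth-first search is used only as a placeholder planner (systematic search algorithms are the subject of the coming lectures), and the map and location names are made up for this example.

```python
# A minimal sketch of a goal-based agent planning a route over a small,
# hypothetical city map (adjacency list). Breadth-first search serves only as
# a placeholder planner here.
from collections import deque

CITY_MAP = {  # hypothetical intersections and the streets connecting them
    "Home": ["Market", "Park"],
    "Market": ["Home", "Station"],
    "Park": ["Home", "Station"],
    "Station": ["Market", "Park", "Office"],
    "Office": ["Station"],
}

def plan_route(start, goal):
    """Return a sequence of locations from start to goal, or None if unreachable."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:                    # goal test on the path's last location
            return path
        for neighbor in CITY_MAP[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None

print(plan_route("Home", "Office"))   # ['Home', 'Market', 'Station', 'Office']
```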
Limitations: Handling Conflicting Goals and Uncertainty
Despite their advanced capabilities compared to reflex agents, goal-based agents have limitations. One significant limitation is in dealing with conflicting goals. If an agent has multiple goals that are not mutually achievable, or if achieving one goal hinders the achievement of another, goal-based agents lack a mechanism to prioritize or balance these goals. They simply aim to achieve the specified goals without inherent preferences or trade-offs.
Another limitation is in handling uncertainty. Goal-based agents typically operate in a binary mode: a goal is either achieved or not. They do not inherently account for the degree of goal achievement or the probability of achieving a goal in uncertain environments. For instance, if an agent has a goal to buy milk at a supermarket, a goal-based agent might not consider the possibility that the supermarket might be out of milk. It would simply plan to go to the supermarket and assume the goal will be achieved upon arrival. In reality, there’s always a chance of failure or varying degrees of success, which goal-based agents, in their basic form, do not address.
Utility-Based Agents
Utility Function for Performance Measurement
Utility-based agents overcome the limitations of goal-based agents by introducing the concept of utility. Instead of merely distinguishing between goal states and non-goal states, utility-based agents use a utility function to assign a numerical value, or utility, to each state. This utility value represents the degree of "happiness" or "desirability" associated with being in that state, providing a more nuanced measure of performance than simple goal achievement.
The utility function is a cornerstone of utility-based agents. It provides a quantitative measure of preference over states. For each state, the utility function assigns a real number that reflects the agent’s preference for that state. Higher utility values indicate more desirable states. This allows agents to:
Compare States: Determine which state is better or more desirable.
Make Trade-offs: Decide between actions that might lead to different outcomes with varying degrees of desirability.
Handle Conflicting Objectives: Balance multiple objectives by considering their relative utilities.
Expected Utility for Decision-Making under Uncertainty
Utility-based agents are particularly adept at handling uncertainty through the concept of expected utility. When faced with actions that have probabilistic outcomes, utility-based agents calculate the expected utility of each action. This is done by considering all possible outcome states, their associated utilities, and the probability of reaching each outcome. The expected utility of an action is the average of the utilities of its possible outcomes, weighted by their probabilities.
The expected utility of an action is calculated as follows:
Mathematically, if an action \(a\) in state \(s\) can lead to states \(s_1, s_2, ..., s_n\) with probabilities \(P(s_1|s, a), P(s_2|s, a), ..., P(s_n|s, a)\) respectively, and the utility of each state is \(U(s_1), U(s_2), ..., U(s_n)\), then the expected utility of action \(a\) in state \(s\), \(EU(a|s)\), is given by:
\[EU(a|s) = \sum_{i=1}^{n} P(s_i|s, a) \times U(s_i)\]
The agent then chooses the action that maximizes its expected utility. This approach allows utility-based agents to make rational decisions in uncertain environments, balancing potential rewards against risks.
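The following small numerical sketch applies the expected-utility rule above to two hypothetical actions; the action names, outcome probabilities, and utility values are invented purely for illustration.

```python
# A small numerical sketch of the expected-utility rule: EU(a|s) is the
# probability-weighted average of the utilities of the possible outcomes.

def expected_utility(outcomes):
    """EU(a|s) = sum over outcomes of P(s_i | s, a) * U(s_i)."""
    return sum(prob * utility for prob, utility in outcomes)

# Each action maps to a list of (probability, utility-of-resulting-state) pairs.
actions = {
    "take_highway":   [(0.7, 100), (0.3, 20)],   # fast unless there is a traffic jam
    "take_side_road": [(1.0, 60)],               # slower but predictable
}

for a, outcomes in actions.items():
    print(a, expected_utility(outcomes))

best_action = max(actions, key=lambda a: expected_utility(actions[a]))
print("chosen:", best_action)   # take_highway: 0.7*100 + 0.3*20 = 76 > 60
```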
Advantages over Goal-Based Agents
Utility-based agents offer several advantages over goal-based agents:
Preference Handling: They can express preferences among different goal states. For example, if there are multiple ways to achieve a goal, a utility function can specify which way is preferred (e.g., a shorter path is preferred over a longer path).
Trade-offs Between Goals: They can handle trade-offs between multiple, potentially conflicting goals. By assigning utilities to different outcomes, the agent can choose actions that best balance competing objectives. For example, an agent might need to balance the goal of buying milk with the goal of avoiding traffic, and a utility function can help make this trade-off.
Uncertainty Management: They can make rational decisions in uncertain environments by maximizing expected utility, allowing them to choose actions that are likely to lead to better outcomes on average, even if the outcomes are not guaranteed.
Utility-based agents represent a significant advancement in agent design, enabling more flexible, robust, and rational behavior in complex and uncertain environments. They are particularly relevant in scenarios where outcomes are not binary (success or failure) but exist on a spectrum of desirability.
Learning Agents
The Capacity to Learn and Improve
The agent types discussed so far—reflex, model-based, goal-based, and utility-based—are primarily concerned with how agents make decisions given their current knowledge and capabilities. Learning agents, however, go a step further by incorporating the ability to learn and improve their performance over time. Learning agents can adapt to new environments, refine their strategies, and become more effective at achieving their goals through experience.
Learning agents are distinguished by their ability to learn from experience. This learning capability allows them to:
Adapt to New Environments: Adjust their behavior and strategies in response to changes in the environment.
Improve Performance: Enhance their effectiveness and efficiency in achieving goals over time.
Acquire New Knowledge: Discover new rules, models, or utility functions through interaction with the environment.
Architectural Components of a Learning Agent
A learning agent can be conceptually decomposed into four essential components:
Performance Element
The performance element is responsible for selecting and executing actions. It is essentially the problem-solving component of the agent and can be implemented using any of the agent types we have discussed previously (simple reflex, model-based, goal-based, or utility-based). The performance element focuses on doing the task as effectively as possible, given its current knowledge.
Learning Element
The learning element is the core of the learning agent. Its role is to improve the performance element by learning from experience. It takes feedback from the critic and suggestions from the problem generator to modify the performance element. The learning element is responsible for updating the agent’s knowledge, rules, models, or utility functions.
Critic and Feedback
The critic evaluates the performance of the agent and provides feedback to the learning element. This feedback is crucial for learning. The critic compares the agent’s behavior against a performance standard and provides signals indicating how well the agent is doing. This feedback can be in various forms, such as rewards for good actions, penalties for bad actions, or evaluations of the agent’s state. As mentioned in the transcript, the critic often involves external input, either from a human expert or a dataset that provides labeled examples of good and bad actions or outcomes.
Problem Generator and Exploration
The problem generator is responsible for exploration. It suggests actions that might lead to new and informative experiences. The problem generator encourages the agent to step outside its current strategy and try new things, even if these actions might not be optimal in the short term. By exploring, the agent can discover new aspects of the environment, identify better strategies, and improve its long-term performance. The problem generator helps balance exploration with exploitation, ensuring that the agent not only performs well based on current knowledge but also continuously seeks to expand and refine that knowledge.
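To show how the four components might fit together, here is a conceptual Python skeleton in which the performance element exploits current estimates, the problem generator occasionally explores, the critic converts raw feedback into a learning signal, and the learning element updates the estimates. The update rule (a simple bandit-style incremental average), the environment interface, and all names are illustrative assumptions rather than a prescribed design.

```python
# A conceptual skeleton of the four learning-agent components interacting.
# The environment, performance standard, and update rule are placeholders
# invented for illustration; real agents would plug in concrete ML techniques.
import random

class LearningAgent:
    def __init__(self, actions):
        self.actions = actions
        self.value = {a: 0.0 for a in actions}   # knowledge used by the performance element

    def performance_element(self):
        """Pick the action currently believed best (exploitation)."""
        return max(self.actions, key=lambda a: self.value[a])

    def problem_generator(self):
        """Suggest a random action to gather new experience (exploration)."""
        return random.choice(self.actions)

    def critic(self, reward):
        """Turn raw feedback from the environment into a learning signal."""
        return reward                              # trivial performance standard

    def learning_element(self, action, signal, lr=0.1):
        """Improve the knowledge the performance element relies on."""
        self.value[action] += lr * (signal - self.value[action])

    def step(self, environment_feedback, explore_prob=0.2):
        action = (self.problem_generator() if random.random() < explore_prob
                  else self.performance_element())
        reward = environment_feedback(action)      # act in the environment
        self.learning_element(action, self.critic(reward))
        return action

# Example: a two-armed environment where action "b" is better on average.
env = lambda a: {"a": 1.0, "b": 2.0}[a] + random.gauss(0, 0.1)
agent = LearningAgent(["a", "b"])
for _ in range(200):
    agent.step(env)
print(agent.value)   # the estimate for "b" should end up higher
```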
The Role of Machine Learning
Machine learning is the set of techniques and algorithms that empower learning agents. It provides the tools for the learning element to process feedback, update internal models, and improve decision-making strategies. As noted in the transcript, modern AI is heavily reliant on machine learning, particularly deep learning, for creating intelligent systems. Alan Turing, as early as 1950, envisioned that instead of programming intelligent agents directly, it might be more effective to create agents that can learn for themselves. This vision is now a reality, with machine learning driving advancements in virtually all areas of AI. Learning agents are capable of operating in environments that are too complex or too poorly understood to be programmed directly, making them essential for tackling real-world AI challenges.
In the subsequent lectures, while we will primarily focus on search algorithms within simpler agent architectures, it is crucial to remember that learning is the ultimate goal for creating truly intelligent and adaptable systems. The principles of search we will discuss lay the groundwork for more advanced learning techniques that build upon these foundations.
Defining Search Problems
We now transition to a more formal and structured approach to problem-solving in AI by defining search problems. Search is a fundamental technique in artificial intelligence, providing a systematic way to find solutions to a wide array of problems. To effectively utilize search algorithms, we must first clearly define the problem in a way that is amenable to computational methods. This involves identifying the key components that constitute a search problem and understanding how they interact.
Formalizing Search Problems
To apply search algorithms, we need a precise, formal definition of a search problem. This formalization allows us to use computational methods to explore potential solutions systematically. A well-defined search problem breaks down the challenge into manageable components, making it possible to design and implement algorithms that can find solutions efficiently.
Elements of a Search Problem (6-tuple)
A search problem is conventionally defined as a 6-tuple. This tuple encapsulates all the necessary information to describe the problem and its solution space. The six essential elements are:
State Space
Initial State
Actions
Transition Model
Goal Test
Cost Function
Let’s delve into each of these components to understand their role in defining a search problem.
State Space (\(\mathcal{S}\))
Definition 1 (State Space). The state space, denoted as \(\mathcal{S}\), is the set of all possible states in which the agent and its environment can exist. A state is a complete description of the configuration of the problem at a particular point in time.
Remark 1 (Granularity of State Space). The granularity of the state space is critical; it should be detailed enough to solve the problem but abstract enough to keep the search manageable.
Example 8 (State Space in Pac-Man). In a simplified Pac-Man game on a 3x3 grid with pellets, a state is not just Pac-Man’s location. It includes:
Pac-Man’s position (e.g., grid coordinates).
The location of all remaining pellets on the grid (present or absent).
Optionally, Pac-Man’s direction if it’s relevant to the problem definition.
Thus, a state in this Pac-Man example is a combination of Pac-Man’s location and the configuration of pellets. Each unique arrangement of these elements constitutes a distinct state within the state space. It’s important to note, as highlighted in the transcript, that a state is not merely a location but a comprehensive snapshot of the problem’s configuration.
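A small sketch of how such a state could be represented in code follows; the field names and the use of a frozen dataclass are illustrative choices, the point being that a state captures the full snapshot of the configuration, not just a position.

```python
# A small sketch of a state for the simplified 3x3 Pac-Man example: a state is
# a complete snapshot (Pac-Man's position, the remaining pellets, and an
# optional facing direction), not just a location. Field names are illustrative.
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

@dataclass(frozen=True)          # frozen + hashable, so states can be stored in sets
class PacManState:
    position: Tuple[int, int]                    # Pac-Man's grid coordinates
    pellets: FrozenSet[Tuple[int, int]]          # locations of remaining pellets
    direction: Optional[str] = None              # e.g. "N", "S", "E", "W" (optional)

s = PacManState(position=(0, 0),
                pellets=frozenset({(1, 2), (2, 1)}),
                direction="E")
print(s)
```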
Initial State (\(s_0\) or \(S_{initial}\))
Definition 2 (Initial State). The initial state, denoted as \(s_0\) or \(S_{initial}\), is the state in which the agent starts its problem-solving journey. It is the starting point from which the search process begins. There is only one initial state in any search problem.
Example 9 (Initial State in Romania Travel). In the Romania travel problem, if we start our journey in the city of Arad, then Arad is the initial state. All search paths will originate from this state as we attempt to reach our destination, Bucharest.
Actions (\(\mathcal{A}(s)\))
Definition 3 (Actions). Actions represent the possible moves or operations that an agent can perform when it is in a state \(s\). For each state \(s\) in the state space \(\mathcal{S}\), there is a set of applicable actions, denoted as \(\mathcal{A}(s)\). These actions define how the agent can transition from one state to another.
Example 10 (Actions in Romania Travel). In the Romania travel problem, if we are in the city of Sibiu, the possible actions are to drive to any adjacent city. Based on the Romania map, the set of actions from Sibiu, \(\mathcal{A}(\text{Sibiu})\), would be:
Drive to Arad
Drive to Fagaras
Drive to Oradea
Drive to Rimnicu Vilcea
Each of these actions represents a possible move from the current state (Sibiu) to a neighboring state (city).
Transition Model (\(\text{Result}(s, a)\))
Definition 4 (Transition Model). The transition model, denoted as \(\text{Result}(s, a)\), describes the outcome of performing an action \(a\) in a state \(s\). It defines the next state that the agent will transition to after executing action \(a\) in state \(s\).
Remark 2 (Deterministic vs. Non-deterministic Environments). In a deterministic environment, the transition model is a function that uniquely determines the next state. For a given state and action, there is only one resulting state. In non-deterministic environments, the transition model might specify a probability distribution over possible next states. However, for the initial discussion of search problems, we often assume deterministic environments for simplicity.
Example 11 (Transition Model in Romania Travel). Consider the action "Drive to Zerind" from the initial state "Arad". In a deterministic model, the transition model would specify: \[\text{Result}(\text{Arad}, \text{DriveToZerind}) = \text{Zerind}\] This indicates that performing the action "Drive to Zerind" in the state "Arad" deterministically leads to the state "Zerind".
Goal Test (\(\text{GoalTest}(s)\))
Definition 5 (Goal Test). The goal test, denoted as \(\text{GoalTest}(s)\), is a function that determines whether a given state \(s\) is a goal state. It defines the condition for success in the search problem. The goal test function takes a state as input and returns true if the state is a goal state, and false otherwise.
Remark 3 (Multiple Goal States). There can be one or more goal states in a search problem. Instead of specifying a single goal state, using a goal test function allows for defining a set of goal states based on properties or conditions.
Example 12 (Goal Test in Romania Travel). In the Romania travel problem, the goal is to reach Bucharest. The goal test function would be: \[\text{GoalTest}(s) = \begin{cases} \text{true} & \text{if } s = \text{Bucharest} \\ \text{false} & \text{otherwise} \end{cases}\] For any state \(s\), \(\text{GoalTest}(s)\) checks if \(s\) is the city of Bucharest. If it is, the function returns true, indicating that we have reached a goal state.
Cost Function (\(\text{Cost}(s, a, s')\) or Path Cost)
Definition 6 (Cost Function). The cost function, denoted as \(\text{Cost}(s, a, s')\), assigns a numerical cost to the path from state \(s\) to state \(s'\) via action \(a\). It measures the "expense" of taking action \(a\) in state \(s\) to reach the successor state \(s'\).
Remark 4 (Optimal Solution). The cost function can represent various measures, such as time, distance, resources used, or penalties incurred. In many problems, the cost is associated with each action, and the total cost of a path is the sum of the costs of the actions in the path. We aim to find a solution with the minimum total cost, which is known as an optimal solution.
Example 13 (Cost Function in Romania Travel). In the Romania travel problem, the cost function can be the distance between cities. If driving from Arad to Zerind has a distance of 75 km, then: \[\text{Cost}(\text{Arad}, \text{DriveToZerind}, \text{Zerind}) = 75\] The cost function quantifies the expense of moving from one city to another, which we want to minimize to find the shortest route to Bucharest.
Illustrative Examples of Search Problems
To solidify our understanding of these elements, let’s revisit and elaborate on the examples mentioned earlier, explicitly defining each component of the 6-tuple for each problem.
Example 1: Romania Travel to Bucharest
State Space: \(\mathcal{S} = \{\text{Arad, Bucharest, ..., Zerind}\}\) - The set of cities in Romania as depicted on the map. Each city is a state.
Initial State: \(s_0 = \text{Arad}\) - The starting city of the journey.
Actions: For each city \(c\), \(\mathcal{A}(c) = \{\text{Drive to } c' | c' \text{ is adjacent to } c \text{ on the map}\}\) - Driving from the current city to any adjacent city.
Transition Model: \(\text{Result}(c, \text{Drive to } c') = c'\) - Driving from city \(c\) to city \(c'\) results in being in city \(c'\) (deterministic).
Goal Test: \(\text{GoalTest}(c) = (\text{Is } c = \text{Bucharest?})\) - Checks if the current city is Bucharest.
Cost Function: \(\text{Cost}(c, \text{Drive to } c', c') = \text{Distance between } c \text{ and } c'\) - The cost of driving from city \(c\) to \(c'\) is the given road distance between them.
A solution to this problem is a sequence of cities starting from Arad and ending in Bucharest. An optimal solution is the path (sequence of cities) with the minimum total distance, which is the sum of the costs of each action (driving between cities) along the path.
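A minimal Python sketch of this 6-tuple, instantiated with a small fragment of the Romania map, might look as follows; only a few cities are included, the class and method names are illustrative choices, and the road distances follow the standard textbook map (the 75 km Arad-Zerind distance matches the figure used above).

```python
# A minimal sketch of the search-problem 6-tuple as a Python class, instantiated
# with a small fragment of the Romania map. The state space is implicitly the
# set of cities appearing in the road map.

class SearchProblem:
    def __init__(self, initial_state, roads, goal):
        self.initial_state = initial_state        # s0
        self.roads = roads                        # encodes actions, transitions, costs
        self.goal = goal

    def actions(self, state):                     # A(s): drive to any adjacent city
        return ["DriveTo" + city for city in self.roads[state]]

    def result(self, state, action):              # Result(s, a): deterministic
        return action.removeprefix("DriveTo")

    def goal_test(self, state):                   # GoalTest(s)
        return state == self.goal

    def cost(self, state, action, next_state):    # Cost(s, a, s'): road distance
        return self.roads[state][next_state]

# Fragment of the map: {city: {neighbour: distance_km}}
ROADS = {
    "Arad":    {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind":  {"Arad": 75, "Oradea": 71},
    "Sibiu":   {"Arad": 140, "Fagaras": 99, "RimnicuVilcea": 80, "Oradea": 151},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    # ... remaining cities omitted for brevity
}

romania = SearchProblem("Arad", ROADS, "Bucharest")
print(romania.actions("Arad"))                          # available actions in Arad
print(romania.result("Arad", "DriveToZerind"))          # Zerind
print(romania.cost("Arad", "DriveToZerind", "Zerind"))  # 75
```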
Example 2: Vacuum Cleaner Problem Revisited
State Space: \(\mathcal{S} = \{(L, D_A, D_B) | L \in \{A, B\}, D_A \in \{\text{Clean, Dirty}\}, D_B \in \{\text{Clean, Dirty}\}\}\) - States are defined by the vacuum cleaner’s location (A or B) and the dirt status of both locations (Clean or Dirty). There are \(2 \times 2 \times 2 = 8\) possible states.
Initial State: e.g., \(s_0 = (A, \text{Dirty}, \text{Dirty})\) - Vacuum cleaner starts at location A, both locations are initially dirty.
Actions: \(\mathcal{A}(s) = \{\text{MoveLeft, MoveRight, Suck}\}\) - The agent can move left, move right, or suck dirt. Actions are applicable in every state.
Transition Model: For example:
\(\text{Result}((A, \text{Dirty}, D_B), \text{Suck}) = (A, \text{Clean}, D_B)\) - Sucking at location A, if dirty, makes location A clean.
\(\text{Result}((A, D_A, D_B), \text{MoveRight}) = (B, D_A, D_B)\) - Moving right from A takes the agent to location B, dirt statuses remain unchanged.
Goal Test: \(\text{GoalTest}((L, \text{Clean}, \text{Clean})) = \text{true}\) for any location \(L \in \{A, B\}\) - The goal is achieved when both locations A and B are clean, regardless of the vacuum cleaner’s location.
Cost Function: \(\text{Cost}(s, a, s') = 1\) for all states \(s, s'\) and actions \(a\) - Each action (move or suck) costs 1 unit.
Example 3: 8-Puzzle Problem
State Space: \(\mathcal{S} = \{\text{All possible configurations of 8 tiles and a blank space in a 3x3 grid}\}\) - Each configuration of the 8-puzzle is a state. The number of possible states is 9!/2 = 181,440.
Initial State: A given scrambled configuration of the 8-puzzle tiles.
Actions: \(\mathcal{A}(s) = \{\text{Move blank tile Up, Down, Left, Right}\}\) - Move the blank tile up, down, left, or right, if the move is within the grid boundaries.
Transition Model: \(\text{Result}(s, \text{Move blank tile Up}) = s'\) - Moving the blank tile up in state \(s\) results in a new state \(s'\) with the tiles rearranged accordingly (a small code sketch of this transition model follows this list).
Goal Test: \(\text{GoalTest}(s) = (\text{Is } s = \text{goal configuration?})\) - Checks if the current configuration matches the predefined goal configuration (tiles in order, blank in a specific position).
Cost Function: \(\text{Cost}(s, a, s') = 1\) for all valid moves - Each move of a tile costs 1 unit.
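As referenced above, here is a minimal sketch of the 8-puzzle formulation in Python, representing a state as a 9-tuple read row by row with 0 standing for the blank; this representation and the particular goal configuration shown are illustrative assumptions, not part of the formal definition.

```python
# A minimal sketch of the 8-puzzle actions and transition model. A state is a
# tuple of 9 numbers read row by row, with 0 standing for the blank tile.

MOVES = {"Up": -3, "Down": 3, "Left": -1, "Right": 1}   # index offsets in a 3x3 grid

def actions(state):
    """A(s): directions in which the blank tile can move without leaving the grid."""
    row, col = divmod(state.index(0), 3)
    valid = []
    if row > 0: valid.append("Up")
    if row < 2: valid.append("Down")
    if col > 0: valid.append("Left")
    if col < 2: valid.append("Right")
    return valid

def result(state, move):
    """Result(s, a): swap the blank with the neighbouring tile in the given direction."""
    blank = state.index(0)
    target = blank + MOVES[move]
    tiles = list(state)
    tiles[blank], tiles[target] = tiles[target], tiles[blank]
    return tuple(tiles)

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # one common goal configuration

def goal_test(state):
    return state == GOAL

start = (1, 2, 3, 4, 5, 6, 7, 0, 8)          # one move away from the goal
print(actions(start))                         # ['Up', 'Left', 'Right']
print(result(start, "Right") == GOAL)         # True; the cost of the move is 1
```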
Example 4: Tic-Tac-Toe Game
State Space: \(\mathcal{S} = \{\text{All possible Tic-Tac-Toe board configurations}\}\) - Includes boards with X’s, O’s, and empty squares. The game states evolve as players make moves.
Initial State: \(s_0 = \text{Empty 3x3 board}\) - The game starts with an empty board.
Actions: \(\mathcal{A}(s) = \{\text{Place X or O in an empty square}\}\) - For each state, the available actions are to place a player’s mark (X or O, depending on whose turn it is) in any empty square.
Transition Model: \(\text{Result}(s, \text{Place } X \text{ in square } (i, j)) = s'\) - Placing an X in square (i, j) in state \(s\) results in a new board state \(s'\) with the X in that position.
Goal Test: \(\text{GoalTest}(s) = (\text{Is } s \text{ a terminal state?})\) - Checks if the game has ended. Terminal states are when one player has won (three in a row), or it’s a draw (all squares filled, no winner).
Cost Function: In game playing, the cost is often related to the outcome (win, lose, draw). In a search formulation, we might consider the path length to a winning state or minimize the number of moves to win, effectively setting \(\text{Cost}(s, a, s') = 1\) for each move, and aiming to find a path to a winning state with minimal moves.
The Challenge of State Space Explosion
Combinatorial Complexity and Intractability
Remark 5 (State Space Explosion Problem). A major hurdle in solving search problems, especially for complex real-world scenarios, is the state space explosion problem. This issue arises because the number of possible states can grow exponentially with the problem’s dimensions or complexity. As the state space becomes larger, the computational resources (time and memory) required to explore and find a solution increase dramatically, often rendering exhaustive search methods infeasible. This combinatorial complexity makes it computationally intractable to explore the entire state space for many problems of practical interest.
Illustrative Examples of State Space Size
The magnitude of the state space explosion is evident in several examples:
8-Puzzle: As mentioned, the 8-puzzle has approximately 181,440 reachable states. While this number is considerable, it is still manageable for many search algorithms on modern computers.
15-Puzzle: Increasing the puzzle size to a 15-puzzle (4x4 grid with 15 tiles and a blank) dramatically increases the state space to about \(10^{13}\) states (over ten trillion). Searching through such a vast space exhaustively becomes significantly more challenging.
Chess: The game of chess exemplifies extreme state space explosion. The number of reachable board positions is estimated to be on the order of \(10^{44}\). This astronomical number makes it impossible to enumerate or explore the entire state space. Even with advanced search algorithms and heuristics, chess programs explore only a tiny fraction of it.
Pac-Man (Complex Version): In a more complex Pac-Man game, considering Pac-Man’s position, direction, the locations and states of all pellets, and the positions and behaviors of ghosts, the state space can quickly become unmanageably large. As the transcript example illustrated, even a simplified Pac-Man environment can lead to a state space that is computationally challenging to explore fully. For instance, with 120 possible positions for Pac-Man, \(2^{30}\) pellet configurations, and considering ghost positions and Pac-Man’s direction, the total number of states becomes immense and impractical to handle without efficient search strategies and approximations.
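A back-of-the-envelope count using the numbers quoted above (120 positions, \(2^{30}\) pellet configurations, four facing directions, with ghost positions left out of this rough estimate) already gives roughly \(5 \times 10^{11}\) states:

```python
# A back-of-the-envelope count of the Pac-Man state space using the numbers
# quoted above: 120 agent positions, 2**30 pellet configurations, and 4 facing
# directions; ghost positions are ignored in this rough estimate.
positions = 120
pellet_configs = 2 ** 30
directions = 4

states = positions * pellet_configs * directions
print(f"{states:,}")      # 515,396,075,520
print(f"{states:.1e}")    # about 5.2e+11 states, before even counting ghosts
```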
Remark 6 (Necessity of Heuristics). This combinatorial explosion necessitates the development and application of efficient search algorithms and heuristics. Heuristics are problem-specific rules of thumb that guide the search process, helping to prioritize exploration in promising directions and prune away less promising paths. Without such techniques, solving complex search problems within reasonable time and resource constraints would be impossible.
In the following lectures, we will explore various search algorithms and heuristic methods designed to navigate large state spaces effectively and find optimal or near-optimal solutions.
State Space Graphs
Graphical Representation of Search Problems
Search problems can be visually and conceptually represented using state space graphs. A state space graph is a graph where:
Nodes as States and Edges as Actions
Nodes in the graph correspond to states in the state space. Each state is represented by a unique node.
Edges in the graph represent actions that transition the agent from one state to another. Edges are typically directed, indicating the direction of the action’s effect.
Action Costs on Edges
Remark 7 (Action Costs on Edges in State Space Graphs). Edges can be labeled with action costs, representing the cost of performing the corresponding action to move between states. This cost is derived from the cost function defined in the search problem.
Solutions as Paths in the Graph
Definition 7 (Solution in State Space Graph). A solution to a search problem in a state space graph is a path from the initial state node to a goal state node. The path is a sequence of nodes connected by edges, representing a sequence of actions.
Definition 8 (Optimal Solution in State Space Graph). An optimal solution is the path with the minimum total cost, where the total cost is the sum of the costs of all edges along the path.
Example: Romania Map as State Space Graph
Example 14 (Romania Map as State Space Graph). In the Romania travel problem, the map of Romania can be directly seen as a state space graph:
Cities are nodes.
Roads between cities are edges.
Distances between cities are edge costs.
The initial node is Arad, and the goal node is Bucharest. Finding a route from Arad to Bucharest is equivalent to finding a path in this graph, and finding the shortest route is finding the minimum cost path.
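As a small sketch, the fragment below encodes part of this state space graph as a weighted adjacency structure and sums the edge costs along two candidate paths from Arad to Bucharest; only part of the graph is shown, and the distances follow the standard textbook map, so treat them as illustrative.

```python
# A small sketch of the Romania state space graph as a weighted adjacency
# structure, plus a helper that sums the edge costs along a candidate path.
GRAPH = {
    "Arad":          {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu":         {"Arad": 140, "Fagaras": 99, "RimnicuVilcea": 80},
    "Fagaras":       {"Sibiu": 99, "Bucharest": 211},
    "RimnicuVilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti":       {"RimnicuVilcea": 97, "Bucharest": 101},
}

def path_cost(path):
    """Total cost of a path = sum of the edge costs between consecutive nodes."""
    return sum(GRAPH[a][b] for a, b in zip(path, path[1:]))

# Two candidate paths from Arad to Bucharest; the optimal solution is the cheaper one.
via_fagaras = ["Arad", "Sibiu", "Fagaras", "Bucharest"]
via_pitesti = ["Arad", "Sibiu", "RimnicuVilcea", "Pitesti", "Bucharest"]
print(path_cost(via_fagaras))   # 140 + 99 + 211 = 450
print(path_cost(via_pitesti))   # 140 + 80 + 97 + 101 = 418  (the shorter route)
```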
Conclusion
In this lecture, we have progressed from reviewing the basics of intelligent agents and task environments to formally defining search problems. We explored the dimensions of task environments and their impact on agent design, and categorized different types of agents based on their complexity and capabilities, from simple reflex agents to learning agents. We then introduced the concept of a search problem, defined by a 6-tuple including state space, initial state, actions, transition model, goal test, and cost function, and illustrated these with examples like Romania travel, vacuum cleaner, 8-puzzle, and Tic-Tac-Toe. We highlighted the state space explosion problem, emphasizing the combinatorial complexity inherent in many search problems. Finally, we introduced state space graphs as a graphical representation of search problems, where states are nodes and actions are edges, setting the stage for exploring search algorithms in the upcoming lectures.
Key Takeaways:
Task Environment Properties Matter: The characteristics of the task environment (observability, determinism, etc.) fundamentally determine the design and complexity of intelligent agents.
Agent Types Reflect Increasing Sophistication: Agent architectures range from simple reflex-based systems to complex utility-based and learning agents, each suited to different environmental complexities and task requirements.
Formal Search Problems Enable Systematic Solutions: Defining a problem as a 6-tuple (state space, initial state, actions, transition model, goal test, cost function) provides a structured framework for applying search algorithms.
State Space Graphs Visualize Search Landscapes: Representing search problems as graphs, with states as nodes and actions as edges, offers an intuitive and analytical approach to problem-solving.
State Space Explosion is a Core Challenge: The exponential growth of state spaces in complex problems necessitates efficient search algorithms and heuristic methods to find solutions practically.
Looking Ahead: Building upon the foundations laid in this lecture, our next session will delve into the algorithms designed to solve search problems. We will explore fundamental search techniques that navigate state space graphs to find solutions efficiently.
Follow-up Questions and Topics for the Next Lecture:
Efficient Search Strategies: How can we design algorithms to efficiently find solutions in vast state spaces, mitigating the challenges of state space explosion?
Fundamental Search Algorithms: What are the basic search algorithms used in AI, such as Breadth-First Search (BFS), Depth-First Search (DFS), and Uniform Cost Search (UCS), and how do they operate?
Heuristics for Informed Search: How can we incorporate heuristics—problem-specific knowledge—to guide search algorithms and significantly improve their efficiency and effectiveness?
Search in Non-Deterministic Environments: How can we adapt search techniques to handle non-deterministic environments where actions have uncertain outcomes, moving beyond the deterministic models we’ve initially considered?