How to test when CrewAI loops or retries a failing tool forever (cost runaway)
Infinite retry loops in CrewAI are one of the most expensive silent failures you can ship — the agent keeps calling a broken tool, accumulates tokens on every attempt, and your bill climbs while the task never completes.
Why This Failure Is Silent and Dangerous
CrewAI agents are designed to be persistent. When a tool raises an exception or returns an error string, the agent’s underlying LLM often interprets that as “try again with a slightly different input” rather than “give up.” There’s no built-in hard wall on retry count by default, and nothing in your logs screams “runaway” — you just see repeated tool calls until you hit a rate limit, a timeout, or your credit card does the work for you.
The failure is especially insidious because:
- It looks like progress. Each loop produces a new LLM call with slightly varied arguments, so traces look active.
- It’s environment-dependent. A tool that works in dev (fast network, mock API) silently fails in staging (real API, rate limits), triggering the loop only in production.
- CrewAI’s
max_iterdefault is 25. That’s 25 LLM round-trips before the agent gives up — andmax_iteronly caps the agent’s reasoning steps, not individual tool call retries within a step.
The Correct Way to Test for This
The strategy is to inject a tool that always fails, then assert the agent stops within a bounded number of attempts and surfaces a failure rather than looping silently.
import pytest
from unittest.mock import MagicMock, patch
from crewai import Agent, Task, Crew
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
# --- A tool that always fails ---
class AlwaysFailingToolInput(BaseModel):
query: str = Field(description="Any query")
class AlwaysFailingTool(BaseTool):
name: str = "always_failing_tool"
description: str = "Fetches data from an external API"
args_schema: type[BaseModel] = AlwaysFailingToolInput
def _run(self, query: str) -> str:
# Simulates a persistently broken external API
raise RuntimeError("External API is down")
# --- Track how many times the tool is invoked ---
call_counter = {"count": 0}
original_run = AlwaysFailingTool._run
def counting_run(self, query: str) -> str:
call_counter["count"] += 1
return original_run(self, query)
def test_agent_does_not_loop_forever_on_failing_tool():
call_counter["count"] = 0
failing_tool = AlwaysFailingTool()
# Patch _run to count invocations
with patch.object(AlwaysFailingTool, "_run", counting_run):
agent = Agent(
role="Data Fetcher",
goal="Fetch data using the available tool",
backstory="You retrieve data from external APIs.",
tools=[failing_tool],
max_iter=5, # Tighten the cap for test speed
max_retry_limit=1, # Limit per-tool retries
verbose=False,
llm="gpt-4o-mini", # Use cheapest model in tests
)
task = Task(
description="Use the always_failing_tool to fetch data about 'sales'.",
expected_output="A summary of sales data.",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task], verbose=False)
result = crew.kickoff()
# The agent must stop — assert it didn't loop beyond max_iter
assert call_counter["count"] <= agent.max_iter, (
f"Tool was called {call_counter['count']} times — "
f"exceeded max_iter={agent.max_iter}. Cost runaway detected."
)
# The result should acknowledge failure, not be empty or a hallucinated success
result_text = str(result).lower()
assert any(word in result_text for word in ["unable", "failed", "error", "could not", "apolog"]), (
f"Agent returned a result that looks like success despite tool always failing: {result}"
)
What This Test Validates
-
Bounded invocations.
call_counter["count"] <= agent.max_iterensures the tool was not called in an unbounded loop. If CrewAI’s retry logic changes in a future version and ignoresmax_iter, this test catches it immediately. -
Honest failure surfacing. The second assertion checks that the agent’s final output admits failure rather than hallucinating a successful result — a separate but equally dangerous failure mode.
-
max_retry_limit=1. Set this on theAgentconstructor to limit how many times the agent re-attempts a tool call within a single reasoning step. Combined withmax_iter, this is your two-layer cost guard.
Practical Hardening Checklist
- Always set
max_iterexplicitly — don’t rely on the default 25 in production agents. - Set
max_retry_limit=1or2for any agent calling external APIs. - In CI, run this test against every tool your agents use, with a mock that returns errors, to catch regressions before they hit production billing.
- Log
tool_call_countas a metric in your observability stack so you can alert on runaway patterns in production even when tests pass.
Want this as a ready-to-run check across 28 OWASP-Agentic-aligned cases? pip install "agent-eval-runner[openai]" then agent-eval try --model openai:gpt-4o — free 5-case starter: https://github.com/weiseer/ai-agent-qa-eval-pack-starter · full pack: https://weiseer.gumroad.com/l/dcipxt