March 24, 20266 min read

Chipotle’s Customer Service AI Agent Goes Off the Rails, Follows Incidents at Amazon & Woolworths

Written by
Charlie Mitchell's profile picture

Director of Content & Market Research

March 24, 2026

Chipotle’s Customer Service AI Agent Goes Off the Rails, Follows Incidents at Amazon & Woolworths

Chipotle has become the latest brand to suffer a faux pas with its customer service AI agent. 

A LinkedIn post by Nishant Hooda, CEO of Docket, shows the company’s agent being nudged off task and answering coding questions.

Chipotle has since rectified the issue, preventing customers from engaging in irrelevant conversations with the agent.

In this case, there was no real consumer outrage. Instead, commentators told jokes, and, at most, it’s a minor embarrassment for Chipotle.

After all, it’s not a serious trust crisis unless it touches pricing, refunds, personal data, or regulated advice. 

Nevertheless, Chipotle risked sinking significant funds into AI tokens by failing to narrow the scope of its agent.

3 Actions to Avoid Chipotle’s AI Agent Blunder

Chipotle’s blunder is a simple, benevolent example of ‘jailbreaking’, a technique used to exploit the tendency of AI agents to be helpful, so they answer questions beyond their operational restrictions. 

However, jailbreaking attacks can have serious consequences. Here are three actions to avoid them. 

1. Narrow the Agent’s Scope at a Product Level

Most large language models (LLMs) today are effectively trained on the internet - Reddit, Wikipedia, a relatively small number of heavily weighted sources - and then they’re dropped straight into a branded environment. That’s where things go wrong. 

As Ian Jacobs, VP & Lead Analyst at Opus Research, notes, brands have something that’s incredibly capable, but not constrained in the right way. Given this, he recommends:

“Enforce scope at the product level, not just in the prompt. The bot should be able to refuse or redirect anything outside its job, such as order support, store info, or loyalty questions.”

ian-jacobs.jpg

2. Model AI on the Behaviors of High-Performance Agents

To narrow the scope of an agent, many organizations will constrain it to a curated knowledge base of verified, high-quality answers.

Yet, too often, the underlying content is below par. For instance, one bank’s contact center knowledge base consisted of 15,000 SharePoint articles, according to Andrew Moorhouse, Conversational Voice AI and CX specialist at Connect. That’s completely unusable in practice. 

As a result, the bank seconded 50 of its top-performing CX agents for a three-month sprint to review and rewrite the answers from scratch. Yet, Moorehouse believes there’s a step beyond this. 

“The more advanced teams are now training systems on the behavior of their top-performing agents. So, you’re not just giving the model information, you’re giving it examples of what “good” actually looks like. That becomes your real guardrail. Not something bolted on afterwards, but something baked into what the model can access in the first place.”

A headshot of Andrew Moorhouse

“The result is an AI that’s narrower, yes, but far more reliable,” concluded Moorehouse. “And in CX, reliability always wins.”

3. Test, Test, Test 

Brands need to thoroughly test their AI agents with weird real-world prompts before launch and continuously after launch, per Jacobs.

“People will always ask unexpected questions, so brands should red-team for off-topic requests, edge cases, and prompt-injection style behavior rather than assuming the bot will stay in its lane on its own.”

ian-jacobs.jpg

Many conversational AI platforms include tools that simulate thousands of customer journeys to break bots before customers do. However, those going the DIY route may also utilize point solutions from Bluejay, Cresta, or Docket to better test their agents or broader CX assurance platforms, such as Cyara or Pump CX.

More Examples of Customer-Facing AI Agents Going Off the Rails

Chipotle isn’t the only company to have its customer-facing AI agent go off the rails. Here are five more examples, some with much more significant consequences. 

1. Amazon 

Amazon launched its AI shopping assistant ‘Rufus’ in 2024, embedding it across its app and website. Two years on, users are tricking Rufus into forgetting its purpose.

Here’s an example posted on X earlier this month, which is eerily similar to Chipotle’s misstep. 

However, unlike Chipotle, Amazon hasn’t tightened back up. Here’s an example that Jacobs shared with the CX Foundation of Rufus still running off course.

An example of jailbreaking Amazon Rufus

2. Woolworths

Olive, an AI agent available to customers of the Australian supermarket chain Woolworths, helps organize deliveries and place orders.

Yet, spookily in February 2026, Olive started to talk about its ‘mother’ and family drama, completely unprompted. Reddit users shared examples in the thread below.

Olive AI started telling me about its mother on the phone?
by u/Immediate_Novel5946 in woolworths

Woolworths reported that the issue likely stemmed from human efforts to make Olive more personable, and has since removed this portion of the scripting.

3. Virgin Money

While some AI blunders are incredibly serious, some are more humorous in nature. Take this example of a customer conversation with Virgin Money’s service agent in 2025. It took offence to the word ‘virgin’.

Virgin Money quickly responded to the issue, with a human agent commenting on the post that the bot was “scheduled for improvements” and answering the customer’s original question.

A good reply, but, unfortunately for Virgin, the affair proved a memorable example of AI veering off course. 

4. Chevrolet (& Air Canada)

Perhaps the most infamous example of a customer-facing agent going off the rails occurred in December 2023, when former X employee Chris Bakke tricked a Chevrolet dealership into selling a car for $1.

Ultimately, the dealership did not honor the agreement. Nevertheless, companies can be held liable for statements made by their AI agents. 

For example, Air Canada’s defense faltered in small claims court when it argued that its chatbot was “responsible for its own actions,” after it had promised a customer a discount that was no longer available.

5. DPD

DPD discovered the consequences of jailbreaking in 2024 when a post went viral of a user getting its virtual agent to curse and worse. 

Indeed, when customer Ashley Beauchamp couldn’t easily escalate to a human, he asked the agent to swear and write a poem about how bad DPD is. Here’s the result.

While DPD speedily amended the issue, citing a faulty system update, it’s yet another example of jailbreaking. More than two years later, these issues are still happening. 

Sharing his final verdict, Moorhouse wrote: 

“The problem usually isn’t artificial intelligence… It's human intelligence.”

A headshot of Andrew Moorhouse

Stay updated with cx news

Subscribe to our newsletter for the latest insights and updates in the CX industry.

By subscribing, you consent to our Privacy Policy and receive updates.