
The Best Agent Harness for Algorithmic Trading

Why my trading bot generator BEATS Claude Code and Cursor

8 min read · Mar 31, 2026

If you’re sick of the LinkedIn techfluencers posting about the latest “innovation” in AI (which is really a markdown file bolted onto an API), you might groan when you hear that there’s a new buzzword in AI: the agent harness. Why does THIS one matter?

For real-world AI agents, they are critical.

There’s a reason why the same model produces vastly different results in the Chat UI than it does in Cursor. Agent harnesses are the glue that transforms a large language model into an intelligent agent capable of operating autonomously.

More concretely, there’s a reason why Claude, ChatGPT, Cursor, and OpenClaw fail to outperform NexusTrade when it comes to developing algorithmic trading strategies. It’s the harness.

Listen. I know a lot about AI agents. So much so that OpenAI literally gave me a unique physical “token” representing the 10 billion tokens I’ve processed through their API.

A “token of appreciation” from OpenAI. It says “Austin Starks — Honored for passing 10 billion tokens”

Let me show you what an award-winning agent harness looks like.

First, ask ChatGPT to backtest a trading strategy

Before I get into the article, I want to demonstrate why this matters at all.

Go to ChatGPT, or Claude, or Gemini, or whatever is the best model at the time you’re reading this, and ask it to do something every trader wants to do…

Backtest a trading strategy.

The prompt: “backtest an option trading strategy that buys and holds MAG7 calls”

A lecture from ChatGPT about how it would backtest this strategy if it could (which it can’t)

The mainstream models can’t do it. Not Claude Code, Cursor, or Gemini. It’s not because they’re too “dumb”, but because they literally don’t have the ability.

They don’t have the licenses to access the terabytes of data. They don’t maintain the infrastructure needed to run a backtest. And accounting for split adjustments, dividends, and anomalous data events, while writing highly scalable data-access patterns, is quite frankly too complicated for OpenClaw to build out for you in a reliable, secure way.

You need something custom-built.

The custom-built glue around the agent is called the harness. Simply put, an agent harness is the infrastructure layer that wraps around an AI agent to manage its execution lifecycle. This includes:

  • The list of tools the agent has access to
  • How the agent manages its context window across long conversations
  • How the agent handles errors and retries
  • Which models the agent should use for specific tasks
  • Spawning subagents
  • Observability, monitoring, and alerting

In one sentence, it’s the opinionated software stack that gives a large language model its agentic capabilities. Here’s an example of a production-ready agent harness for developing algorithmic trading strategies.
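The components listed above can be sketched as a minimal harness skeleton. Everything here (class names, the `model_routes` mapping, the toy `backtest` tool) is illustrative, not Aurora’s actual code:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[..., str]

@dataclass
class HarnessConfig:
    # The opinionated stack: tools, context policy, retries, model routing.
    tools: dict[str, Tool] = field(default_factory=dict)
    max_context_tokens: int = 100_000   # when to start summarizing history
    max_retries: int = 3                # error-handling policy
    model_routes: dict[str, str] = field(default_factory=dict)  # task -> model

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

config = HarnessConfig(model_routes={"summarize": "small-model", "reason": "big-model"})
config.register(Tool("backtest", "Run a historical simulation", lambda spec: f"ran {spec}"))
print(config.tools["backtest"].run("MAG7 calls"))  # -> ran MAG7 calls
```

The point of the dataclass is that every harness decision (context limit, retry count, routing) lives in one explicit, inspectable place rather than being scattered through prompt strings.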

Orchestrating USEFUL tools for automated trading

An AI agent’s tools are what separates the degenerates shitposting on twitter from the quants performing real research. They’re important. Pay attention.

A tool is simply a way for an agent to interact with an external service. It can be as simple as an API call or a CLI command. Tools are critical because they let the agent make observations of its environment.

A good tool isn’t a built-in add-on like searching the web or browsing twitter. That’s rookie shit; anybody can build an agent that does it.

No. A good tool is purpose-built and well thought out. It’s things like querying a DuckDB database for historical options chains because your benchmarks showed it outperforming BigQuery and Postgres 10 to 1. It’s an API call to a backtesting engine that has already loaded hundreds of gigabytes of historical data onto disk, so that when you launch a backtest, the system knows how to iterate through millions of datapoints thanks to your cleverly designed mmap iteration system.

It’s NOT a random API call.
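To make the distinction concrete, here is a toy purpose-built tool: a parameterized options-chain query the agent can call, rather than an arbitrary API surface. I use stdlib sqlite3 as a stand-in for the DuckDB store described above, and the table name, columns, and sample rows are all made up:

```python
import sqlite3

# Toy stand-in for the options-chain store (sqlite3 here; the article's
# setup uses DuckDB). Schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE options_chain (ticker TEXT, expiry TEXT, strike REAL, iv REAL)")
conn.executemany(
    "INSERT INTO options_chain VALUES (?, ?, ?, ?)",
    [("AAPL", "2026-06-19", 200.0, 0.31), ("AAPL", "2026-06-19", 210.0, 0.29)],
)

def query_options_chain(ticker: str, expiry: str) -> list[tuple]:
    """A purpose-built tool: a narrow, parameterized query, not 'run any SQL'."""
    return conn.execute(
        "SELECT strike, iv FROM options_chain WHERE ticker = ? AND expiry = ? ORDER BY strike",
        (ticker, expiry),
    ).fetchall()

print(query_options_chain("AAPL", "2026-06-19"))  # -> [(200.0, 0.31), (210.0, 0.29)]
```

Narrowing the tool to one question the agent actually needs answered is what keeps it reliable and secure compared to handing the model a raw database connection.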

Beyond that, the harness should have a built-in system for managing memory, iteration limits, and deciding which model should be used. These trade-offs matter; some menial tasks only need a quick model, while true reasoning requires a powerhouse. Which model is needed, and when, is not innate knowledge for the AI; it’s something you have to teach it.
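One way to teach it: encode the routing and retry policy directly in the harness. This is a hypothetical sketch; the task labels and model names are placeholders, not NexusTrade’s actual routes:

```python
import time

# Hypothetical per-task routing: cheap model for menial work,
# a powerhouse for real reasoning.
ROUTES = {
    "summarize_history": "fast-cheap-model",
    "classify_intent": "fast-cheap-model",
    "design_strategy": "frontier-reasoning-model",
}

def call_with_policy(task, prompt, call_model, max_retries=3):
    """Route the task to the right model, retrying transient failures
    with exponential backoff."""
    model = ROUTES.get(task, "frontier-reasoning-model")
    for attempt in range(max_retries):
        try:
            return call_model(model, prompt)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt * 0.01)  # short backoff for the demo

# Stub "model" that fails once, then succeeds.
attempts = {"n": 0}
def flaky_model(model, prompt):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("transient")
    return f"{model}: ok"

print(call_with_policy("summarize_history", "condense this", flaky_model))
```

The agent never chooses its own model; the harness does, which is exactly the kind of opinionated decision that separates a harness from a bare API call.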

As a concrete example, Aurora is the AI agent that powers NexusTrade.

In addition to the basic tools like Web Search and Deep Research, Aurora has access to specialized tools that connect directly to NexusTrade. This includes:

  • Searching for relevant news about ANY and ALL stock tickers… not just what’s happening now, but what HAS happened in the past
  • The ability to take a sentence and generate a trading strategy configuration… completely eliminating the need for an error-prone programming language
  • Running historical simulations and using those simulations to improve the strategy parameters
  • Launching, halting, and updating real and simulated portfolios
  • Explaining past orders, analyzing portfolios, and even generating images for my Medium readers to enjoy

Most importantly, adding a new tool is minutes of real work. You can ask OpenClaw to build a trading strategy, but can you get an accurate report of how good the strategy has been in the past?

Not without REAL, useful tools.
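Why is adding a tool minutes of work? Because in most harnesses a tool is just a declarative spec plus a handler. Here’s a sketch in the common function-calling style; the tool name, parameters, and handler are illustrative, not Aurora’s real interface:

```python
# Hypothetical declarative tool registry: adding a tool is one entry,
# not a rewrite of the agent loop.
REGISTRY: dict[str, dict] = {}

def register_tool(name: str, description: str, parameters: dict, handler) -> None:
    REGISTRY[name] = {
        "spec": {"name": name, "description": description, "parameters": parameters},
        "handler": handler,
    }

register_tool(
    name="search_stock_news",
    description="Search current and historical news for a ticker",
    parameters={
        "type": "object",
        "properties": {"ticker": {"type": "string"}, "since": {"type": "string"}},
        "required": ["ticker"],
    },
    handler=lambda ticker, since=None: f"headlines for {ticker}",
)

print(REGISTRY["search_stock_news"]["handler"]("NVDA"))  # -> headlines for NVDA
```

The spec half is what the model sees (so it knows the tool exists and how to call it); the handler half is what the harness executes.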

Observing the agent’s inner thought process

Nobody cares about “observability” until your agent burns $200/hour in an infinite loop. Then all of a sudden, it’s the sexiest, most important thing ever.

With a good agent harness, observability comes first.

An observability dashboard for this particular agent run. It shows how many research tokens were used, how many models were called, and the execution time

When we launch an agent, we can inspect everything it does, down to the model it chose and the thought process behind each tool call. This makes finding bugs and performance bottlenecks trivial, and steering the agent toward our goal becomes straightforward. We KNOW why it decided to use the screener instead of the strategy generator. Even if we don’t agree, we can at least understand, right?
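The mechanics behind a dashboard like that are simple: every model/tool call gets appended to a trace. A minimal sketch, with made-up model and tool names:

```python
import time

# Minimal trace log (illustrative): record token burn, model choice,
# and latency for every call, so runaway loops show up immediately.
TRACE: list[dict] = []

def traced_call(model: str, tool: str, tokens_in: int, tokens_out: int, fn):
    start = time.perf_counter()
    result = fn()
    TRACE.append({
        "model": model,
        "tool": tool,
        "tokens": tokens_in + tokens_out,
        "seconds": round(time.perf_counter() - start, 4),
    })
    return result

traced_call("frontier-model", "stock_screener", 1200, 300, lambda: "10 candidates")
traced_call("fast-model", "summarize", 800, 200, lambda: "summary")
total = sum(e["tokens"] for e in TRACE)
print(f"{len(TRACE)} calls, {total} tokens")  # -> 2 calls, 2500 tokens
```

With a trace like this, a $200/hour infinite loop is a one-line alert (tokens per minute over a threshold) instead of a surprise invoice.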

The End Result

The end result of combining useful tools, observability code, summarization rules, and a ReAct agentic loop is a powerful AI agent that can perform tasks autonomously.
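The ReAct loop at the heart of this is just reason, act, observe, repeat, under an iteration limit. Here’s a toy version where the “model” is a scripted stub rather than a real LLM, and the backtest numbers are hard-coded for illustration:

```python
# Minimal ReAct-style loop (illustrative). A real harness would call an
# LLM for the (thought, action) pair; here a stub stands in.
def scripted_model(history):
    if not any(step[1] == "backtest" for step in history):
        return ("I should test the strategy first.", "backtest")
    return ("The backtest ran; I can answer now.", "finish")

TOOLS = {"backtest": lambda: "total return: 264%, max drawdown: 95%"}

def react_loop(model, max_steps=5):
    history = []
    for _ in range(max_steps):
        thought, action = model(history)  # reason
        if action == "finish":
            return history
        observation = TOOLS[action]()     # act, then observe
        history.append((thought, action, observation))
    raise RuntimeError("iteration limit reached")

steps = react_loop(scripted_model)
print(steps[0][1])  # -> backtest
```

Swap the stub for a real model and the `TOOLS` dict for the specialized tools above, and this skeleton is the loop the harness runs.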

Using NexusTrade’s Aurora to build and backtest an options trading strategy

Let me walk through an example.

Unlike ChatGPT, which gave us a long-winded answer about what it would do if it could, Aurora takes action. It immediately creates a plan to test several different trading strategies.

With the agent harness in place, the user has maximum control. In semi-automated mode, the plan has to be approved. If rejected, the AI makes modifications; if accepted, the agent gets right to work. Since the plan was to create a strategy, it immediately created one simple portfolio.
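That approval gate is itself a harness feature, not a model feature. A minimal sketch of the reject-revise-accept cycle, with hypothetical plan steps:

```python
# Sketch of a semi-automated approval gate: the plan is shown to the
# user, and rejection sends it back to the agent for revision.
def run_with_approval(make_plan, revise_plan, execute, approve, max_rounds=3):
    plan = make_plan()
    for _ in range(max_rounds):
        if approve(plan):
            return execute(plan)
        plan = revise_plan(plan)
    raise RuntimeError("plan never approved")

decisions = iter([False, True])  # user rejects once, then accepts
result = run_with_approval(
    make_plan=lambda: ["buy MAG7 calls", "hold 30 days"],
    revise_plan=lambda p: p + ["cap position size at 5%"],
    execute=lambda p: f"executing {len(p)} steps",
    approve=lambda p: next(decisions),
)
print(result)  # -> executing 3 steps
```

Fully automated mode is the same loop with `approve` hard-wired to return True.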

Aurora created a trading strategy, which can be used for simulations and real-time trading

It then tested this portfolio across multiple time periods and saw extreme volatility: insane gains and devastating losses.

Two portfolios displayed by the AI. One has a 99.4% drop and the other has a 264% return with a 95% maximum drawdown. They’re insanely risky

Get this: not only does the agent recognize how risky this strategy is, it decides to launch specialized subagents to fix it. Three different AI agents work in parallel to create a better version of our trading strategy.

Launching subagents to explore different option trading configurations

These subagents are even more purpose-built for options trading. Each has explicit instructions and hints to use tools such as the AI Stock Screener, which can analyze options chains and help inform our trading strategy.
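The fan-out itself is straightforward: spawn the subagents concurrently and collect their results. A toy sketch, where `run_subagent` stands in for a full agentic loop and the configurations are made-up examples:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative parallel fan-out: each subagent explores one options
# configuration; the parent collects every result in order.
def run_subagent(config: str) -> dict:
    # A real subagent would run its own ReAct loop with its own tools
    # and explicit options-trading instructions.
    return {"config": config, "status": "evaluated"}

CONFIGS = ["covered calls", "debit spreads", "protective puts"]
with ThreadPoolExecutor(max_workers=len(CONFIGS)) as pool:
    results = list(pool.map(run_subagent, CONFIGS))

print([r["config"] for r in results])
```

`ThreadPoolExecutor.map` preserves input order, which makes it easy for the parent to match each result back to the configuration it assigned.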

One of the subagents generated a chart to help us visualize Apple’s options chain. This can help inform the rules for the trading strategy

The end result: each subagent develops its own unique strategies and evaluates them, and the parent agent can then examine each subagent and extract insights from every run.

After the subagents finish, the agent explicitly reads all of their results and injects key insights into the session

This happens both explicitly and implicitly. Explicitly, we can see in the UI how the parent reads the subagents for a particular run. Implicitly, as more and more agents run and accumulate experience, the backend builds a mapping from strategy ideas to performance metrics. When we try to create similar strategies in the future, the backend knows this and auto-injects some winning ideas, making iteration faster and closer to proven solutions.
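A toy version of that cross-run memory, with invented strategy names and returns (the real mapping and injection logic are NexusTrade’s backend, not shown here):

```python
from collections import defaultdict

# Toy cross-run memory: anonymous (strategy idea -> returns) records
# accumulate, and future sessions get the proven winners as hints.
MEMORY: defaultdict[str, list[float]] = defaultdict(list)

def record_run(idea: str, total_return: float) -> None:
    MEMORY[idea].append(total_return)

def winning_ideas(min_avg_return: float = 0.0) -> list[str]:
    """Ideas whose average return clears the bar, best first."""
    return sorted(
        (idea for idea, rets in MEMORY.items() if sum(rets) / len(rets) > min_avg_return),
        key=lambda idea: -sum(MEMORY[idea]) / len(MEMORY[idea]),
    )

record_run("buy-and-hold MAG7 calls", -0.994)
record_run("rolled covered calls", 0.12)
record_run("rolled covered calls", 0.08)
print(winning_ideas())  # -> ['rolled covered calls']
```

The hints injected into a new session would simply be the top entries of `winning_ideas()` for strategies similar to the one being built.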

This, in essence, creates a network effect: the more agents we run, the better the system gets. The better it gets, the more people decide to use it and explore other ideas. This compounds into a flywheel that everybody who uses NexusTrade benefits from.

The end result of this process is a set of trading strategies. We know exactly how they’ve fared in the past and can deploy them for real-time paper-trading.

The outcome of the AI agent creating trading strategies: it surfaced insights, noticed patterns, and gave me a list of the best portfolios

The harness differentiates NexusTrade’s Aurora from general-purpose LLMs like ChatGPT and Claude Code. It allows purpose-built AI agents like Aurora to autonomously create, backtest, and deploy algorithmic trading strategies without having to worry about writing code or getting inaccurate results.

Now let me be 100% transparent. If you’re an active Jane Street quant with a proprietary edge, chances are you’re not a good fit for the platform. The network effect is real and gives us an unfair advantage. Executed correctly, pooling the anonymous performance metrics of thousands of strategy runs lets the agent truly learn what works and what doesn’t. It prevents you from wasting time and money re-discovering something someone else found out last month.

It’s a collective intelligence engine for trading.

The raw data? From high-quality vendors like EODHD and Polygon, not scraped from public websites. The backtest results? Fully deterministic and auditable, from the price of each options contract down to the amount of dividends we’re owed.

A “backtest audit” allows us to see the event-by-event details of a backtest, so we know exactly why our strategy acted the way it did
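The core of an audit like that is an append-only event log: every action that changes portfolio state gets recorded so the run can be replayed and questioned. A minimal sketch with made-up events:

```python
# Sketch of a deterministic, auditable backtest event log. Every
# state-changing event (fills, dividends, splits) is recorded in order.
AUDIT: list[dict] = []

def record_event(date: str, kind: str, detail: dict) -> None:
    AUDIT.append({"date": date, "kind": kind, **detail})

record_event("2026-01-02", "buy", {"contract": "AAPL 200C", "price": 12.40, "qty": 1})
record_event("2026-02-15", "dividend", {"ticker": "AAPL", "amount": 0.26})
record_event("2026-03-20", "sell", {"contract": "AAPL 200C", "price": 18.10, "qty": 1})

def explain(kind: str) -> list[dict]:
    """Answer 'why did my strategy act?' by filtering the event log."""
    return [e for e in AUDIT if e["kind"] == kind]

print(len(explain("buy")))  # -> 1
```

Because the log is ordered and complete, re-running the same strategy over the same data must reproduce the same events, which is what makes the backtest deterministic.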

Now, you can spend months building your own harness. You can spend hundreds of dollars on intraday options data, vibe-code a data pipeline in Rust, spend months debugging why your simple debit-spread strategy is losing 80% in one month while the underlying is flat, and ultimately pour your energy into something someone else has already done…

OR you can try NexusTrade, eliminate the opportunity cost, and build and deploy your first trading strategy today.

You know how the harness works and how to audit every single thing it does. Don’t spend months re-inventing the wheel; build a bot that you can test right now.
