Need for Speed in Agentic Finance

By Filip Michalsky

A series of learnings from applying AI to financial services, payments, and payment operations in the field.

Who is this for?

Innovation leaders at small and medium-sized companies who want to see how workflow automation is being executed in the real world. Written by someone deeply passionate about applied AI.

Everyone talks about how we need to deploy agents, but what does that actually mean?

This is the first post in a series on how the day-to-day life of a founder in agentic finance has changed for me over the past 12+ months: practical lessons and field notes from deploying agents inside real payments and payment-ops workflows, as well as in general business and coding.

Let's dive in.

First, and not surprisingly, I have not written a line of code by hand for over a year. For some tasks, almost two years. The only code I still write is small data exploration snippets in Jupyter notebooks. I most definitely do not write SQL anymore. Only this year did I start using real business agents beyond deep research artifacts or pick-your-favorite LLM chats in UI tools.

These days, my mental model for agents in our business is twofold: tech agents and business agents. I expect this to change again in the coming months.

Tech Agents

This was the first killer use case. From tab completion to end-to-end agents and the excellent essay by Michael Truell, co-founder of Cursor (here on X), we have come a long way.

I installed Claude Code around April 2025. At the time, the agentic loop got a lot wrong, and on longer rollouts quality deteriorated quickly to the point of becoming unusable. So I kept flip-flopping between Cursor and Claude Code. Then RL post-training conditioned on agentic harnesses landed, and Claude Code 2.0 improved significantly.

We use these setups today:

  • Codex: plan plus execute. I iterate on the plan several times, then execute.
  • Claude Code: same process. I compare both plans as my version of sampling and pick the better one.

This works best when I am in a silent room and can focus. I still struggle to work on multiple features at once. Some people claim they can do this in worktrees, but the attention switching kills me. As context grows, my brain becomes a blocker, so I would rather move a little slower than write spaghetti code with regressions.

When I am done, I inspect diffs in the GitHub UI. I still use Cursor sometimes, mostly as an IDE and for Cursor Bugbot. It feels like Claude Code, with its RL post-trained harness, now performs well even on longer trajectories.

Business Agents

This used to not really work. It sat in the "too hard, too brittle, I do not have time" bucket. Then OpenClaw emerged and made execution easier, and everything changed this year. Long-horizon trajectories started working for me in business settings.

We incorporated it into the business side of our company. Alongside other agent tools, it has made the startup journey much faster and also scarier, because it can feel like losing some level of control.

Do not mistake OpenClaw for an agent panacea. The problems that have existed since 2023 (hallucinations, wrong context, fragility in production) still exist, even if they are less severe now. Ever since launching SalesGPT in April 2023 (~2.5k stars on GitHub), I have wanted a deeper automation stack with the right abstraction layer. I do not know if OpenClaw is the final answer, but it is a lot easier to use in our GTM and operations workflows. I am sure there are other tools, but I do not have time to try all of them.

I have spent a lot of time setting up OpenClaw automations across the business. Some are cron jobs. Some are truly HEARTBEAT.md based with limited determinism. More on this in future posts. When I saw our CRM auto-populating all interactions from all channels after two weeks of fiddling, it felt worth it.

It feels like a rapid organizational shift is underway and accelerating. At Soap, we already have a ratio of roughly three agents to one employee. That ratio is rising as we keep finding narrow use cases where agents can perform reliably enough. It takes a lot of work, especially in financial services.

Agents : employees ratio

Start of year: 0.5:1
Now: 3:1

Illustrative trend, not an exact time series.

Agent Run Log

[Figure: agent run log table with columns for inputs, tools used, outputs, approval, timestamp, and trace ID.]

Example run-log snapshot of agentic workflows and approvals.
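
To make the run-log columns concrete, one row could be modeled roughly as follows. This is a minimal sketch, not the actual schema behind the snapshot: only the six column names come from the table, while the class name `AgentRunRecord`, the approval states, the `needs_review` helper, and the JSON-lines serialization are all assumptions made for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentRunRecord:
    """One row of a hypothetical agent run log (column names from the table)."""

    inputs: dict                      # what the agent was asked to do
    tools_used: list                  # names of the tools the agent called
    outputs: dict                     # what the agent produced
    # Assumed approval states: "pending", "approved", "rejected".
    approval: str = "pending"
    # UTC timestamp in ISO 8601 format, set when the record is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # Random 32-character hex ID used to correlate the row with a trace.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json_line(self) -> str:
        """Serialize the record as one JSON line for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


def needs_review(record: AgentRunRecord) -> bool:
    """Return True while a human has not yet approved or rejected the run."""
    return record.approval == "pending"


if __name__ == "__main__":
    # Hypothetical example values; not taken from a real workflow.
    record = AgentRunRecord(
        inputs={"task": "reconcile payment", "reference": "example-001"},
        tools_used=["ledger_lookup", "crm_update"],
        outputs={"status": "matched"},
    )
    print(record.to_json_line())
    print("needs review:", needs_review(record))
```

Defaulting `approval` to "pending" means a run is treated as unreviewed until a human explicitly changes it, which seems like the safer default in a financial services setting.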

I do not think demand for human labor is dropping anytime soon, but all of us need to learn and adjust.

What Comes Next

In upcoming posts, I will share how we think about privacy, security, and the battle scars from building our homegrown agentic OS, including what it takes to make agents actually work in financial services.

See you there.