Experiment 002: Agentic Loop Cuts PMax Tuning to 48 Hours

Every Performance Max consultant on LinkedIn is selling "AI automation." Almost nobody is publishing a controlled comparison. The reason that gap exists is also the reason the experiment is worth running: the typical signal-tuning cycle inside a serious PMax account is roughly six weeks — pull diagnostics, form a hypothesis, build the change, get approval, deploy, wait for the next clean reporting window. By the time the loop closes, the market has moved and the team is already on the next fire. Most "automation" promises are vague about which part of that loop they actually compress.

This experiment narrows the question. Can a small agentic loop — one that reads the diagnostics PMax already exposes, proposes a short list of signal-environment changes, and stages them for human approval — collapse the cycle by an order of magnitude? Not "does AI help." Not "does an agent feel useful." A timed, before-and-after comparison of how long it takes a team to go from a fresh PMax diagnostic to a deployed signal change.

Hypothesis

An agentic loop collapses signal-tuning cycle time by an order of magnitude.

The hypothesis is that an agent reading weekly PMax diagnostics (placement-level performance, asset-group health, conversion value signal density, audience-signal contribution) can propose 3–5 targeted signal changes per cycle, score them by expected lift, and stage them for a human reviewer in a structured queue. The human still approves and deploys; the agent does the reading, the synthesis, and the first draft of the recommendation. If the hypothesis holds, cycle time drops from six weeks to roughly forty-eight hours, while the CPA delta vs. a manual baseline stays at parity or better.

The thesis is structural: most of the six weeks is not the AI part of the loop. It is the human part — the time to read the diagnostics, write the brief, route it for approval, and wait for the next reporting window. Compressing the human part by automating the reading and the first draft of the brief is where the order-of-magnitude shift would have to come from. If the agent's recommendations are wrong or noisy, the time savings disappear because the human reviewer rejects most of them. So the test is really about recommendation quality at speed, not raw automation.

Method

An agent reads diagnostics. A human approves. Both clocks run.

The setup is deliberately small. One PMax account, mature, with at least 90 days of stable historical performance. An agent built on top of the platform's diagnostics export — placement reports, asset performance, audience-signal contributions, conversion value rules — that produces a structured weekly brief: three to five proposed signal changes (audience signals, conversion value rules, asset group splits), with rationale, expected effect, and a confidence rating. The human reviewer approves, rejects, or edits each one before it ships. The manual baseline is the same account's actual signal-tuning cadence from the prior six months, measured end-to-end.
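To make the shape of that weekly brief concrete, here is a minimal sketch in Python. It is not the agent used in the experiment: the class names, the field names, and the 0-1 scales for expected lift and confidence are all assumptions made for illustration. It shows only the structure described above: a ranked list of three to five proposals, each waiting on a human decision before anything ships.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EDITED = "edited"


@dataclass
class ProposedChange:
    # All field names and scales here are hypothetical, chosen for illustration.
    kind: str             # "audience_signal", "conversion_value_rule", "asset_group_split"
    rationale: str
    expected_lift: float  # assumed scale: estimated relative CPA improvement, e.g. 0.04
    confidence: float     # assumed scale: 0-1 rating attached by the agent
    decision: Decision = Decision.PENDING


def build_brief(candidates, min_items=3, max_items=5):
    """Rank candidates by expected lift and keep at most max_items of them."""
    ranked = sorted(candidates, key=lambda c: c.expected_lift, reverse=True)
    brief = ranked[:max_items]
    if len(brief) < min_items:
        raise ValueError(f"need at least {min_items} proposals, got {len(brief)}")
    return brief


def review(change, decision):
    """Record the human reviewer's call on a single proposal."""
    change.decision = decision
    return change


def deployable(brief):
    """Only approved or human-edited changes move on to deployment."""
    return [c for c in brief if c.decision in (Decision.APPROVED, Decision.EDITED)]


# Usage with made-up proposals: six candidates in, five queued, two cleared to ship.
candidates = [
    ProposedChange("audience_signal", "illustrative rationale", lift, 0.5)
    for lift in (0.01, 0.05, 0.03, 0.02, 0.04, 0.06)
]
brief = build_brief(candidates)
review(brief[0], Decision.APPROVED)
review(brief[1], Decision.REJECTED)
review(brief[2], Decision.EDITED)
ready_to_ship = deployable(brief)
```

Keeping the decision as an explicit field on each proposal means a rejected or untouched item can never reach deployment by default, which matches the rule that final approval stays human.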

Test surface

One Performance Max account, mature, 90+ days of stable history, no concurrent structural changes during the test window.

What the agent does

Reads weekly PMax diagnostics. Proposes 3–5 signal changes (audience, conversion value, asset group). Scores and queues them for human approval.

What stays human

Final approval. Deployment. Strategic judgment on whether the proposed change fits the current quarter's brand and budget posture.

Measurement

Time-to-decision (diagnostic to deployed change) and CPA delta vs. the manual baseline, over an 8-week running window.
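Both measurements reduce to simple arithmetic once the timestamps and spend figures are logged. The sketch below shows one way to compute them, assuming each cycle is recorded as a pair of timestamps (diagnostic pulled, change deployed) and CPA is spend divided by conversions. The function names and every number in it are hypothetical placeholders, not data from this experiment.

```python
from datetime import datetime
from statistics import median


def hours_to_decision(diagnostic_at, deployed_at):
    """Elapsed wall-clock hours from fresh diagnostic to deployed change."""
    return (deployed_at - diagnostic_at).total_seconds() / 3600


def cpa(spend, conversions):
    """Cost per acquisition for a reporting window."""
    return spend / conversions


def cpa_delta(test_cpa, baseline_cpa):
    """Relative change vs. the manual baseline; negative means the test window is cheaper."""
    return (test_cpa - baseline_cpa) / baseline_cpa


# Made-up cycle log: (diagnostic pulled, change deployed).
cycles = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 3, 9, 0)),
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 9, 21, 0)),
    (datetime(2024, 1, 15, 9, 0), datetime(2024, 1, 17, 21, 0)),
]
durations = [hours_to_decision(start, end) for start, end in cycles]
median_hours = median(durations)

# Made-up spend and conversion counts for a test window and a baseline window.
delta = cpa_delta(cpa(950.0, 50), cpa(1000.0, 50))
```

Reporting the median rather than the mean keeps one stalled approval from dominating the cycle-time figure, though with only a handful of iterations either summary should be read cautiously.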

The reason this design matters: the experiment is not "give the agent the keys to the account and see what happens." It is "let the agent do the work that is genuinely AI-shaped — reading, summarizing, drafting — and measure whether the human-in-the-loop bottleneck moves enough to compress the cycle without losing quality." If the agent's recommendations are usable as drafts but not as final decisions, that is still a real win, because the read-and-draft phase is the largest chunk of the manual baseline.

"The bottleneck moved. It's not 'can the agent decide' anymore — it's 'can the org approve fast enough to keep up.'"

Early result

~5× faster on the first three iterations. CPA impact still in measurement.

The experiment is currently in week 4 of 8. What I can say at the mid-point: cycle time on the first three iterations dropped by roughly 5× against the manual baseline — a clean diagnostic-to-deployed-change cycle is running in 36–60 hours instead of the prior 4–6 weeks. The CPA impact is still being measured against a clean control window. I will not publish the CPA delta until the test reaches statistical confidence; the cycle-time number is mechanical and reportable now.

The unexpected finding is the one worth flagging early. The bottleneck moved. When the agent's brief lands in a structured queue with rationale and confidence ratings, the human approval step is still slow, and not because the brief is weak. It is slow because the org's approval process was designed around a six-week cadence. Reviewers schedule themselves to look once a week. Stakeholders expect a Friday rollup, not a Tuesday afternoon push. The tooling can produce a deployable change in 48 hours; the surrounding workflow cannot yet absorb one.

That is a Cost Per Decision problem, not a tooling problem — and it is the more interesting finding. The argument I had been making was that agentic AI compresses the analytical part of the loop. The mid-run evidence is that it compresses that part so completely the organizational part becomes the new constraint, on a different timescale. The next thing to test is not a smarter agent. It is a different approval cadence.

What changed in the thinking

The hard problem in agentic marketing is not the agent. It is the organization the agent runs inside.

Going in, I treated the human-approval step as overhead — a fixed cost that would shrink as the team got comfortable with the agent's output. Mid-run, it looks more like the binding constraint. The next experiment in this thread should test approval cadence, not model quality. A weekly approval rhythm wastes most of the cycle compression. A continuous approval rhythm reshapes the role of the human in the loop. That is closer to the real product than the agent itself.
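The cost of a weekly rhythm can be put in rough numbers. The sketch below is a toy model, not a measurement: it assumes a single fixed weekly review slot, expresses times as hours since Monday 00:00, and uses invented durations for the agent's work, the continuous-review turnaround, and deployment.

```python
HOURS_PER_WEEK = 7 * 24


def wait_for_weekly_review(ready_hour, review_hour):
    """Hours a finished brief sits idle until the next weekly review slot.

    Both arguments are hours since Monday 00:00.
    """
    return (review_hour - ready_hour) % HOURS_PER_WEEK


def total_cycle_hours(agent_hours, approval_wait_hours, deploy_hours):
    """End-to-end cycle time under this toy model."""
    return agent_hours + approval_wait_hours + deploy_hours


# Hypothetical inputs: brief ready Tuesday 15:00, reviewers meet Friday 10:00.
tuesday_afternoon = 1 * 24 + 15
friday_morning = 4 * 24 + 10

weekly_wait = wait_for_weekly_review(tuesday_afternoon, friday_morning)
missed_slot_wait = wait_for_weekly_review(friday_morning + 1, friday_morning)

# Assumed durations: 2 h of agent work, 4 h to deploy, 4 h review turnaround
# when approvals are handled continuously instead of weekly.
weekly_cycle = total_cycle_hours(2, weekly_wait, 4)
continuous_cycle = total_cycle_hours(2, 4, 4)
```

Under these assumptions the brief waits 67 hours for a Friday review, and a brief that lands an hour after the meeting waits 167. In both cases the idle time in the queue is far larger than the time the agent and the deployment take together, which is the sense in which approval cadence becomes the binding constraint.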

Status: in progress, week 4 of 8. Final cycle-time and CPA-delta numbers will be added when the test reaches statistical confidence. Subscribers to Stay Sharp get the update before it's indexed.
