Retries are normal. Side effects should still execute once.
Wrap tool handlers once for idempotent retries, integrity checks, and review workflows.
# LLM emits a tool call
response = llm.chat(...)
tool_call = response.tool_calls[0]
# Agent crashes. Restarts. Retries.
result = ledger.run(tool_call, handler=charge)  # runs once

Works with OpenAI, Anthropic, LangChain, LlamaIndex
Public production incidents show the same failure mode: successful tool calls re-run after retries, interruptions, or approval-state drift.
| Date | Incident | Impact |
|---|---|---|
| Jul 2025 | OpenAI API accounts were charged duplicate ledger deductions (OpenAI Status) | Quota lockout in production: negative balances and false "429" errors |
| Jul 2025 | Replit agent reportedly deleted a production DB during a code freeze (Ars Technica report) | 2,400+ records wiped, plus reported fabricated test data/users during recovery |
| Oct 2025 | LiveKit: interrupted tool run lost results and re-executed create_reservation (GitHub: livekit/agents#3702) | Duplicate orders created: production report with identical reservations created twice |
| Jun 2025 | LangGraph human-approval flow produced 3 tool results for 1 sensitive call (GitHub: langchain-ai/langgraph#4397) | Triple-execution risk: same sensitive action was re-executed after approval path |
| Nov 2025 | LangChain HITL: after human edits a tool call, agent re-attempts original call (GitHub: langchain-ai/langchain#33787) | Approval/edit drift risk: edited action can be followed by original unedited action |
| ? | A timeout retry re-runs a side-effecting tool call in your stack. This is the retry/replay failure mode idempotent execution is designed to stop. | Designed to be blocked |
Agents are already in production. Execution safety controls need to be too.
Step 1 · Idempotency
Hash the call, check the ledger, and replay recorded results on retries.
This prevents duplicate side effects when tools are retried.
Same call. Same key. Retries replay instead of re-execute.
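As a rough illustration of the hash, check, replay loop described above, here is a minimal in-memory sketch. It is not agent-ledger's implementation: `call_key`, `InMemoryLedger`, and the `charge` handler are hypothetical names invented for this example, and a real ledger would persist results durably rather than in a dict.

```python
import hashlib
import json


def call_key(workflow_id, tool, args):
    """Derive a stable key for a call; sort_keys makes argument order irrelevant."""
    payload = json.dumps(
        {"workflow_id": workflow_id, "tool": tool, "args": args},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemoryLedger:
    """Hypothetical stand-in for a durable ledger (assumption: a dict is enough for a demo)."""

    def __init__(self):
        self._results = {}

    def run(self, workflow_id, tool, args, handler):
        key = call_key(workflow_id, tool, args)
        if key in self._results:
            # Retry of a recorded call: replay the stored result, skip the handler.
            return self._results[key]
        result = handler(**args)
        self._results[key] = result
        return result


calls = []  # records every real execution of the handler


def charge(amount, currency):
    """Hypothetical side-effecting handler."""
    calls.append((amount, currency))
    return {"status": "charged", "amount": amount, "currency": currency}


ledger = InMemoryLedger()
first = ledger.run("order-123", "stripe.charge", {"amount": 100, "currency": "usd"}, charge)
# Same call, arguments in a different order: same key, so the result is replayed.
retry = ledger.run("order-123", "stripe.charge", {"currency": "usd", "amount": 100}, charge)
```

This sketch does not cover a crash between running the handler and recording the result; closing that window needs durable storage and is outside what the example shows.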
result = await ledger.run(
    ToolCall(
        workflow_id="order-123",
        tool="stripe.charge",
        args={"amount": 100, "currency": "usd"}
    ),
    handler=charge_customer,
)
# Retry this 100 times. Handler runs once.

Step 2 · Integrity Checks
Idempotency ensures a call runs once. Integrity ensures it runs exactly as approved.
Runtime payload hash must match approved payload hash.
If arguments drift after approval, execution is blocked and a new decision is required.
Many HITL systems approve the tool, not the exact payload.
If arguments change between approval and execution, the original approval can still pass.
agent-ledger binds approval to the exact payload hash. If any argument changes, that's a new review decision.
What was checked is what runs. No payload drift between review and execution.
# You approved THIS exact call:
ledger.run(
    ToolCall(
        workflow_id="order-456",
        tool="stripe.charge",
        args={"amount": 100, "customer": "cus_123"}
    ),
    handler=charge_customer,
    requires_approval=True
)

# Agent retries with different args?
# That's a NEW approval request.
args={"amount": 10000, "customer": "cus_123"}
# ↑ Changed? Blocked.

Reviewed payload in. Same payload out.
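To make the approval-binding idea concrete, here is a small sketch of tying a review decision to a payload hash and re-checking it at execution time. The names `payload_hash`, `approve`, `execute_if_approved`, and `PayloadDriftError` are assumptions made for this illustration and are not taken from agent-ledger's API.

```python
import hashlib
import json


class PayloadDriftError(Exception):
    """Raised when runtime arguments differ from the approved payload."""


def payload_hash(tool, args):
    """Hash a canonical JSON form of the call so equal payloads hash equally."""
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approve(tool, args):
    """Record a reviewer's decision as the hash of the exact payload they saw."""
    return payload_hash(tool, args)


def execute_if_approved(tool, args, approved_hash, handler):
    """Run the handler only if the runtime payload matches the approved hash."""
    if payload_hash(tool, args) != approved_hash:
        raise PayloadDriftError(f"{tool}: arguments changed after approval")
    return handler(**args)


def charge_customer(amount, customer):
    """Hypothetical handler used only for this demo."""
    return {"charged": amount, "customer": customer}


approved = approve("stripe.charge", {"amount": 100, "customer": "cus_123"})

# Same payload as approved: the handler runs.
ok = execute_if_approved(
    "stripe.charge", {"amount": 100, "customer": "cus_123"}, approved, charge_customer
)

# Amount changed after approval: execution is blocked.
try:
    execute_if_approved(
        "stripe.charge", {"amount": 10000, "customer": "cus_123"}, approved, charge_customer
    )
    blocked = False
except PayloadDriftError:
    blocked = True
```

Hashing the arguments together with the tool name is what separates approving the exact payload from approving the tool in general.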
Reliable agent execution starts with agent-ledger.
Idempotent. Verified. Reviewable. Just a library.
Same call key, one execution. Retries replay the recorded result instead of re-running your handler.
Execution re-checks payload hash at runtime. If args differ from what policy or review approved, that run is blocked.
High-risk actions can require human review. Low-risk actions stay automatic, with full execution history for audits.
Free and open source. Apache-2.0. No account required.
Wrap your handlers once for replay-safe retries and payload-integrity checks.
For teams
Use agent-ledger in each app. Add rune0 when you need shared policy decisions, routed reviews, and integrity-verified audit exports across environments. Same SDK. No rewrite.
Define policy-as-code once: allow, deny, or require review for each effect.
“Not a best-effort prompt. An enforced decision.”
Escalate high-stakes actions to the right humans with the exact payload and full context.
“High-stakes actions wait. Low-stakes actions flow.”
Integrity checks block payload drift (TOCTOU). Every decision and result is logged for audit exports.
“What was checked is what runs - and you can prove it.”
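One way to picture "allow, deny, or require review for each effect" is a small rule table evaluated before a tool call runs. The sketch below is purely illustrative: the `Decision` enum, the `RULES` list, the `decide` function, the `db.drop_table` tool name, and the 1000 threshold are all hypothetical and do not describe rune0's actual policy format.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REVIEW = "review"


# Hypothetical rules: (tool name, predicate on args, decision). First match wins.
RULES = [
    ("db.drop_table", lambda args: True, Decision.DENY),
    ("stripe.charge", lambda args: args.get("amount", 0) > 1000, Decision.REVIEW),
    ("stripe.charge", lambda args: True, Decision.ALLOW),
]


def decide(tool, args, rules=RULES, default=Decision.REVIEW):
    """Return the decision of the first matching rule; unknown tools go to review."""
    for rule_tool, predicate, decision in rules:
        if rule_tool == tool and predicate(args):
            return decision
    return default


low_stakes = decide("stripe.charge", {"amount": 100, "customer": "cus_123"})
high_stakes = decide("stripe.charge", {"amount": 10000, "customer": "cus_123"})
forbidden = decide("db.drop_table", {"table": "users"})
unknown = decide("email.send", {"to": "someone@example.com"})
```

Defaulting unknown tools to review is a design choice in this sketch: anything the policy does not explicitly cover waits for a human instead of running silently.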
For teams scaling agent-ledger in production. We'll email you when preview slots open.
Start with agent-ledger. Scale with rune0.