Quality Toolkit for OpenClaw
Two skills that work together: challenge your assumptions before building, then hunt for bugs after. A quality assurance system for AI agent workflows.
⚠️ Before installing or using these skills, review their full contents, including all scripts, to make sure they meet your security and quality standards. You take ultimate responsibility for any skill you choose to use; these are community-sourced and updated over time.
Skill 1: Premise Check
A 7-step reasoning checklist that forces the agent to genuinely challenge its assumptions before committing to an approach.
- State your core assumption
- Design the best solution that doesn't rely on it
- Compare honestly
- Steel-man the opposite
- Find the hidden cost
- Ask "reactive or proactive?"
- Explain it to a skeptic in 2 sentences
Skill 2: Stress Test
A 4-layer testing checklist covering immediate validation, adversarial testing, deployment verification, and type-specific checks.
- Run with real data, in the real execution context
- Bad input, missing dependencies, timing failures
- Silent failure detection
- Logging, monitoring, credential lifecycle
- Type-specific checks: APIs, cron jobs, services, static sites, GitHub Actions
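As a rough illustration of what the adversarial and silent-failure layers look like in practice, here is a minimal pytest-style sketch. The module and function names (archive.py, archive_event) are hypothetical stand-ins for whatever you just built; the assertions are the point.

```python
# Minimal sketch of adversarial and silent-failure checks.
# `archive_event` and archive.py are hypothetical stand-ins.
import subprocess

import pytest

from archive import archive_event  # hypothetical module under test


def test_rejects_malformed_event():
    # Bad input should fail loudly, not produce a half-written archive.
    with pytest.raises((ValueError, TypeError)):
        archive_event({"id": None, "ends_at": "not-a-date"})


def test_failure_is_not_silent():
    # A failing run must exit nonzero, or cron/CI will report success forever.
    result = subprocess.run(
        ["python", "archive.py", "--event-id", "does-not-exist"],
        capture_output=True,
    )
    assert result.returncode != 0, "script swallowed the error and exited 0"
```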
The Problem
AI agents are fast builders. They'll propose a solution and start implementing it before you've finished reading the proposal. That speed is great until the first idea isn't the best one, or until the implementation carries silent bugs that won't surface until something breaks at 2 AM.
These two skills add friction in the right places: before you commit to an approach, and after you've built it.
How They Work Together
The toolkit follows a simple cycle:
- You describe what you want to build
- Premise Check runs — your agent challenges its own assumptions, designs the opposite approach, and compares honestly before presenting a recommendation
- You build it
- Stress Test runs — your agent systematically tries to break what it just built, checking for silent failures, missing error handling, timing issues, and deployment gaps
The skills are independent — you can use either one alone — but they're designed as a pair. Premise Check prevents building the wrong thing. Stress Test prevents shipping a broken thing.
Origin Story
These skills were born from a real conversation. We were building an event archive automation system and asked our agent to verify its proposal was the best approach by challenging itself five times. It generated five questions, answered them in one pass, and defended its original idea.
Then a less technical user asked one question, "why poll at all when you could just schedule it?", and the agent immediately recognized this as a fundamentally better approach. The agent had challenged its alternatives but never its premise.
Premise Check exists because of that moment. Stress Test followed naturally: once we had the right approach, we needed a systematic way to find the bugs the agent would otherwise ship silently.
When to Use Each
Premise Check — Before Building
- Proposing a new workflow, integration, or system architecture
- Choosing between approaches
- Any time you're about to build something that would be expensive to rebuild
Stress Test — After Building
- After completing a script, API, cron job, or deployment
- After significantly modifying an existing system
- Before declaring anything "done"
Example: Event Archive Scheduler
Here's how both skills were used on the system that inspired them:
Premise Check found:
- Core assumption: "We need to poll for event endings" → wrong. Proactive scheduling from known event times is better (sketched after this list).
- Then challenged again: "We need per-event scheduling" → a daily GitHub Action would be simpler. But the precision was worth the complexity as a craft exercise.
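To make the premise shift concrete, here is a minimal Python sketch of the proactive version, assuming each event carries a known, timezone-aware end time. The names (schedule_archive, archive) are illustrative, not the project's actual code.

```python
# Proactive scheduling: fire once, at the known end time, instead of
# polling "has anything ended yet?" every few minutes.
import threading
from datetime import datetime, timezone


def schedule_archive(event_end: datetime, archive) -> threading.Timer:
    # event_end must be timezone-aware for the subtraction to work.
    delay = (event_end - datetime.now(timezone.utc)).total_seconds()
    timer = threading.Timer(max(delay, 0), archive)  # past events fire now
    timer.start()
    return timer
```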
Stress Test found three bugs (fixes sketched after this list):
- Cleanup logic deleted future scheduled triggers (not just past ones)
- curl exited 0 on authentication failure, so expired tokens failed silently
- GitHub token was hardcoded instead of being read from a single source of truth
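For reference, here is roughly what the fixes look like in Python. Function and variable names are hypothetical; only the patterns (comparison direction, loud HTTP failures, environment-sourced secrets) come from the bugs above. At the shell level, the curl fix is the --fail flag, which makes curl exit nonzero on HTTP errors.

```python
# Hedged sketches of the three fixes; names are illustrative.
import os
from datetime import datetime, timezone

import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]  # bug 3: env var, not a hardcoded string


def prune_triggers(triggers: list[dict]) -> list[dict]:
    # Bug 1: only drop triggers whose fire time is in the past; the broken
    # comparison swept up future ones too. "fire_at" is assumed to be a
    # timezone-aware datetime.
    now = datetime.now(timezone.utc)
    return [t for t in triggers if t["fire_at"] > now]


def fetch(url: str) -> bytes:
    # Bug 2: like `curl --fail`, raise on HTTP errors so an expired token
    # surfaces as an exception instead of a silent success.
    resp = requests.get(
        url, headers={"Authorization": f"Bearer {GITHUB_TOKEN}"}, timeout=30
    )
    resp.raise_for_status()
    return resp.content
```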