I want to share a tool I’ve been building: Result Companion — a CLI that reads your output.xml and enriches log.html with AI-generated analysis per failed test: root cause, test flow summary, and suggested fixes.
The problem it solves
You run your suite, something fails, and then you spend 20 minutes tracing keywords through the log trying to figure out why. Result Companion does that trace for you and gives you a plain-English explanation in seconds.
Open rc_log.html — each failed test now has an AI analysis attached.
What it supports today
Local models via Ollama (free, private)
GitHub Copilot — if you already have a Copilot subscription (Business, Enterprise, or Pro+), you can use models like gpt-5-mini at no extra cost
OpenAI, Azure OpenAI, Google Gemini, Anthropic, AWS Bedrock
Any OpenAI-compatible endpoint (Databricks, self-hosted, etc.)
Tag-based filtering (--include, --exclude)
Text output for CI pipelines or agent workflows (--text-report, --print-text-report)
--overall-summary for a synthesised digest across all failures
Fully customisable prompts — the default analyses failures, but you can swap in a security audit, performance bottleneck review, or test quality assessment just by changing the question_prompt in your config
Where it stands
It’s early — version 0.0.6, marked Beta on PyPI. The core workflow is solid, but I’m actively looking for real-world feedback: edge cases in output.xml parsing, models that behave unexpectedly, workflows that don’t fit the current config model.
A suggestion based on my own mistakes: rather than integrating tightly with output.xml, use Robot Framework's --xunit option and process the xUnit file instead. The output.xml format can change between Robot Framework versions, which could make your library harder to maintain in the long term.
Having said that, the output.xml format has been quite stable and has had only one major change in the last 7 years.
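To illustrate the xUnit route, here is a minimal sketch that pulls failed tests out of an xUnit-style report using only the standard library. The sample XML is made up for illustration and only assumes the common testsuite/testcase/failure layout; a real file written by --xunit may carry extra attributes or nesting, so treat this as a starting point rather than a complete parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the common xUnit layout (not real tool output).
SAMPLE_XUNIT = """<testsuite name="Login" tests="2" errors="0" failures="1" skipped="0" time="1.234">
  <testcase classname="Login" name="Valid Login" time="0.500"/>
  <testcase classname="Login" name="Invalid Password" time="0.734">
    <failure message="Element 'id=error' not visible" type="AssertionError"/>
  </testcase>
</testsuite>
"""


def failed_tests(xunit_xml):
    """Return (classname, name, message) for each failed or errored test case."""
    root = ET.fromstring(xunit_xml)
    failures = []
    # iter() also finds test cases nested under wrapper elements.
    for case in root.iter("testcase"):
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            failures.append(
                (
                    case.get("classname", ""),
                    case.get("name", ""),
                    problem.get("message", ""),
                )
            )
    return failures


if __name__ == "__main__":
    for classname, name, message in failed_tests(SAMPLE_XUNIT):
        print(f"{classname} / {name}: {message}")
```

One trade-off to keep in mind: the xUnit file carries the failure message but not the keyword-level trace, so it suits a quick pass/fail summary better than a detailed analysis.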
Thanks for the heads up! That’s a really practical suggestion — I’ll definitely keep the xunit option in mind as a future improvement. Good to know the output.xml format has been stable though, that gives me some confidence for now.
FYI: the big change came with RF 7.0 (when --legacyoutput was added), so depending on which version of RF you based it on (I'm guessing a recent one), users with the other format may find that your tool won't work.
Might be something to note in your support documentation as well.
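If it helps with the documentation or a support checklist, a small sketch like the one below can report which Robot Framework release wrote a given output.xml. It assumes the root robot element carries generator and schemaversion attributes (which, as far as I know, recent releases write); the sample values are hypothetical, and a missing attribute simply comes back as None.

```python
import xml.etree.ElementTree as ET

# Hypothetical root element; the attribute values are illustrative only.
SAMPLE_OUTPUT = '<robot generator="Robot 7.0 (Python 3.11 on linux)" schemaversion="5"></robot>'


def describe_output(xml_text):
    """Return the generator and schema version recorded on the root element."""
    root = ET.fromstring(xml_text)
    return {
        "generator": root.get("generator"),          # None if the attribute is absent
        "schemaversion": root.get("schemaversion"),  # None if the attribute is absent
    }


if __name__ == "__main__":
    print(describe_output(SAMPLE_OUTPUT))
```

For a large output.xml, reading the whole file just to inspect the root is wasteful; a streaming parser would be the better fit there.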
Thanks for flagging this! I actually investigated it and result-companion is safe here — it doesn’t parse output.xml directly. Instead, it relies on robot.api.ExecutionResult which seems to handle schema differences natively across versions. I’ve verified it works with outputs from RF 4.0 through 7.x without issues.
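For anyone curious what going through robot.api.ExecutionResult looks like, here is a rough sketch of collecting failed tests from the result model instead of the raw XML. This is not result-companion's actual code; the helper names are made up for illustration, and load_failures needs the robotframework package installed. The walk relies only on the tests, suites, name, status and message attributes of the result objects.

```python
def iter_tests(suite):
    """Yield every test in a suite and its child suites, depth first."""
    yield from suite.tests
    for child in suite.suites:
        yield from iter_tests(child)


def failed_test_messages(suite):
    """Return (test name, failure message) for each test with status FAIL."""
    return [(t.name, t.message) for t in iter_tests(suite) if t.status == "FAIL"]


def load_failures(output_path):
    """Load an output.xml through Robot Framework's own result API."""
    # Imported lazily so the helpers above work without Robot Framework.
    from robot.api import ExecutionResult

    result = ExecutionResult(output_path)
    return failed_test_messages(result.suite)


if __name__ == "__main__":
    import sys

    for name, message in load_failures(sys.argv[1]):
        print(f"{name}: {message}")
```

Because the XML is read by Robot Framework itself, the script only depends on the result model and not on the file layout.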
Thanks for the feedback on the initial release — it drove a bunch of stability fixes (retry logic, fail-fast on auth issues, better suite setup handling).
New: test fails, agent finds the guilty commit
result-companion review takes your failure analysis, hands it to a Copilot agent that reads the actual PR diff through GitHub MCP, and posts a comment pinpointing which code change likely caused the failure — with file, line, and a suggested fix.
Robot tests carry rich execution context (keyword chains, error messages, timing) that generic review tools never see. Feeding that into the same agent that reads the diff is what makes it work.
Real example: PR #65 — agent caught that interactive gh auth login would fail in CI and suggested token-based auth.
Try --preview to see the output without posting anything.