I’m working on test automation using Robot Framework, and I’ve run into a challenge. My application runs inside a NoVNC environment, and the entire UI is rendered inside an HTML canvas element. That means I don’t have access to individual DOM elements, since everything is drawn as a single image on the canvas.
Right now, I’m using OpenCV to visually detect UI elements (buttons, icons, etc.). Once I find them, I get their coordinates and simulate clicks using tools like pyautogui, all wrapped into Robot Framework keywords.
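Roughly, each of those keywords boils down to something like the sketch below (simplified; the 0.8 threshold and the error handling are just for illustration, not my exact code):

```python
# click_by_image.py - a simplified sketch of the OpenCV + pyautogui approach
# described above; the threshold value is illustrative only.
import cv2
import numpy as np
import pyautogui


def click_reference_image(reference_path, threshold=0.8):
    """Locate `reference_path` on the current screen and click its centre.

    Intended to be exposed as a Robot Framework keyword via a custom library.
    """
    # Grab the current screen and convert from PIL RGB to OpenCV BGR.
    screen = cv2.cvtColor(np.array(pyautogui.screenshot()), cv2.COLOR_RGB2BGR)
    template = cv2.imread(reference_path)
    if template is None:
        raise ValueError(f"Could not read reference image: {reference_path}")

    # Normalised cross-correlation; max_val is the best match score.
    result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        raise AssertionError(f"No match above {threshold} (best was {max_val:.2f})")

    h, w = template.shape[:2]
    pyautogui.click(max_loc[0] + w // 2, max_loc[1] + h // 2)
```

Note that on HiDPI/scaled displays the screenshot’s pixel size may not match pyautogui’s click coordinate space, which is one more way scaling can break this kind of matching.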
The problem:
OpenCV detection isn’t always reliable. Small visual differences (resolution, scaling, color changes, etc.) can cause the element matching to fail.
Element coordinates shift when the window is resized or when resolution changes, which makes the tests brittle.
My question: Is there a more reliable way to detect or locate elements inside a canvas in this kind of setup (NoVNC)? Any alternatives to OpenCV, or tips to make this kind of image-based detection more stable?
My Proxmox environment uses NoVNC, so I inspected the HTML on that to have a look at what you’re up against.
Unfortunately, on my side there are no elements inside that canvas: the canvas just appears as an image, plus a hidden textarea. You interact by clicking locations on the image, and when you type, you’re typing into that textarea, which then seems to send the keystroke info across.
If your environment is the same, then you’ll have to choose between 3 options:
run Robot Framework on the remote machine (inside the environment that NoVNC connects to)
image-based automation:
ImageHorizonLibrary
SikuliLibrary
As for tips for making image-based detection more stable: use the smallest image that uniquely identifies what you want to interact with or reference. Start with this post I made a while ago: ImageHorizonLibrary - #9 by damies13
I use ImageHorizonLibrary extensively for testing one of my applications on Windows + MacOS + Linux, so I have some experience in this topic. Yes, you will want to control resolution, scaling and colour depth and ensure they don’t change; after that, the matching can be fairly stable.
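For example, a quick guard at the start of a suite can catch resolution drift before any matching is attempted. This is only a sketch: the 1920x1080 baseline is an example value, and you would also want to pin the remote desktop’s geometry on the VNC server side.

```python
# resolution_guard.py - fail fast if the local screen geometry has drifted from
# the baseline the reference images were captured at (1920x1080 is only an example).
import pyautogui


def assert_screen_resolution(expected_width=1920, expected_height=1080):
    """Raise immediately if the screen size differs from the expected baseline."""
    width, height = pyautogui.size()
    if (width, height) != (expected_width, expected_height):
        raise AssertionError(
            f"Screen is {width}x{height}, but reference images were captured at "
            f"{expected_width}x{expected_height}; image matching will be unreliable."
        )
```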
For text fields, use the labels: identify the co-ordinates of the label, then click at an offset (usually below or to the right of the label). For buttons, use a screenshot of the text portion of the button’s face, not the full button, so you avoid the button’s edges in case the button changes size when the app is windowed vs maximised. If you do need to include the sides of the button, keep it to 1, max 2, sides, then apply the label / button approach to the other elements.
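In the OpenCV/pyautogui terms of the original question, that label-plus-offset idea looks roughly like the sketch below; the reference file names and the 120 px offset are invented for illustration, and ImageHorizonLibrary has keywords along the lines of Click To The Right Of Image / Click To The Below Of Image that give you the same pattern without writing it yourself.

```python
# label_offset_click.py - a rough sketch of the label-plus-offset technique
# described above; reference image names and the 120 px offset are invented.
import cv2
import numpy as np
import pyautogui


def locate_centre(reference_path, threshold=0.8):
    """Return screen coordinates of the centre of the best template match."""
    screen = cv2.cvtColor(np.array(pyautogui.screenshot()), cv2.COLOR_RGB2BGR)
    template = cv2.imread(reference_path)
    if template is None:
        raise ValueError(f"Could not read reference image: {reference_path}")
    result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        raise AssertionError(f"'{reference_path}' not found (best score {max_val:.2f})")
    h, w = template.shape[:2]
    return max_loc[0] + w // 2, max_loc[1] + h // 2


def type_into_field_right_of_label(label_image, text, offset=120):
    """Find a field's label, click `offset` px to its right, then type."""
    x, y = locate_centre(label_image)
    pyautogui.click(x + offset, y)
    pyautogui.typewrite(text, interval=0.05)


def click_button_by_text(button_text_image):
    """Click a button using a crop of just the text on its face."""
    x, y = locate_centre(button_text_image)
    pyautogui.click(x, y)


# Example usage (file names are placeholders):
# type_into_field_right_of_label("username_label.png", "admin")
# click_button_by_text("login_button_text.png")
```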
If you’re having trouble getting an image to be recognised reliably, share the screenshot you’re using and as much as you can of the screen you’re trying to match it against, and I’ll offer suggestions on what I would do.
Thank you very much for your response. To clarify, is there really no way to interact with the application other than using an image recognition library? Is there something similar to what we would do with XPath or CSS Selectors?
As you are essentially automating a VNC session, and VNC just mirrors the image of the remote host’s desktop (i.e. it’s just pixels on the canvas), there’s no way to do that. However, as @damies13 suggested:
run Robot Framework on the remote machine (inside the environment that NoVNC connects to)
In this case, you don’t need to run the whole of Robot Framework on the actual host; you could just use remote libraries that interact locally with your app, and still watch the desktop via VNC. However, this would require network connectivity between the SUT host and the machine where you are executing Robot Framework itself, because RF (typically) communicates with a remote library over HTTP…
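For illustration, the SUT-side piece of that can be as small as the sketch below, using the robotremoteserver package; the class name, the single example keyword and the port are assumptions, not something taken from your setup.

```python
# sut_remote_library.py - run this on the machine that NoVNC connects to.
# A minimal sketch using the robotremoteserver package (pip install robotremoteserver);
# the class name, keyword and port are examples only.
import subprocess

from robotremoteserver import RobotRemoteServer


class SutLibrary:
    """Keywords that interact with the application locally on the SUT."""

    def start_application(self, command):
        """Start the application under test with the given shell command."""
        subprocess.Popen(command, shell=True)


if __name__ == "__main__":
    # Listen on all interfaces so the machine running Robot Framework can reach it.
    # In your test data you would then import it with:
    #   Library    Remote    http://<sut-host>:8270
    # (<sut-host> is a placeholder for the SUT's address.)
    RobotRemoteServer(SutLibrary(), host="0.0.0.0", port=8270)
```

Whatever keywords actually drive the app (a desktop automation library, an API client, etc.) would live inside that class and run locally on the SUT, while you keep the NoVNC window open purely for watching.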