Robot XML report (output.xml) incomplete/missing closing tags

Hi,
I need help or advice on an issue with the generated XML report (output.xml). In the Robot tests we run, the generated output.xml is from time to time incomplete (missing closing tags), so generating log.html is impossible.

  1. We run the tests on Windows machines through Jenkins. The same Windows machine runs the same test suites/test cases, but the generated XML is sometimes incomplete (missing closing tags), and the missing closing tags are not always at the same test case/keyword.
  2. We also ran the tests directly in cmd by calling robot (to rule out Jenkins influence) and saw the same behavior: sometimes it works, sometimes not.
  3. We tried robot --splitlog --loglevel TRACE --console verbose -b robotdebug.log to split the XMLs and enable extra logging, but the error still appears.
  4. We enabled Robot debug logging and unbuffered Python output: set PYTHONUNBUFFERED=1 and set ROBOT_SYSLOG_FILE=robot_syslog.log, but nothing related to “XML output errors” was logged.
  5. We checked the Windows event logs and memory/CPU usage while running; no issues there (Windows machines with 32-64 GB of RAM).
  6. We tried different Robot versions (v4, v5, v6), with no improvement.
  7. The generated XML for OK test cases is not huge, around 10-13 MB, and we created a dummy Robot test that generated an XML file larger than 50 MB with no issue, so XML size should not be the cause.
  8. We created a Robot test in which we used “complicated” Python popen/background app calls to open applications on Windows, and had a user terminate them to simulate “unexpected” behavior, but Robot closed the XML properly in this test.

We have not been able to find a pattern, scenario, or test case that reproduces this reliably, which would at least give us a starting point.

Our conclusion so far, and it is just a guess, is that this is related to the Robot XML writer, Windows file buffering, or some other Windows resource.
Any idea how to troubleshoot/debug the Robot XML writer or Windows buffering/resources?

If anyone has experienced something similar, or has any idea or hint about what we can do to troubleshoot, please let us know.

Thank you so much for any tips or suggestions!

I don’t have experience debugging the XML output of Robot Framework; I only care about recovering the output.xml.

To provoke a broken output.xml, you could try to kill the robot process. For example, on Windows you can use the commands:

tasklist | findstr robot

to find the process ID (PID), and then:

taskkill /F /PID <robot PID>

That may break the output.xml.

You can fix a broken output.xml by completing the missing tags. I have this tool for that. Use it like I do in this example:

C:\tmp>rebot output_broken.xml
[ ERROR ] Reading XML source 'output_broken.xml' failed: ParseError: no element found: line 50, column 0

Try --help for usage information.

C:\tmp>python \tools\xml_repair_tool.py output_broken.xml
</errors>
</robot>

C:\tmp>python \tools\xml_repair_tool.py output_broken.xml > fix.xml

C:\tmp>copy /b output_broken.xml+fix.xml output_fixed.xml
output_broken.xml
fix.xml
        1 file(s) copied.

C:\tmp>rebot output_fixed.xml
Log:     C:\tmp\log.html
Report:  C:\tmp\report.html

Case 8 could point to a scenario that happens specifically on Windows.

If Python/Robot code opens a pipe, the output is directed to a buffer, and that buffer is not fully read by the process that opened it, RF will hang indefinitely. If I remember correctly, whether it hangs depends on how much output the process generates. So for any subprocess you open with stdout and/or stderr set to something other than None, make sure you read the whole buffer, even though you do not need the output.
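
As a rough illustration of the point about unread pipes, here is a minimal Python sketch of two ways to avoid leaving a pipe buffer full. The helper names `run_and_drain` and `run_and_discard` and the `NOISY_CHILD` command are made up for this example; the test code in question may start its processes quite differently.

```python
import subprocess
import sys


def run_and_drain(cmd, timeout=60):
    """Run cmd with piped output and read all of it.

    communicate() keeps reading stdout and stderr until the child exits,
    so the child never blocks on a full pipe buffer.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate(timeout=timeout)
    return proc.returncode, out, err


def run_and_discard(cmd, timeout=60):
    """Run cmd and throw its output away instead of piping it."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode


# Hypothetical child process that prints about 1 MB, which is more than a
# typical OS pipe buffer holds.
NOISY_CHILD = [sys.executable, "-c", "print('x' * 1_000_000)"]

# Risky pattern, shown only as a comment: with stdout=PIPE, calling wait()
# without reading can block forever once the pipe buffer fills up.
#   proc = subprocess.Popen(NOISY_CHILD, stdout=subprocess.PIPE)
#   proc.wait()
```

If the output is not needed at all, discarding it with DEVNULL is probably the simpler choice, since nothing has to be read afterwards.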

In my case in the past, Jenkins killed robot, and while the log showed that output.xml was being written, it had issues similar to what you described (invalid XML).

Thank you so much for your reply.
Yes, we had the same idea and a script to add/close the missing tags, so that at least what is available in the XML is visible in log.html. I tested your script and it works. BIG THANKS for sharing!

Still, we need to find out why the Robot XML writer, or possibly robot.exe, is terminated unexpectedly.

Yes, we are investigating the possibility that, while the Robot test is running as a child process of the Jenkins job (javaw.exe on Windows), the connection to Jenkins disconnects/reconnects and javaw.exe then affects the child robot.exe/XML writer. However, we also ran tests without Jenkins, directly in cmd/python, and still had the issue.
But we definitely need to investigate this buffering on our machine.

Just to be clear: the issue I described has 0 to do with Jenkins and 100% to do with how Windows handles buffering, but it came up when I was running tests in a cross-platform environment with Jenkins…

So, if you are using the Start Process or Run Process keywords on the Robot Framework side, or anything from the Python subprocess module where you pass in stdout or stderr arguments that are set to None, check that those outputs are not left dangling (unread), and make sure that all the processes started by RF are stopped/killed.

Thanks for the clarifications. I have to find out how to check on Windows whether any child processes started by a robot.exe test case are “hanging”, and “terminate” them, which will probably let robot and the XML writer work correctly. By the way, a lot of those test cases are not written by our team, so we do not have full control over them and anyone can write whatever they like in Python… but we need to provide a “cleaning” mechanism.
Thanks a lot again

One “trick” to do that would be to set an environment variable to something unique before launching robot for that particular Jenkins job, then find all processes where that variable is present and kill them. I have a faint feeling that Jenkins also does this by default to some degree, so that the slave node knows which processes to kill if the job has timeout functionality.
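
A minimal sketch of this environment-variable trick in Python could look like the following. Everything named here is an assumption for illustration: the variable name `ROBOT_JOB_MARKER` and the helper functions are hypothetical, and `snapshot_processes` relies on the third-party psutil package, which may not be able to read the environment of every process.

```python
import os
import subprocess

MARKER_NAME = "ROBOT_JOB_MARKER"  # hypothetical variable name


def launch_with_marker(cmd, marker):
    """Start cmd with a unique marker in its environment.

    Child processes inherit the environment by default, so anything the
    command starts will normally carry the same marker.
    """
    env = dict(os.environ)
    env[MARKER_NAME] = marker
    return subprocess.Popen(cmd, env=env)


def pids_with_marker(processes, marker):
    """Return the PIDs whose environment contains the given marker value.

    `processes` is an iterable of (pid, environ_dict) pairs.
    """
    return [pid for pid, environ in processes
            if environ.get(MARKER_NAME) == marker]


def snapshot_processes():
    """Collect (pid, environ) pairs for all readable processes.

    Assumes psutil is installed; it is imported here so that the rest of
    the sketch works without it.
    """
    import psutil

    snapshot = []
    for proc in psutil.process_iter():
        try:
            snapshot.append((proc.pid, proc.environ()))
        except psutil.Error:
            # Access denied, or the process has already exited.
            continue
    return snapshot
```

After the robot run ends, something like `pids_with_marker(snapshot_processes(), marker)` would list the leftover processes, which could then be terminated.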

Thank you for the comment.
We have developed a Python script that monitors the robot.exe process and all child processes started by robot.exe, and when robot is terminated, the script lists the “leftover” processes, if any.
This is a good approach and indeed it helped us. What we identified is that the last running Robot keyword launched a Python process that launched another “Python-based” application, which actually crashed. Because of that, Python crashed as well as robot.exe, so output.xml could not be closed normally.
We checked the Windows Event/Application error log and confirmed that.
Tip:
Use this command on Windows to get the Windows Event/Application error log:
wevtutil qe Application /q:"*[System[(Level=2) and (EventID=1000)]]" /f:text

Another tip:
On Windows, if you see a WerFault.exe process (we had it), you know that something “nasty” happened, since it is the Windows Error Reporting process, and you should definitely check the event logs as above.

I wonder if not using robot.exe would make a difference.
You could try running it as a Python module:

python -m robot ...