How to redirect RFSwarm results to another database like InfluxDB or Prometheus?

Hello Robot Framework community!

Today I have a question about redirecting the results database of rfswarm, the tool created by @damies13 that I've introduced in my company for performance testing. I know Damies has already created rfswarm Reporter to display the tool's results, but given my company's needs I want to add to Damies' source code a way to choose another database type instead of SQLite (Postgres, InfluxDB or others, for instance).
The main goal is to display RFSwarm performance-testing results in Grafana through InfluxDB or Prometheus.
I've already managed to display rfswarm's data in a Grafana dashboard using an SQLite exporter plugin, but that doesn't suit the needs of the DevOps team.
Has anyone already solved this kind of request?

Thanks so much in advance for any kind of help.

Regards

Hi,
I cannot really help you with RFSwarm, but I recently did some experiments with pushing test results to InfluxDB (to visualize them later with Grafana).
I used the Robot Framework API and created a ResultVisitor which parses the test results (output.xml) and creates items in InfluxDB for every completed test and every completed keyword.

Here is the code to do that, it’s just a draft:

import influxdb_client
from influxdb_client import Point
from influxdb_client.client.write_api import SYNCHRONOUS
from robot.api import ExecutionResult, ResultVisitor

influxdb_token = "<yourToken>"
influxdb_org = "robotframework"
influxdb_url = "http://localhost:8086"


class Influx_Reporter(ResultVisitor):
  def __init__(self):
    self.write_client = influxdb_client.InfluxDBClient(url=influxdb_url, token=influxdb_token, org=influxdb_org)
    self.bucket = "results"
    self.write_api = self.write_client.write_api(write_options=SYNCHRONOUS)
    # Remember the currently running test so end_keyword can tag keyword points with it
    self.test = {"name": None, "longname": None}

  def start_test(self, test):
    self.test["name"] = test.name
    self.test["longname"] = test.longname

  def end_test(self, test):
    # One point per completed test
    point = (
        Point("testresult")
        .tag("name", test.name)
        .tag("suite", test.parent.name)
        .tag("release", "v5")
        .tag("status", test.status)
        .tag("message", test.message)
        .field("message", test.message)
        .field("status", test.status)
        .field("elapsedtime", test.elapsedtime)
        .field("starttime", test.starttime)
        .field("endtime", test.endtime)
      )
    self.write_api.write(bucket=self.bucket, org=influxdb_org, record=point)

  def end_keyword(self, keyword):
    # One point per completed keyword, tagged with the test it ran in
    point = (
      Point("keyword")
      .tag("name", keyword.kwname)
      .tag("library", keyword.libname)
      .tag("testname", self.test["name"])
      .tag("testlongname", self.test["longname"])
      .field("status", keyword.status)
      .field("elapsedtime", keyword.elapsedtime)
      .field("starttime", keyword.starttime)
      .field("endtime", keyword.endtime)
    )
    self.write_api.write(bucket=self.bucket, org=influxdb_org, record=point)


result = ExecutionResult("results/output.xml")
result.visit(Influx_Reporter())

Hello Many,

Thanks so much for sharing. Your goal is really close to mine, so I will take a look at it.

It's interesting, because your code could also be used to display Robot Framework results in Grafana.
Generally I use a Grafana template made by another person to do this kind of thing.
So thanks so much, it's really helpful.
If I figure out a solution I’ll share it here.

Regards,

If you are not strictly limited to InfluxDB, there are also existing solutions out there for storing Robot Framework results in a PostgreSQL DB with a Grafana dashboard.

I'm still experimenting with the InfluxDB stuff, and I'm still undecided about which information I should store as tags and which as fields.
Looking forward to hearing your solution :slight_smile:

Hi Barry,

I think probably the best way to do this is:

  • grab a copy of RFSListener2.py from an agent machine and rename it to something like InfluxDBListener.py
  • rename the class to InfluxDBListener so it matches the filename
  • replace the function send_result with one that sends the data you want to InfluxDB
  • add Metadata File InfluxDBListener.py to the *** Settings *** section of your script
  • in the Manager's Scenario settings, add --listener InfluxDBListener.py in Robot Options

Then RFSwarm will run as normal; it just adds an extra listener into Robot Framework, so as each keyword ends its result is sent to InfluxDB at the same time as it's sent to the Manager.
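
For reference, here is a hedged sketch of what such a standalone listener could look like (this is my own minimal example, not the contents of RFSwarm's RFSListener2.py; the InfluxDB url, token, org, bucket and measurement names are placeholders you would adapt):

# InfluxDBListener.py - a minimal sketch, assuming InfluxDB 2.x and the influxdb_client package
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS


class InfluxDBListener:
    ROBOT_LISTENER_API_VERSION = 2

    def __init__(self, url="http://localhost:8086", token="<yourToken>", org="robotframework", bucket="rfswarm"):
        self.client = InfluxDBClient(url=url, token=token, org=org)
        self.write_api = self.client.write_api(write_options=SYNCHRONOUS)
        self.org = org
        self.bucket = bucket

    def end_keyword(self, name, attributes):
        # Write one point per finished keyword, similar to what RFSwarm reports to the Manager
        point = (
            Point("keyword")
            .tag("keyword", attributes["kwname"])
            .tag("library", attributes["libname"])
            .tag("status", attributes["status"])
            .field("elapsedtime_ms", attributes["elapsedtime"])
        )
        self.write_api.write(bucket=self.bucket, org=self.org, record=point)

    def close(self):
        self.client.close()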

I should probably add that in the documentation…

Hopefully that’s what you need :crossed_fingers:

Dave.


Many,

Thanks for sharing. I had already implemented this template for my robot test results.
It's indeed interesting.

Thank you so much for your help and for sharing,

Sure, it’ll be a pleasure for me to share my success with the community.

Best regards,

Esther.

Hello @damies13 ,

I can't say how grateful I am for your reply and help with our issues.

I had indeed planned to use the listener file. I managed to grab this file on the manager: RfswarmListener.py, not the V2 version.

After a quick review, I understood that the metrics sent to the manager are those stored in the SQLite "Results" table. But the metrics I need are those located in the "MetricData" table, in the "MetricTime", "MetricValue" and "total_robot" columns, and those values seem to be created as database views during the SQLite database's creation.

So I would need to calculate all of them myself if I only capture the metrics that are sent to the manager.
Here you can see a screenshot of what I managed to display in Grafana using the SQLite database. I want to do the same with InfluxDB or Prometheus.

In my opinion, what seems easier is to write a Python exporter for InfluxDB or Prometheus that uses the SQLite database.
Thanks so much for your help,

I’m sincerely grateful,

Regards

Esther

Hi Esther,

There are 3 types of metrics:

  1. Agent metrics: this is information the agent sends to the manager about itself (this includes the number of robots running on the agent, the agent's CPU and memory usage stats, etc.)
  2. Metrics calculated by the Manager: as these are calculated, you can either recalculate them yourself or export them (total robots, percentiles, averages, etc.)
  3. Metrics generated by a robot script: typically these are related to monitoring the AUT. As you would have configured the robot script to send these to the manager, it should be simple to also send them to another system.

Yes, you are right, in your case the best option would be to extract them after the test from the SQLite database. This is probably easiest done with a Python script; RFSwarm uses the sqlite3 Python module to create the database, so if you use the same module to query the DB you should have no issues.
Unfortunately, as SQLite is a single-user DB, you won't be able to get these values until the test is finished, as the file will be locked.
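
As a rough illustration of that approach, here is a minimal sketch of such an exporter (not a definitive implementation: the results database path, bucket name and measurement name are placeholders, and only the MetricTime/MetricValue columns mentioned above are queried; check the real MetricData schema with PRAGMA table_info and add the metric name/type columns you need as tags):

# sqlite_to_influx.py - minimal sketch of an RFSwarm SQLite-to-InfluxDB exporter,
# assuming InfluxDB 2.x and the influxdb_client package
import sqlite3

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

SQLITE_PATH = "/path/to/results/PerfTest.db"   # placeholder path to the results db
INFLUX_URL = "http://localhost:8086"
INFLUX_TOKEN = "<yourToken>"
INFLUX_ORG = "robotframework"
INFLUX_BUCKET = "rfswarm"


def export_metrics():
    conn = sqlite3.connect(SQLITE_PATH)
    cur = conn.cursor()
    # MetricTime and MetricValue are the columns discussed above; extend the
    # SELECT with the metric name/type columns from your schema as needed
    cur.execute("SELECT MetricTime, MetricValue FROM MetricData")
    rows = cur.fetchall()
    conn.close()

    with InfluxDBClient(url=INFLUX_URL, token=INFLUX_TOKEN, org=INFLUX_ORG) as client:
        write_api = client.write_api(write_options=SYNCHRONOUS)
        for metric_time, metric_value in rows:
            try:
                value = float(metric_value)
                timestamp = int(float(metric_time))
            except (TypeError, ValueError):
                continue  # skip rows that are not plain numbers
            point = (
                Point("rfswarm_metric")
                .field("value", value)
                .time(timestamp, WritePrecision.S)  # assuming epoch seconds
            )
            write_api.write(bucket=INFLUX_BUCKET, org=INFLUX_ORG, record=point)


if __name__ == "__main__":
    export_metrics()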

Other options

  • Depending on the detail you need, there is another option: RFSwarm can output some data at the end of the test as text files, via a button on the run screen. If that has all the information you need, a simple CSV/TSV import might be possible.

  • The output from Reporter is HTML / XLSX / DOCX, so if you construct a Reporter template with the information you want and generate an XLSX file, you could import from Excel, but I suspect this might not give you what you are after.

Dave.

Hello @damies13 ,

Thank you so much, your reply really helped me to better understand some of the metrics.
We've finally decided to write a SQLite-to-InfluxDB exporter in Python, and that did the job.
I think this is the easiest way to do it, so for now we'll go with this approach.
For now we have managed to create these graphs in Grafana:

But I'm not really sure why, at the end of the tests when the robots stop, we get those peaks on the Agent Memory, Agent CPU and Agent Network graphs.
I think perhaps it's due to the SQLite DB creation by the manager. What do you think?
I also don't really understand what the Agent Network usage is for.
And my last question (after that I'll stop disturbing you): I need to grab the absolute path of the directory that stores the test results, 20231121_141341_kafka_scenario. Do you have any idea where I can get this?

Thanks so much for your help,

Regards,

Esther

Oh, by the way, I should clarify that the manager and the agent are located on the same machine.
We have two Docker containers, one for the manager and the other for the agent, and they run on a runner machine.

Regards ,

Esther

Hi Esther,

disturb away, it’s fine really :grin:

If you look at the first graph (top left) in your screenshot, you can see the orange line started around 15:01, and after that the CPU, network and memory all spiked. This is when the agent uploads the log files, so you will get smaller but similar peaks when you have failures. This is the default behaviour: if a test passes, the logs are left on the agent and uploaded to the manager after the test is finished, but if a test fails the logs for that failed test are uploaded to the manager immediately so you can review them on the manager if you need to. In the Scenario settings you can change this behaviour to however you prefer.

It depends on your application; some applications have heavy network traffic and you might need to check that you are not exceeding the bandwidth capacity of your agent machine's network connection. This is less of an issue nowadays with gigabit Ethernet, but if your agent was simulating a remote office on a 64k ISDN line you would care more. It's there if you need it.

If your Python script has opened the SQLite file, you already have the path :wink: in Python: resultdir = os.path.dirname('/path/to/your/sqlitefile.db')

Actually, RFSwarm does everything with relative paths internally, so the full path is not stored in the DB (well, the full paths of the robot files are, but only so they can be put in the report; they are never used by RFSwarm itself).

It's perfectly reasonable that you might run the test with RFSwarm Manager on one machine and then copy the results folder to another machine to run RFSwarm Reporter; the Manager might be Windows or Linux and the Reporter might be macOS, or the other way around. RFSwarm should not care and should still work the same; it's intended to be OS agnostic.

This is fine, nothing is stopping you. I don't recommend it because you can run into resource contention (especially CPU and memory), but if your runner machine has enough resources to run the test without compromising response times, great, go ahead, understanding the risk. I'd suggest monitoring the runner machine during the test to make sure it's not resource constrained.

Dave.


Hi @damies13 ,

Thanks for your reply :slight_smile: .
To your response:
"If your Python script has opened the SQLite file, you already have the path :wink: in Python: resultdir = os.path.dirname('/path/to/your/sqlitefile.db')" => What I wanted to say was: is it possible to grab the results directory's name somewhere? I want to get the last directory created during the test run.

About running the two Docker containers (agent and manager) on the same runner, I think perhaps we'll change this in the future. It isn't that complicated, so for now I'm trying to get the tests working in automation (and afterwards I'll make some adjustments).

With your help it's starting to become easier for me to interpret the metrics.

Regards,

Esther

Hi Esther,

Not sure I understood the question; do you mean, is it possible to get the directory to pass to your Python script?

If so, yes, there are a couple of ways you can do that:

  • If you run the manager with debug level 1, then in the stdout you'll see a line starting with datapath:, which should have what you want.

  • You can sort the directory listing of the results directory, using OS tools or Python's os module, to get the most recently created directory in the results directory (see the sketch after this list):

    • on Linux: ls -t | head -1
    • on Windows: dir /TC
    • in Python: see os.scandir()

  • To get the results directory itself, you can:
    • read RFSwarmManager.ini and look for the resultsdir entry, or
    • control it when you run the manager with the -d or --dir command line option.
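
For the Python route, a small sketch like this might do it (the results directory path below is a placeholder; point it at whatever resultsdir your manager is configured to use):

import os


def latest_result_dir(results_root):
    # Return the absolute path of the most recently modified subdirectory of
    # the RFSwarm results directory (results_root is a placeholder path)
    subdirs = [entry.path for entry in os.scandir(results_root) if entry.is_dir()]
    if not subdirs:
        return None
    return max(subdirs, key=lambda path: os.stat(path).st_mtime)


print(latest_result_dir("/path/to/rfswarm/results"))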

Hopefully that’s what you’re after,

Dave.


Hi @damies13 ,

Thanks for your reply :pray:
Yeah, I wanted to grab the last result directory after the test campaign execution so that I could grab the SQLite database and send its data to InfluxDB to display our target metrics in Grafana.
Thanks for your suggestion, it's really helpful, I'll take a look at it.

Recently I've been in a big rush, with a big reorganization at work,

I’ll come back soon.

Regards,

Esther


In my experience, the most stressful day at work is still better than being unemployed for months and not knowing when you'll have income again; now I look at stressful days at work as a blessing.

Take your time, be grateful and enjoy life :+1:

Dave.


:pray:
I thought that having strong skills and testing ability could protect us from this kind of situation.
Working as a freelancer, for instance, can certainly be stressful at times, but I thought that in our field it's quite possible to get by without experiencing those kinds of no-mission, no-pay periods, isn't it?
Of course, you're right, I'm just thinking out loud…
Take care,

Regards
Esther

In normal times you are right.

I told you that to encourage you never to be depressed; when you are having a bad day, just remember it could be worse.

There was a time several years ago when my city had a one-in-100-years flood. All the businesses abandoned their IT projects, and all the money was spent on getting their offices running again, as the basements of most buildings flooded when the water came up from the river through the storm water system. The whole city lost electricity for a month, and most companies had their emergency generators in the basements, so it was a big disaster; the last time there was a flood like this was before there was electricity in my city. All the permanent staff continued to get paid to stay home, but all the contracts were cut short, and because the companies had spent all their savings on repairs there were no new projects or contracts for about a year.

That was years ago now, and luckily I had savings and could survive. Many other IT people sold their houses and left my city; some never came back. Luckily for my children (who are now adults), they could stay at their school with their friends.

You never know what can happen. I am grateful it was just a flood and not a war like the poor people in Ukraine are suffering now.

Also, never laugh at people who have it worse than you; help them if you can. Life ebbs and flows differently for everyone, and you never know what's around the corner.

Hopefully my words give you encouragement :crossed_fingers:

Dave.

3 Likes

Hi @damies13
Of course you're right, thank you for your wise advice.

I have some other questions about my Grafana results display :innocent: :pray: :innocent:.

I got this kind of result for my tests:

The AUT graphs are generated by a node exporter maintained by my company's DevOps team. For now, they handle the AUT server monitoring.

The second screenshot is the AUT monitoring displayed over the same time range as the agent run.

I can't figure out anything from the Agent Network graph.

I don't know how to interpret the graphs of the AUT machine compared with the agent machine's.
By any chance, can you make anything out from these two screenshots?

And the last question :slight_smile:
If I wanted to add another agent on a different machine to monitor my AUT, where could I configure the Additional Settings > Agent Filter part in the scenario (.rfs file???)?
I want to do it in a configuration file because all my tests are launched in CI/CD.

Thanks so much for all your help,

Regards,

Esther

These graphs show %CPU, %Memory and %Network of the agent machine, not the AUT. The fourth one, which you labelled "Load Agent", is a "load" value the manager uses to help decide which agent should receive the next robot assignment; it's basically the maximum of the other three. You can see what the agent reports to the manager about itself in the table here.

These values are useful for knowing whether you're overloading your agents, but they tell you nothing about the AUT. I can tell, though, that with 4 robots you were at ~37% RAM usage, so this machine can probably only handle about 9 of those robots before it's memory constrained.

Also worth noting that the agent reports those stats to the manager every 2 seconds, so if you get stats from the VM host they won't be at the same level of granularity and won't look the same.

The AUT stats you're getting appear to be for a different machine? But even if not, they look like they only take one sample measurement per minute or per 2 minutes; you'll need to check with the person who set up that monitoring. Also be careful whether they are giving you a moving average or the average over the 2-minute sample interval, as an average of 20% CPU could be fluctuating between 10% and 30%, or it could be steady at 1% with a short spike to 100%. When monitoring systems for performance you want samples every 5 seconds or less so you can catch the spikes.

If they are from different machines, they are not related, so don't try to match them; only the CPU graphs are both in percent.

If they are the same machine:

  • CPU Basic shows Busy System, Busy User, Busy IO Wait and Idle (in green); the %CPU from the agent is basically 100% minus Idle%, or you could also call it (% Busy System + % Busy User + % Busy IO Wait)
  • Memory Basic shows ~1.1 GB used of 3 GB, or approximately 37% memory used
  • The Load graphs don't match; they are recording something altogether different
  • Network Traffic Basic is showing ~200 kb/s received, with a short burst of 600 kb/s received at 15:07:00, and a steady ~200 kb/s sent; to work out what that would be as a percentage you'd need to know the maximum speed of the network interface

There is no need to install the agent on the AUT machines; you can just have one agent machine in the data centre and have it monitor all the AUT machines. You just set up a robot script that connects to the AUT machines and monitors them; the page AUT Monitoring has examples of how to monitor Windows and Linux AUT machines, and the keyword Post AUT Stats in those examples can be used to post any metric data you want back to the Manager.

You don't define the agents in the .rfs file; you just tell each agent where the manager is (swarmmanager in the agent's INI file or -m on the agent command line) and they self-register with the manager and start reporting their data to it.
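
For illustration only, pointing an extra agent at the manager might look something like this (the host name, port and INI section shown are placeholders/assumptions on my part; check your own agent INI file and the rfswarm documentation for the exact names):

# Command-line form (placeholder host/port):
#   rfswarm-agent -m http://your-manager-host:8138/

# INI form - assuming the agent's INI uses an [Agent] section with a swarmmanager entry:
[Agent]
swarmmanager = http://your-manager-host:8138/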

Dave.


Hi @damies13 ,

Thanks for your reply, it's much clearer now.
About the last question on "scenario", what I meant was:
if I need to add several scenarios in my data.rfs file, is this the right way to do it, as in the example below:

[Scenario]
uploadmode = err
scriptcount = 1
graphlist =

[1]
robots = 4
delay = 0
rampup = 30
run = 120
test = Send a message on kafka topic
script = /src/product/suites/api_test/Producer-consumer_Operating.robot

[2]
robots = 10
delay = 0
rampup = 50
run = 120
test = Another test
script = /src/product/suites/api_test/another_testsuite_to_run.robot

[3]
robots = 15
delay = 0
rampup = 100
run = 120
test = Yet another test
script = /src/product/suites/api_test/tyest_another_test_suite_to_run.robot

About the AUT monitoring, yes, it is indeed a different machine from the agent.
OK, I understand now about the agent monitoring the AUT; yesterday I spent quite a bit of time reading that part of the rfswarm documentation.
I think it will be better if we build our own AUT monitoring.

I think I'll try that,

Thanks so much for your help,

Regards,

Esther