How to automate audio verification in Robot Framework

Hi everyone, can someone help me?
I am currently automating a mobile banking application. When I successfully make a transfer, the app plays an audio message saying “You have successfully transferred 2 dollars.”
I want to verify whether the audio message is exactly what I expect.
Is there a way to automate the verification of this audio?
Thanks for your support.

Hi Alice,

I’ve not seen an audio library for Robot Framework, but it should be possible to create one using Python audio modules.

My guess is that you’ll need to use a Python audio module to create a loopback device, redirect the audio from the OS into this device, capture it to a WAV file, and then compare that WAV file to a known reference recording.
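
For the "compare that wav file to a known source" step, here is a rough sketch using only the standard library. The function names `read_mono_16bit` and `wav_similarity` are made up for illustration, and it assumes both files are 16-bit mono PCM at the same sample rate and already time-aligned. Raw sample comparison is sensitive to timing offsets and background noise, so treat the score as a hint and pick any threshold by experiment.

```python
import math
import struct
import wave


def read_mono_16bit(path):
    """Return (sample_rate, samples) for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian signed 16-bit.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return rate, samples


def wav_similarity(path_a, path_b):
    """Cosine similarity of two recordings, between -1.0 and 1.0."""
    rate_a, a = read_mono_16bit(path_a)
    rate_b, b = read_mono_16bit(path_b)
    if rate_a != rate_b:
        raise ValueError("sample rates differ")
    # Compare only the overlapping part (assumes both start at the same moment).
    n = min(len(a), len(b))
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = math.sqrt(sum(x * x for x in a[:n]))
    norm_b = math.sqrt(sum(y * y for y in b[:n]))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # one of the recordings is pure silence
    return dot / (norm_a * norm_b)
```

A file compared with itself scores 1.0; unrelated audio should score close to 0.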

How are your Python programming skills? While I wouldn’t call this a trivial Python project, it should be achievable as a few Python functions that you can make callable from RF as keywords.

Dave.

Thanks so much for your suggestion – it’s really helpful! I’ve actually been exploring this idea and found a similar approach mentioned on ChatGPT as well: using a Python audio module to capture and save the system audio, then comparing it to a known file.
I think I’ll give it a try and see how it goes.
Thanks again for your support !!! :grin:

Hi Alice,

There’s quite good documentation on writing Python library files in the Robot Framework User Guide.

A few of us on this forum have done it (just not for audio) so just ask if something’s not clear.
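
To give an idea of how small such a library can be, here is a minimal sketch. The class name `AudioChecks` and both keywords are hypothetical, and the check itself (recording length) is only a placeholder for whatever audio verification you end up with. It assumes the file is saved as `AudioChecks.py` so the class name matches the module name.

```python
import wave


class AudioChecks:
    """Hypothetical Robot Framework library; public methods become keywords."""

    ROBOT_LIBRARY_SCOPE = "GLOBAL"

    def get_wav_duration(self, path: str) -> float:
        """Return the length of a WAV file in seconds."""
        with wave.open(path, "rb") as wav:
            return wav.getnframes() / wav.getframerate()

    def wav_duration_should_be_at_least(self, path: str, seconds: float):
        """Fail if the recording is shorter than ``seconds``."""
        duration = self.get_wav_duration(path)
        if duration < seconds:
            # Robot Framework reports a raised exception as a keyword failure.
            raise AssertionError(
                f"{path} is {duration:.2f}s long, expected at least {seconds:.2f}s"
            )
```

In a suite you would import it with `Library    AudioChecks.py` and call it as `Wav Duration Should Be At Least    recording.wav    1.5`. The type hints let Robot Framework convert the string argument `1.5` to a float.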

Dave.

I would also look into visualizing your recorded audio as a waveform and doing a visual comparison against a reference waveform.
I’m sure there are python modules out there to record audio and convert it.

If you only need to verify the spoken text, you can transcribe the audio to text (there are a lot of AI services offering that, but I’m sure there are also models and tools that you can run locally).
Then you would just compare the transcription against a reference text.
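
One thing to watch with the comparison: speech-to-text output often differs from the expected string in case, punctuation and how numbers are written, so an exact string match will likely be too strict. Below is a small sketch of a more forgiving comparison. The helper names and the single-digit word table are assumptions for illustration; real amounts would need a proper number-to-words conversion.

```python
import re

# Assumed, deliberately tiny digit-to-word table; extend it for real amounts.
_DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize_transcript(text):
    """Lower-case, drop punctuation and spell out single digits."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # punctuation becomes spaces
    words = [_DIGIT_WORDS.get(word, word) for word in text.split()]
    return " ".join(words)


def transcripts_match(actual, expected):
    """True if both strings are equal after normalization."""
    return normalize_transcript(actual) == normalize_transcript(expected)
```

With this, a transcript like "you have successfully transferred two dollars" matches the expected "You have successfully transferred 2 dollars."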

“I’m sure there are also models and tools that you can run locally”

Whisper runs locally and it’s pretty much the standard tool for transcribing speech to text. There’s a free Mac app in the App Store, and the command-line binaries can be installed with brew (openai-whisper). Other platforms most likely have binaries available too.

Recorded a small audio clip and ran whisper against the wav file:

rasjani@Mac ~/tmo/bounssit$ time whisper 02.wav  --output_format txt --language en
/opt/homebrew/Cellar/openai-whisper/20250625/libexec/lib/python3.13/site-packages/whisper/transcribe.py:132: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[00:00.000 --> 00:06.600]  you have successfully transferred two dollars

real	0m13.261s
user	1m10.391s
sys	0m5.182s
rasjani@Mac ~/tmo/bounssit$ cat 02.txt
you have successfully transferred two dollars
rasjani@Mac ~/tmo/bounssit$

Works :wink:
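
If you want to call this from a keyword library, a thin wrapper around the same command line could look roughly like the sketch below. The function names are made up, it assumes the `whisper` executable is on the PATH, and it adds the `--output_dir` option so the .txt file lands in a known folder.

```python
import subprocess
from pathlib import Path


def build_whisper_command(wav_path, output_dir, language="en"):
    """Build the CLI call from the session above, plus --output_dir."""
    return [
        "whisper", str(wav_path),
        "--output_format", "txt",
        "--language", language,
        "--output_dir", str(output_dir),
    ]


def transcribe(wav_path, output_dir):
    """Run whisper and return the transcript text."""
    # check=True raises CalledProcessError if whisper exits with an error.
    subprocess.run(build_whisper_command(wav_path, output_dir), check=True)
    # The output file is named after the input file: 02.wav -> 02.txt
    txt_path = Path(output_dir) / (Path(wav_path).stem + ".txt")
    return txt_path.read_text(encoding="utf-8").strip()
```

Keep in mind the run above took about 13 seconds on CPU for a short clip, so allow for that in test timeouts.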
