Hi everyone, can someone help me?
I am currently automating a mobile banking application. When I successfully make a transfer, the app plays an audio message saying “You have successfully transferred 2 dollars.”
I want to verify whether the audio message is exactly what I expect.
Is there a way to automate the verification of this audio?
Thanks for your support.
Hi Alice,
I’ve not seen an audio library for Robot Framework, but it should be possible to create one using Python audio modules.
My guess is that you’ll need to use a Python audio module to create a loopback device, redirect the audio from the OS into that device, capture it to a wav file, and then compare that wav file to a known source.
How are your Python programming skills? While I wouldn’t call this a trivial Python project, it should be achievable as a few Python functions that you can make callable from RF as keywords.
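To give an idea of the compare step only, here is a rough sketch using just the standard library. It assumes the capture step has already produced 16-bit mono PCM wav files that start at the same moment; the function names and the tolerance value are made up for illustration. A sample-by-sample comparison is sensitive to timing offsets and volume changes, so treat it as a starting point rather than a finished solution.

```python
import array
import math
import sys
import wave


def read_pcm16(path):
    """Return (sample_rate, samples) for a 16-bit mono PCM wav file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError(f"{path}: expected 16-bit mono PCM")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # wav data is little-endian
    return rate, samples


def rms_difference(path_a, path_b):
    """RMS of the per-sample difference, scaled by 16-bit full scale (32768)."""
    rate_a, a = read_pcm16(path_a)
    rate_b, b = read_pcm16(path_b)
    if rate_a != rate_b:
        raise ValueError("sample rates differ")
    n = min(len(a), len(b))  # compare only the overlapping part
    if n == 0:
        raise ValueError("nothing to compare")
    total = sum((a[i] - b[i]) ** 2 for i in range(n))
    return math.sqrt(total / n) / 32768.0


def wav_files_should_match(captured, reference, tolerance=0.05):
    """Fail if the two recordings differ by more than the (assumed) tolerance."""
    diff = rms_difference(captured, reference)
    if diff > tolerance:
        raise AssertionError(f"audio differs: rms difference {diff:.4f} > {tolerance}")
```

The tolerance would have to be tuned against real recordings from your device.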
Dave.
Thanks so much for your suggestion – it’s really helpful! I’ve actually been exploring this idea and found a similar approach mentioned on ChatGPT as well: using a Python audio module to capture and save the system audio, then comparing it to a known file.
I think I’ll give it a try and see how it goes.
Thanks again for your support !!!
Hi Alice,
There’s quite good documentation on writing Python library files in the Robot Framework User Guide.
A few of us on this forum have done it (just not for audio) so just ask if something’s not clear.
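As a minimal sketch of what such a library file can look like: Robot Framework turns the public methods of a Python class into keywords, so get_wav_duration becomes Get Wav Duration. The file name AudioKeywords.py and both keywords here are hypothetical, chosen only to show the shape.

```python
import wave


class AudioKeywords:
    """Hypothetical keyword library; save as AudioKeywords.py."""

    # One shared instance for the whole test run.
    ROBOT_LIBRARY_SCOPE = "GLOBAL"

    def get_wav_duration(self, path):
        """Return the length of a wav file in seconds."""
        with wave.open(path, "rb") as wav:
            return wav.getnframes() / wav.getframerate()

    def wav_duration_should_be_at_least(self, path, seconds):
        """Fail if the recording is shorter than expected."""
        duration = self.get_wav_duration(path)
        # RF passes arguments as strings unless told otherwise, so convert.
        if duration < float(seconds):
            raise AssertionError(
                f"{path} is {duration:.2f}s long, expected at least {seconds}s"
            )
```

In a suite you would import it with "Library    AudioKeywords.py" and then call "Wav Duration Should Be At Least    captured.wav    2". Raising AssertionError is what makes the keyword fail.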
Dave.
I would also look into visualizing your recorded audio as a waveform and doing a visual comparison against a reference waveform.
I’m sure there are python modules out there to record audio and convert it.
If the goal is only to verify the spoken text, you can transcribe the audio to text (a lot of AI services offer that, but I’m sure there are also models and tools that you can run locally).
Then you would just compare the transcription against a reference text.
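One thing to watch out for in that comparison: a transcription will rarely match the reference character for character (casing, punctuation, "two" instead of "2"). A small sketch of a normalising compare, where the function names and the tiny number table are only illustrative assumptions:

```python
import re

# Deliberately small table; extend it for the amounts your tests use.
NUMBER_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "10": "ten",
}


def normalize(text):
    """Lowercase, drop punctuation, and spell out small numbers."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(NUMBER_WORDS.get(word, word) for word in words)


def transcription_should_match(actual, expected):
    """Fail if the texts differ after normalisation."""
    if normalize(actual) != normalize(expected):
        raise AssertionError(
            f"transcription {normalize(actual)!r} != expected {normalize(expected)!r}"
        )
```

For example, transcription_should_match("you have successfully transferred two dollars", "You have successfully transferred 2 dollars.") passes, while a different amount fails.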
I’m sure there are also models and tools that you can run locally
Whisper runs locally and it’s pretty much the standard tool for transcribing speech to text. There’s a free Mac tool in the App Store, and the binaries can also be installed with brew (openai-whisper). Other platforms most likely have binaries available too.
Recorded a small audio clip and ran whisper against the wav file:
rasjani@Mac ~/tmo/bounssit$ time whisper 02.wav --output_format txt --language en
/opt/homebrew/Cellar/openai-whisper/20250625/libexec/lib/python3.13/site-packages/whisper/transcribe.py:132: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[00:00.000 --> 00:06.600] you have successfully transferred two dollars
real 0m13.261s
user 1m10.391s
sys 0m5.182s
rasjani@Mac ~/tmo/bounssit$ cat 02.txt
you have successfully transferred two dollars
rasjani@Mac ~/tmo/bounssit$
Works
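If you want to call it from a Python keyword instead of a shell, something along these lines should do. It is only a sketch: it assumes the whisper binary is on PATH, uses the same flags as the run above plus --output_dir (which, as far as I know, the CLI accepts) to control where the txt file lands, and the function names are made up.

```python
import subprocess
from pathlib import Path


def build_whisper_command(wav_path, output_dir, language="en"):
    """Build the same command line as the manual run, plus an output dir."""
    return [
        "whisper", str(wav_path),
        "--output_format", "txt",
        "--language", language,
        "--output_dir", str(output_dir),
    ]


def transcribe(wav_path, output_dir, language="en"):
    """Run whisper and return the contents of the generated txt file."""
    subprocess.run(
        build_whisper_command(wav_path, output_dir, language),
        check=True,            # raise if whisper exits with an error
        capture_output=True,   # keep its console output out of the test log
        text=True,
    )
    # whisper names the transcript after the audio file: 02.wav -> 02.txt
    transcript = Path(output_dir) / (Path(wav_path).stem + ".txt")
    return transcript.read_text(encoding="utf-8").strip()
```

The returned string can then be compared against the expected message in the test.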