Memory Spike and Resource Utilization

When running long-running UI scraping tasks (around ~2 hours), we observe a gradual increase in memory usage on the server, eventually spiking to ~5 GB. What are the best practices to diagnose the root cause and optimize memory consumption for this type of automation? Specifically, what steps (profiling/monitoring, coding or design changes, batching/checkpointing, cleanup/garbage collection, etc.) would you recommend to prevent memory growth over time?

Note: I already trigger garbage collection at every point where it's possible.

Hi RK,

A couple of pieces of information that will help us give you a better answer:

  • Which library are you using to do the UI scraping? (Based on your comment about garbage collection I'd guess Selenium, but I'm not sure.)
  • Are you seeing the high memory usage on the application server of the application you're scraping, or on the machine running the robot (task or test?)
  • Which OS are you seeing the high memory usage on?

Knowing that will help us narrow down the suggestions.

As a general rule of thumb, ending sessions can help release memory, so you could structure your robot differently to reduce the session time of your scraping user. That said, 2 hours is not particularly long for a session on a client-server desktop app, but it could be very long for a web-based app.

Also, does the scraping task have to be done by a single user? Could you run 2 or 4 robots in parallel and potentially reduce the time the scraping takes?

Dave.


Hi Dave — we’re using Selenium along with mostly built-in libraries. We’re seeing memory usage steadily increase on the machine running the bot. The runtime is a cloud pod (Linux-based container). It’s a single process executed by one worker, and the flow is largely sequential/dependent, so we’re not sure how to split it into parallel workers. If you have any ideas or approaches to manage memory or restructure the run (even without parallelization), I’d love to discuss and learn. Thanks!

It’s a single process executed by one worker,

  1. process as in "a series of actions or steps taken in order to achieve a particular end", or
  2. process in the operating-system sense?

Probably 1? Because if you are using Robot Framework with Selenium, that's already one process for the Python interpreter running RF, a second for the webdriver, and a third for the browser.

With whatever information you have shared, I'd assume it's the browser itself that grows in memory usage, and if that's the case, the only way to help with that is to restart the browser during your "scraping". How to achieve that really depends on your (test/automation) codebase and the website you are scraping, but without details, all I can offer are generic tips.
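As one generic sketch (untested; `Scrape One Item`, `${items}`, `${URL}`, and `${RESTART_EVERY}` are made-up placeholders you'd replace with your own keywords and data), a periodic browser restart in Robot Framework could look something like this:

```robotframework
*** Settings ***
Library    SeleniumLibrary

*** Variables ***
${RESTART_EVERY}    50    # hypothetical: restart the browser every 50 items

*** Keywords ***
Scrape With Periodic Restart
    [Arguments]    @{items}
    Open Browser    ${URL}    headlesschrome
    ${count}=    Set Variable    ${0}
    FOR    ${item}    IN    @{items}
        Scrape One Item    ${item}    # your own scraping keyword
        ${count}=    Evaluate    ${count} + 1
        IF    ${count} % ${RESTART_EVERY} == 0
            Close Browser    # kills the chrome process and frees its memory
            Open Browser    ${URL}    headlesschrome
        END
    END
    [Teardown]    Close All Browsers
```

Tune `${RESTART_EVERY}` against how expensive a browser restart is in your environment.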

chrome --type=renderer --crashpad-handler-pid=8286 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 --user-data-dir=/tmp/.org.chromium.Chromium.VdDftM --change-stack-guard-on-fork=enable --no-sandbox --disable-dev-shm-usage --enable-automation --enable-logging=stderr --log-level=0 --remote-debugging-port=0 --test-type=webdriver --allow-pre-commit-input --ozone-platform=headless --disable-gpu-compositing --lang=en-US --num-raster-threads=2 --enable-main-frame-before-activation --renderer-client-id=7 --time-ticks-at-unix-epoch=-1770883904901371 --launch-time-ticks=613197413326 --shared-files=v8_context_snapshot_data:100 --field-trial-handle=3,i,17879469721122141560,14110402562888464252,262144 --disable-features=PaintHolding --variations-seed-version 

This is the process that is growing over time.

So it's the browser (Chrome), as I expected.
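If you want to track that from the robot itself, a rough sketch using the standard Process library could log the resident memory of the chrome process over time (Linux only, relies on the `ps` command; `${pid}` is something you would have to capture yourself, e.g. with `pgrep chrome`):

```robotframework
*** Settings ***
Library    Process

*** Keywords ***
Log Process Memory
    [Documentation]    Logs the resident set size (KiB) of the given process.
    [Arguments]    ${pid}
    ${result}=    Run Process    ps    -o    rss\=    -p    ${pid}
    Log    RSS of process ${pid}: ${result.stdout.strip()} KiB
```

Calling that keyword between scraping batches would give you a trend line in the log instead of having to watch `top` manually.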

Chrome is known for its insane memory peaks.

And as I suggested, stop/kill your browser fully and do your scraping in smaller batches.

Or switch to Firefox?


Well, if it's the Chrome process that's getting larger over time, then garbage collection won't help.

so a few things to consider:

  • Normally I'd suggest using Close Window periodically, but SeleniumLibrary doesn't give you an easy way to open a window
  • So instead I'll suggest periodically calling Close Browser, then Open Browser

How easily you can implement this obviously depends on how your test/task is structured. If your robot is doing the same activity over and over again, just with different data, then you could restructure it as a test template for one iteration of the test/task, and pass the data in using the data-driven approach. This will also mean you can simply use pabot to run several robots in parallel.

There is a trade-off with opening and closing the browser too often, though: opening the browser is quite disk intensive, so if one iteration of your test/task is really short this might not be ideal, and you may need to run the steps in a small loop of 5-10 iterations before closing the browser.

If the 2-hour process is a long sequence of different tasks, then you'll need to think more carefully about how and when to close and open the browser.
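As a rough illustration of that data-driven structure (untested sketch; `Scrape One Record`, `${URL}`, and the record IDs are placeholders for your own keyword and data), each test case becomes one self-contained iteration with its own browser lifetime:

```robotframework
*** Settings ***
Library          SeleniumLibrary
Test Template    Scrape One Record
Suite Teardown   Close All Browsers

*** Test Cases ***    RECORD_ID
Record 1001           1001
Record 1002           1002
Record 1003           1003

*** Keywords ***
Scrape One Record
    [Arguments]    ${record_id}
    # Each iteration gets a fresh browser, so Chrome's memory is
    # released at the end of every record rather than after 2 hours.
    Open Browser    ${URL}/records/${record_id}    headlesschrome
    # ... your scraping steps for one record go here ...
    Close Browser
```

Because each test case is independent, a suite structured like this could then be split across workers with something like `pabot --processes 4`, subject to the data dependencies you mentioned.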

If you’re not sure how to break things up and can show us your robot code then we might be able to offer some suggestions.

Hopefully that helps,

Dave.

Hi,

You can still open a window with this:

Execute Javascript    window.open('about:blank', '_blank');
${windowsID}    Get Window Handles
Switch Window    ${windowsID}[1]
Go To    ${url}

:grinning_face:
Charlie


I guess you could easily wrap that in a Open Window user defined keyword :+1:
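Something like this, for example (untested sketch wrapping Charlie's snippet; it assumes the newest window handle is the last one in the list):

```robotframework
*** Keywords ***
Open Window
    [Documentation]    Opens a new browser tab in the current Selenium
    ...                session and navigates it to the given URL.
    [Arguments]    ${url}
    Execute Javascript    window.open('about:blank', '_blank');
    ${handles}=    Get Window Handles
    Switch Window    ${handles}[-1]    # switch to the newest handle
    Go To    ${url}
```

Combined with Close Window, that gives you a cheaper alternative to a full Close Browser / Open Browser cycle, though only restarting the browser process itself will fully release Chrome's memory.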