When running long-running UI scraping tasks (around ~2 hours), we observe a gradual increase in memory usage on the server, eventually spiking to ~5 GB. What are the best practices to diagnose the root cause and optimize memory consumption for this type of automation? Specifically, what steps (profiling/monitoring, coding or design changes, batching/checkpointing, cleanup/garbage collection, etc.) would you recommend to prevent memory growth over time?
Note: I already trigger garbage collection at every point where it's possible.
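For context, the kind of per-checkpoint profiling I have in mind looks roughly like this (a Python sketch using the standard-library tracemalloc module; the checkpoint labels and call sites are placeholders, not my actual code):

```python
import gc
import tracemalloc

tracemalloc.start()

def log_memory(label, top=5):
    """Force a collection, then log the biggest allocation sites."""
    gc.collect()
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.statistics("lineno")
    print(f"--- {label} ---")
    for stat in stats[:top]:
        print(stat)
    return stats

# Call log_memory("after batch N") at checkpoints in the scraping loop;
# allocation sites whose totals keep growing between checkpoints point
# at the leak.
```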
A couple of pieces of information that will help us give you a better answer:
Which library are you using to do the UI scraping tasks? (Based on your comment about garbage collection, I might guess Selenium, but I'm not sure.)
Are you seeing the high memory usage on the application server of the application you're scraping, or on the machine running the robot task (task or test?)?
Which OS are you seeing the high memory usage on?
Knowing that will help us narrow down the suggestions.
As a general rule of thumb, ending sessions can help release memory, so it may be worth structuring your robot differently to reduce the session time of your scraping user. That said, 2 hours is not particularly long for a session on a client-server desktop app, but it could be very long for a web-based app.
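One way to shorten session time, assuming something Selenium-like, is to recycle the session every N items so the browser/driver process never accumulates state for the full 2 hours. This is only a sketch: `make_driver` and `scrape_one` are hypothetical stand-ins for your own session setup and per-item scraping logic.

```python
def scrape_in_batches(items, make_driver, scrape_one, batch_size=50):
    """Scrape items in batches, opening a fresh session per batch."""
    results = []
    for start in range(0, len(items), batch_size):
        driver = make_driver()  # fresh session for this batch
        try:
            for item in items[start:start + batch_size]:
                results.append(scrape_one(driver, item))
        finally:
            driver.quit()  # end the session so its memory is released
    return results
```

With Selenium the `make_driver` stand-in would construct a new WebDriver, and `driver.quit()` is what actually tears down the browser process between batches.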
Also, does the scraping task have to be done by a single user? Could you run 2 or 4 robots in parallel to do the scraping and potentially reduce the overall time it takes?
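Splitting the work that way can be sketched as below. This assumes the items can be scraped independently; `scrape_chunk` is a hypothetical function that opens its own session, scrapes its slice, and quits (threads are used here for the sketch, but the same partitioning applies to separate robot/user processes):

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(items, scrape_chunk, workers=4):
    """Split items across `workers` independent scraping sessions."""
    # Round-robin split: one slice of the work per worker/robot.
    chunks = [items[i::workers] for i in range(workers)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(scrape_chunk, chunks):
            results.extend(part)
    return results
```

Besides cutting wall-clock time, shorter per-worker runs also cap how much memory any one session can accumulate before it ends.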