Opened at 2026-05-12T01:03:49Z
Last modified at 2026-05-13T00:37:17Z
#1429 new planned
control effort for global buffer allocator
| Reported by: | Ichthyostega | Owned by: | |
|---|---|---|---|
| Priority: | lesser | Milestone: | 2beta |
| Component: | lumiera | Keywords: | render performance tuning design |
| Sub Tickets: | #1396, #1426 | Parent Tickets: | #835, #966, #1315, #1387 |
Description
As Lumiera Architect,
I want to control the effort for managing buffer memory,
to strike a good balance between avoiding excess allocations and blocking render workers.
description of the problem
For the BufferProvider implementation used in the actual Render Engine, a set of thread-local storage pools is used. The intention is to calibrate this system in such a way that most of the buffer allocation requests can be serviced from a small pool of allocations already associated with the current worker thread. Yet those local memory managers should not perform actual allocations on their own, since doing so incurs the risk of global contention.
Requests for new allocations, as well as old allocations no longer needed in the local pool, are thus sent over a lock-free queue to a central service, the Engine Buffer Manager. This setup relieves the worker threads of the effort of maintaining a global pool of buffer allocations, in the hope that they have something else to do meanwhile; but it also creates a new problem of control and coordination, touching on various tricky questions:
- a worker first announces the new memory requirements, based on a global pre-computation at the start of a render job
- but how quickly are these allocations actually required on average? If an allocation does not arrive in time, the `LocalBufferStore` will issue a synchronous blocking request at the point where the memory is required. Since we cannot afford to keep track of each request individually, there is no way to cancel the asynchronous request at that point, so we end up with a duplicate allocation sitting in the local pool until that overhead is eventually detected at the next clean-up step.
- furthermore, any excess allocation will be sent back from the local pool to the central `EngineBufferMemory`, creating yet more effort to place those allocations back into the appropriate pool and possibly even to readjust the pool size.
- all this additional work must be performed anyway, and doing so requires pushing aside the actual render processing of some worker; yet the difficult question is how this can be achieved:
  - we could always perform this internal management work directly from the worker, which in fact implies not using an asynchronous hand-over, but rather a global lock.
  - we could create a background thread to wake up periodically and handle those tasks; such a setup is probably the simplest and clearest solution, but it implies that this management thread will regularly compete with some worker for CPU time.
  - we could schedule dedicated management jobs whenever a worker issues an allocation request; such management jobs would take precedence over the next render job, thereby avoiding the resource contention, since they would be handled at a point where some larger render job has been completed. The downside is that scheduling a job incurs some overhead, which might be larger than the actual time spent on memory management and clean-up.
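
The duplicate-allocation scenario described above can be made concrete with a small sketch. This is not the Lumiera implementation: `CentralManager`, `LocalPool`, `Request` and `demoLateDelivery` are hypothetical names chosen for illustration, and a plain `std::deque` serviced from the same thread stands in for the lock-free queue, so that the sequence of events is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// NOTE: all names below are hypothetical, not the actual Lumiera API.
using Buffer = std::unique_ptr<unsigned char[]>;

class LocalPool;

// One announced allocation; deliberately carries no handle that would allow cancelling it.
struct Request
{
    std::size_t size;
    LocalPool*  origin;
};

// Stand-in for the central service; the deque replaces the lock-free queue.
class CentralManager
{
    std::deque<Request> queue_;
    std::size_t returned_ = 0;
public:
    void enqueue (Request r)            { queue_.push_back (r); }
    Buffer allocate (std::size_t size)  { return std::make_unique<unsigned char[]> (size); }
    void takeBack (Buffer)              { ++returned_; }  // a real manager would re-file the buffer
    std::size_t returned() const        { return returned_; }
    void serviceRequests();             // defined below, once LocalPool is complete
};

// Stand-in for the per-worker (thread-local) pool.
class LocalPool
{
    CentralManager& central_;
    std::size_t bufSize_;
    std::vector<Buffer> free_;
public:
    LocalPool (CentralManager& central, std::size_t bufSize)
      : central_(central), bufSize_(bufSize) { }

    // Announce upcoming requirements: asynchronous, fire-and-forget.
    void announce (std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            central_.enqueue (Request{bufSize_, this});
    }

    void deliver (Buffer b)  { free_.push_back (std::move (b)); }
    void release (Buffer b)  { free_.push_back (std::move (b)); }

    // Serve from the local pool; otherwise fall back to a synchronous request.
    // The pending asynchronous request stays in the queue, since it cannot be cancelled.
    Buffer claim()
    {
        if (free_.empty())
            return central_.allocate (bufSize_);
        Buffer b = std::move (free_.back());
        free_.pop_back();
        return b;
    }

    // Clean-up step: send everything beyond `keep` back to the central service.
    std::size_t cleanUp (std::size_t keep)
    {
        std::size_t sentBack = 0;
        while (free_.size() > keep)
        {
            central_.takeBack (std::move (free_.back()));
            free_.pop_back();
            ++sentBack;
        }
        return sentBack;
    }

    std::size_t pooled() const { return free_.size(); }
};

inline void CentralManager::serviceRequests()
{
    while (not queue_.empty())
    {
        Request r = queue_.front();
        queue_.pop_front();
        r.origin->deliver (allocate (r.size));
    }
}

// Replays the problematic sequence; returns the number of excess buffers sent back.
inline std::size_t demoLateDelivery()
{
    CentralManager central;
    LocalPool pool (central, 4096);
    pool.announce (1);                 // asynchronous request, not yet serviced
    Buffer inUse = pool.claim();       // memory needed now: synchronous fallback
    central.serviceRequests();         // the announced buffer arrives too late
    pool.release (std::move (inUse));
    assert (pool.pooled() == 2);       // two buffers pooled, though only one was ever needed
    return pool.cleanUp (1);
}
```

In this sketch the single announced buffer ends up duplicated, and the surplus is only reclaimed when `cleanUp` runs. Who calls `serviceRequests` (the worker itself, a background thread, or a dedicated management job) is exactly the open question raised by the three options above.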
how to decide
It might not be possible to settle upon a single »right way« of dealing with those problems. So first and foremost we need a way to measure the overall effort spent on those tasks, and we need to see how to balance throughput against unused excess allocations. Depending on the results of these observations, we might conclude that the effort for memory management is negligible, or it might incur such a substantial overhead that it has to be factored into the overall load management, or we might even end up implementing a dynamic control...
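
As one possible starting point for such observations, the following sketch accumulates the time spent on memory management alongside the time spent rendering, plus two counters for the costly events named above. It is only an assumption about how the instrumentation might look: `EffortStats`, `ScopedEffort` and all field names are hypothetical and not part of the existing code base.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// NOTE: hypothetical instrumentation, not the actual Lumiera API.
struct EffortStats
{
    std::atomic<std::int64_t>  manageNanos{0};        // time spent on buffer management
    std::atomic<std::int64_t>  renderNanos{0};        // time spent on actual render work
    std::atomic<std::uint64_t> blockingFallbacks{0};  // synchronous requests issued by workers
    std::atomic<std::uint64_t> excessReturned{0};     // buffers sent back unused at clean-up

    // Fraction of the measured time that went into memory management.
    double managementShare() const
    {
        double manage = double (manageNanos.load (std::memory_order_relaxed));
        double render = double (renderNanos.load (std::memory_order_relaxed));
        double total  = manage + render;
        return total > 0 ? manage / total : 0.0;
    }
};

// RAII helper: adds the lifetime of the enclosing scope to one of the time counters.
class ScopedEffort
{
    using Clock = std::chrono::steady_clock;
    std::atomic<std::int64_t>& sink_;
    Clock::time_point start_;
public:
    explicit ScopedEffort (std::atomic<std::int64_t>& sink)
      : sink_(sink), start_(Clock::now()) { }

    ScopedEffort (ScopedEffort const&)            = delete;
    ScopedEffort& operator= (ScopedEffort const&) = delete;

    ~ScopedEffort()
    {
        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds> (Clock::now() - start_);
        assert (elapsed.count() >= 0);   // steady_clock never runs backwards
        sink_.fetch_add (static_cast<std::int64_t> (elapsed.count()), std::memory_order_relaxed);
    }
};
```

A clean-up routine could then open a `ScopedEffort guard (stats.manageNanos);` at its start, and a low `managementShare()` together with few `blockingFallbacks` would support the conclusion that the effort is negligible. Relaxed atomics are used because the counters are statistics only and do not synchronise any other data.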

how to do the Locking?
This is another, closely related aspect.
I can see two opposing approaches:
There are good arguments in favour of each of these opposing preferences, and a solid decision must be based on empirical findings. However, not being able to conduct such an investigation at the current stage, I decide, based on gut feeling, to favour the second approach: I am under the impression that our job sizes are comparatively large and that overall this environment exhibits little contention, so that it is preferable to have one single, focused processing step, and to arrange the locking such that its correctness is obvious.