Reducing Memory Footprint in Real-Time Ray Tracing with NVIDIA RTXMU
Summary
Real-time ray tracing has revolutionized the way we experience lighting in video games, but it comes with a significant computational cost. To address this, NVIDIA developed the RTX Memory Utility (RTXMU), an open-source SDK that combines compaction and suballocation techniques to optimize and reduce memory consumption of acceleration structures. This article delves into the details of RTXMU, explaining how it works and how it can benefit developers.
Understanding Acceleration Structures
Acceleration structures are crucial for efficient ray tracing. They spatially organize geometry to accelerate ray tracing traversal performance. However, when creating an acceleration structure, a conservative memory size is allocated, which can lead to wasted memory.
The Role of Compaction and Suballocation
Compaction
Compaction is a process that reduces the memory footprint of acceleration structures by eliminating unused memory segments. After the initial build, the graphics runtime reports back the smallest memory allocation that the acceleration structure can fit into. This process is essential for reducing memory overhead.
Suballocation
Suballocation enables acceleration structures to be tightly packed together in memory by placing them at the 256 B alignment that acceleration structures actually require, rather than giving each one its own buffer allocation at the API's minimum 64 KB alignment. This technique is particularly beneficial for games with many small acceleration structures.
How RTXMU Works
RTXMU is designed to reduce the coding complexity associated with optimal memory management of acceleration structures. It provides compaction and suballocation solutions for both DXR and Vulkan Ray Tracing, while the client manages synchronization and execution of acceleration structure building.
- Compaction: RTXMU requests the compaction size to be written out to a chunk of video memory. After the compaction size has been copied from video memory to system memory, RTXMU allocates a suballocated compaction buffer to be used as the destination for the compaction copy.
- Suballocation: RTXMU uses a suballocator to place small acceleration structure allocations within a larger memory heap, fulfilling the 256 B alignment requirement.
Benefits of Using RTXMU
- Reduces Memory Footprint: In NVIDIA's test scenes, compaction cut acceleration structure memory roughly in half (52% on average).
- Simplifies Memory Management: RTXMU abstracts away memory management of bottom-level acceleration structures (BLASes) and manages all barriers required for compaction size readback and compaction copies.
- Prevents Mismanagement: RTXMU uses handle indirection to BLAS data structures to prevent any mismanagement of CPU memory.
- Fewer TLB Misses: Suballocation reduces Translation Lookaside Buffer (TLB) misses by packing more BLASes into each 64 KB or 4 MB page.
Integration Results
On average, compaction on NVIDIA RTX cards reduced acceleration structure memory by 52% for test scenes. The standard deviation of compaction memory reduction was 2.8%, indicating stable performance.
Using RTXMU
RTXMU is easy to integrate and provides immediate benefits. It reduces the time it takes for developers to integrate compaction and suballocation into an RTX title. Here’s a simplified example of how to use RTXMU:
```cpp
// Create the RTXMU acceleration structure manager (one per device)
rtxmu::DxAccelStructManager rtxMemUtil(device);

// Initialize suballocator blocks to 8 MB
rtxMemUtil.Initialize(8388608);

// Batch up all the acceleration structure build inputs
std::vector<D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS> bottomLevelBuildDescs;

// Populate the build command list and receive opaque acceleration structure IDs
std::vector<uint64_t> accelStructIds;
rtxMemUtil.PopulateBuildCommandList(commandList.Get(),
                                    bottomLevelBuildDescs.data(),
                                    bottomLevelBuildDescs.size(),
                                    accelStructIds);

// Record the compaction copies, then release the original build resources
rtxMemUtil.PopulateCompactionCommandList(commandList.Get(), accelStructIds);
rtxMemUtil.GarbageCollection(accelStructIds);
```
Table: Comparison of Memory Reduction Techniques
| Technique | Memory Reduction |
|---|---|
| Compaction | Roughly half |
| Suballocation | Variable; depends on structure size and count |
| Combined (RTXMU) | 52% on average in test scenes |
Table: RTXMU Integration Steps
| Step | Description |
|---|---|
| Initialize RTXMU | Create the RTXMU manager with the device. |
| Set Up Suballocator | Initialize suballocator blocks to a specified size. |
| Batch Build Inputs | Batch up all the acceleration structure build inputs. |
| Populate Build Command List | Populate the build command list and get acceleration structure IDs. |
| Perform Compaction | Perform compaction and get the compacted acceleration structure. |
| Garbage Collection | Deallocate unused build resources and remove acceleration structures. |
Conclusion
NVIDIA RTXMU is a powerful tool for optimizing memory consumption in real-time ray tracing applications. By combining compaction and suballocation techniques, RTXMU significantly reduces acceleration structure memory, enabling developers to add more geometry to their scenes or use the extra memory for other resources. With its ease of integration and immediate benefits, RTXMU is a must-have for any developer looking to enhance their ray-traced games and applications.