Reducing Memory Footprint in Real-Time Ray Tracing with NVIDIA RTXMU
Summary
Real-time ray tracing has revolutionized the way we experience lighting in video games, but it comes with a significant computational cost. To address this, NVIDIA developed the RTX Memory Utility (RTXMU), an open-source SDK that combines compaction and suballocation techniques to optimize and reduce memory consumption of acceleration structures. This article delves into the details of RTXMU, explaining how it works and how it can benefit developers.
Understanding Acceleration Structures
Acceleration structures are crucial for efficient ray tracing. They spatially organize geometry to accelerate ray tracing traversal performance. However, when creating an acceleration structure, a conservative memory size is allocated, which can lead to wasted memory.
The Role of Compaction and Suballocation
Compaction
Compaction is a process that reduces the memory footprint of acceleration structures by eliminating unused memory segments. After the initial build, the graphics runtime reports back the smallest memory allocation that the acceleration structure can fit into. This process is essential for reducing memory overhead.
Suballocation
Suballocation enables acceleration structures to be tightly packed together in memory by placing them at the 256 B alignment that acceleration structures actually require, rather than giving each one its own buffer allocation at the API's minimum 64 KB alignment. This technique is particularly beneficial for games with many small acceleration structures.
How RTXMU Works
RTXMU is designed to reduce the coding complexity associated with optimal memory management of acceleration structures. It provides compaction and suballocation solutions for both DXR and Vulkan Ray Tracing, while the client manages synchronization and execution of acceleration structure building.
- Compaction: RTXMU requests the compaction size to be written out to a chunk of video memory. After the compaction size has been copied from video memory to system memory, RTXMU allocates a suballocated compaction buffer to be used as the destination for the compaction copy.
- Suballocation: RTXMU uses a suballocator to place small acceleration structure allocations within a larger memory heap, fulfilling the 256 B alignment requirement.
Benefits of Using RTXMU
- Reduces Memory Footprint: In NVIDIA's test scenes, compaction cut acceleration structure memory roughly in half (52% on average).
- Simplifies Memory Management: RTXMU abstracts away memory management of bottom-level acceleration structures (BLASes) and manages all barriers required for compaction size readback and compaction copies.
- Prevents Mismanagement: RTXMU uses handle indirection to BLAS data structures to prevent any mismanagement of CPU memory.
- Fewer TLB Misses: Suballocation reduces Translation Lookaside Buffer (TLB) misses by packing more BLASes into each 64 KB or 4 MB page.
Integration Results
On average, compaction on NVIDIA RTX cards reduced acceleration structure memory by 52% for test scenes. The standard deviation of compaction memory reduction was 2.8%, indicating stable performance.
Using RTXMU
RTXMU is easy to integrate and provides immediate benefits. It reduces the time it takes for developers to integrate compaction and suballocation into an RTX title. Here’s a simplified example of how to use RTXMU:
```cpp
// Create the RTXMU acceleration structure manager (one per device)
rtxmu::DxAccelStructManager rtxMemUtil(device);

// Initialize suballocator blocks to 8 MB
rtxMemUtil.Initialize(8388608);

// Batch up all the acceleration structure build inputs
std::vector<D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS> bottomLevelBuildDescs;

// Populate the build command list and receive opaque acceleration structure IDs
std::vector<uint64_t> accelStructIds;
rtxMemUtil.PopulateBuildCommandList(commandList.Get(),
                                    bottomLevelBuildDescs.data(),
                                    bottomLevelBuildDescs.size(),
                                    accelStructIds);

// Record the compaction copies, then release the original build resources
rtxMemUtil.PopulateCompactionCommandList(commandList.Get(), accelStructIds);
rtxMemUtil.GarbageCollection(accelStructIds);
```
Table: Comparison of Memory Reduction Techniques
| Technique | Memory Reduction |
|---|---|
| Compaction | Roughly half |
| Suballocation | Variable; depends on structure size and count |
| Combined (RTXMU) | 52% on average in test scenes |
Table: RTXMU Integration Steps
| Step | Description |
|---|---|
| Initialize RTXMU | Create the RTXMU manager with the device. |
| Set Up Suballocator | Initialize suballocator blocks to a specified size. |
| Batch Build Inputs | Batch up all the acceleration structure build inputs. |
| Populate Build Command List | Populate the build command list and get acceleration structure IDs. |
| Perform Compaction | Perform compaction and get the compacted acceleration structure. |
| Garbage Collection | Deallocate unused build resources and remove acceleration structures. |
Conclusion
NVIDIA RTXMU is a powerful tool for optimizing memory consumption in real-time ray tracing applications. By combining compaction and suballocation techniques, RTXMU significantly reduces acceleration structure memory, enabling developers to add more geometry to their scenes or use the extra memory for other resources. With its ease of integration and immediate benefits, RTXMU is a must-have for any developer looking to enhance their ray-traced games and applications.