Summary

This article explores dynamic control flow in CUDA Graphs using conditional nodes. CUDA Graphs let the driver optimize a workflow from a complete description of its tasks and dependencies, but any data-dependent decision has traditionally meant splitting the work into several graphs and returning control to the CPU between them. Conditional nodes, introduced in CUDA 12.4 and expanded in CUDA 12.8, allow portions of a graph to execute conditionally or repeatedly without CPU intervention. This frees CPU resources and lets more complex workflows be expressed in a single graph.

Dynamic Control Flow in CUDA Graphs: A Game-Changer for GPU Workflows

The Challenge with Traditional CUDA Graphs

CUDA Graphs offer significant performance benefits for static workflows by allowing the driver to optimize execution based on a complete description of tasks and dependencies. However, most problems involve some form of decision-making, which traditionally requires splitting up graphs and returning control to the CPU. This approach compromises CUDA’s ability to make optimizations, ties up CPU resources, and adds overhead with each graph launch.

Introducing Conditional Nodes

Conditional nodes, supported in CUDA 12.4 and further enhanced in CUDA 12.8, address this challenge by enabling the conditional or repeated execution of graph portions without returning control to the CPU. These nodes are container nodes, similar to child graph nodes, but their execution depends on the value of a condition variable accessed through a handle created prior to the node.

Types of Conditional Nodes

IF Nodes

IF nodes execute their body graph once if the condition value is nonzero when the node is executed. Starting with CUDA 12.8, an IF node can also take an optional second body graph, which is executed once if the condition value is zero, so the node functions as an IF/ELSE.
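
As a minimal sketch of the IF/ELSE form, assuming CUDA 12.8 or later: the helper name addIfElseNode and its parameters are hypothetical, and the caller is assumed to have created the handle for the same graph and to pass an upstream node that sets the condition value.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: adds an IF/ELSE conditional node after `upstream`.
// On success, *ifBody is the graph run when the condition is nonzero and
// *elseBody is the graph run when the condition is zero.
cudaError_t addIfElseNode(cudaGraph_t graph,
                          cudaGraphNode_t upstream,
                          cudaGraphConditionalHandle handle,
                          cudaGraph_t *ifBody,
                          cudaGraph_t *elseBody)
{
    cudaGraphNodeParams cParams = { cudaGraphNodeTypeConditional };
    cParams.conditional.handle = handle;
    cParams.conditional.type   = cudaGraphCondTypeIf;
    cParams.conditional.size   = 2;  // two bodies: [0] = IF, [1] = ELSE

    cudaGraphNode_t condNode;
    cudaError_t err = cudaGraphAddNode(&condNode, graph, &upstream, 1, &cParams);
    if (err != cudaSuccess) return err;

    // The body graphs are created empty by the call above and returned
    // through phGraph_out; the caller populates them afterwards.
    *ifBody   = cParams.conditional.phGraph_out[0];
    *elseBody = cParams.conditional.phGraph_out[1];
    return cudaSuccess;
}
```

Setting size to 1 instead gives a plain IF node with a single body graph.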

WHILE Nodes

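WHILE nodes execute their body graph repeatedly as long as the condition value is nonzero. The condition is evaluated when the node is executed and again after each completion of the body graph, so work inside the body must eventually set the condition value to zero for the loop to terminate.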
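
The loop below is a minimal sketch of a WHILE node, not a definitive implementation. The names countDown and buildCountdownGraph are hypothetical, the counter is assumed to live in device memory, and the handle is given a default value of 1 at each launch so that the loop is entered.

```cuda
#include <cuda_runtime.h>

// Loop body: decrement a device-side counter and request another iteration
// only while it is still positive. Intended for a single-thread launch.
__global__ void countDown(cudaGraphConditionalHandle handle, int *remaining)
{
    *remaining -= 1;
    cudaGraphSetConditional(handle, *remaining > 0 ? 1u : 0u);
}

// Hypothetical helper: builds a graph containing one WHILE node whose body
// runs countDown until *dRemaining is no longer positive.
// Error paths return early without cleanup to keep the sketch short.
cudaError_t buildCountdownGraph(cudaGraph_t *graphOut, int *dRemaining)
{
    cudaGraph_t graph;
    cudaError_t err = cudaGraphCreate(&graph, 0);
    if (err != cudaSuccess) return err;

    // Default value 1 is applied at every graph launch.
    cudaGraphConditionalHandle handle;
    err = cudaGraphConditionalHandleCreate(&handle, graph, 1,
                                           cudaGraphCondAssignDefault);
    if (err != cudaSuccess) return err;

    cudaGraphNodeParams cParams = { cudaGraphNodeTypeConditional };
    cParams.conditional.handle = handle;
    cParams.conditional.type   = cudaGraphCondTypeWhile;
    cParams.conditional.size   = 1;
    cudaGraphNode_t whileNode;
    err = cudaGraphAddNode(&whileNode, graph, nullptr, 0, &cParams);
    if (err != cudaSuccess) return err;

    // Add the kernel to the body graph created by the call above.
    void *args[] = { &handle, &dRemaining };
    cudaGraphNodeParams kParams = { cudaGraphNodeTypeKernel };
    kParams.kernel.func = (void *)countDown;
    kParams.kernel.gridDim.x = kParams.kernel.gridDim.y = kParams.kernel.gridDim.z = 1;
    kParams.kernel.blockDim.x = kParams.kernel.blockDim.y = kParams.kernel.blockDim.z = 1;
    kParams.kernel.kernelParams = args;
    cudaGraphNode_t bodyNode;
    err = cudaGraphAddNode(&bodyNode, cParams.conditional.phGraph_out[0],
                           nullptr, 0, &kParams);
    if (err != cudaSuccess) return err;

    *graphOut = graph;
    return cudaSuccess;
}
```

Because the default value forces the first iteration, this sketch always runs the body at least once, even if the counter starts at zero. An upstream kernel that sets the condition from the data would avoid that.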

SWITCH Nodes

SWITCH nodes, introduced in CUDA 12.8, hold n body graphs and execute the one whose zero-based index equals the condition value. If the condition value is greater than or equal to n, no graph is executed.
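
A SWITCH node might be wired up roughly as follows, assuming CUDA 12.8 or later. The names selectCase and addSwitch are hypothetical, and the case index is assumed to have been written to device memory by earlier work.

```cuda
#include <cuda_runtime.h>

// Upstream kernel: copies a case index from device memory into the
// condition value read by the SWITCH node.
__global__ void selectCase(cudaGraphConditionalHandle handle,
                           const unsigned int *caseIndex)
{
    cudaGraphSetConditional(handle, *caseIndex);
}

// Hypothetical helper: adds selectCase followed by a SWITCH node with
// numCases bodies. bodies[i] receives the graph that runs when the
// condition value equals i; the caller populates each one afterwards.
cudaError_t addSwitch(cudaGraph_t graph, const unsigned int *dCaseIndex,
                      unsigned int numCases, cudaGraph_t *bodies)
{
    cudaGraphConditionalHandle handle;
    cudaError_t err = cudaGraphConditionalHandleCreate(&handle, graph);
    if (err != cudaSuccess) return err;

    void *args[] = { &handle, &dCaseIndex };
    cudaGraphNodeParams kParams = { cudaGraphNodeTypeKernel };
    kParams.kernel.func = (void *)selectCase;
    kParams.kernel.gridDim.x = kParams.kernel.gridDim.y = kParams.kernel.gridDim.z = 1;
    kParams.kernel.blockDim.x = kParams.kernel.blockDim.y = kParams.kernel.blockDim.z = 1;
    kParams.kernel.kernelParams = args;
    cudaGraphNode_t selector;
    err = cudaGraphAddNode(&selector, graph, nullptr, 0, &kParams);
    if (err != cudaSuccess) return err;

    cudaGraphNodeParams cParams = { cudaGraphNodeTypeConditional };
    cParams.conditional.handle = handle;
    cParams.conditional.type   = cudaGraphCondTypeSwitch;
    cParams.conditional.size   = numCases;  // one body graph per case
    cudaGraphNode_t switchNode;
    err = cudaGraphAddNode(&switchNode, graph, &selector, 1, &cParams);
    if (err != cudaSuccess) return err;

    for (unsigned int i = 0; i < numCases; ++i) {
        bodies[i] = cParams.conditional.phGraph_out[i];
    }
    return cudaSuccess;
}
```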

Creating Conditional Nodes

Creating a conditional node involves several steps:

  1. Allocating the Conditional Handle: A conditional handle must be created using cudaGraphConditionalHandleCreate. This handle is used to access the condition value.
  2. Setting the Condition Value: The condition value can be set in a CUDA kernel by calling cudaGraphSetConditional.
  3. Creating the Conditional Node: The conditional node is created using cudaGraphAddNode with the appropriate node parameters, including the handle and the type of conditional node.
  4. Populating the Body Graph: The body graph of the conditional node can be populated using either the graph API or by capturing asynchronous CUDA calls using cudaStreamBeginCaptureToGraph.
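
Step 4 mentions stream capture as an alternative to the graph API. The sketch below shows one way that could look; scaleKernel and captureBody are hypothetical names, and the body graph is assumed to be the one returned in phGraph_out when the conditional node was created.

```cuda
#include <cuda_runtime.h>

// Example workload: multiply each element by a constant factor.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: records one kernel launch into bodyGraph by
// capturing work issued to a temporary stream.
cudaError_t captureBody(cudaGraph_t bodyGraph, float *dData, int n)
{
    cudaStream_t stream;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) return err;

    // Capture into the existing graph, with no dependencies on nodes
    // already present in it.
    err = cudaStreamBeginCaptureToGraph(stream, bodyGraph, nullptr, nullptr, 0,
                                        cudaStreamCaptureModeGlobal);
    if (err == cudaSuccess) {
        // This launch is recorded as a graph node, not executed now.
        scaleKernel<<<(n + 255) / 256, 256, 0, stream>>>(dData, n, 2.0f);

        cudaGraph_t captured = nullptr;  // receives the graph captured into
        err = cudaStreamEndCapture(stream, &captured);
    }

    cudaStreamDestroy(stream);
    return err;
}
```

Capture suits bodies built from existing stream-based code, while explicit cudaGraphAddNode calls give direct control over the dependencies between nodes.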

Example: Creating an IF Node

The following example demonstrates how to create an IF node:

__global__ void setHandle(cudaGraphConditionalHandle handle) {
    unsigned int value = 0;
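    // Perform some work and set value based on the result.
    // someCondition is a placeholder for the application's own test.
    if (someCondition) {
        value = 1;
    }
    // A nonzero value makes the downstream IF node execute its body.
    cudaGraphSetConditional(handle, value);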
}

cudaGraph_t createGraph() {
    cudaGraph_t graph;
    cudaGraphNode_t node;
    void *kernelArgs[1];  // one entry per kernel argument
    cudaGraphCreate(&graph, 0);
    cudaGraphConditionalHandle handle;
    cudaGraphConditionalHandleCreate(&handle, graph);

    // Use a kernel upstream of the conditional to set the handle value
    cudaGraphNodeParams kParams = { cudaGraphNodeTypeKernel };
    kParams.kernel.func = (void *)setHandle;
    kParams.kernel.gridDim.x = kParams.kernel.gridDim.y = kParams.kernel.gridDim.z = 1;
    kParams.kernel.blockDim.x = kParams.kernel.blockDim.y = kParams.kernel.blockDim.z = 1;
    kParams.kernel.kernelParams = kernelArgs;
    kernelArgs[0] = &handle;
    cudaGraphAddNode(&node, graph, NULL, 0, &kParams);

    cudaGraphNodeParams cParams = { cudaGraphNodeTypeConditional };
    cParams.conditional.handle = handle;
    cParams.conditional.type = cudaGraphCondTypeIf;
    cParams.conditional.size = 1;
    cudaGraphAddNode(&node, graph, &node, 1, &cParams);

    // phGraph_out is an array of body graphs; an IF node of size 1 has one entry.
    cudaGraph_t bodyGraph = cParams.conditional.phGraph_out[0];
    // Populate the body of the conditional node
    cudaGraphNode_t bodyNodes[4];
    cudaGraphNodeParams params[4] = { ... }; // Setup kernel parameters as needed.
    // Assumed topology: node 0 fans out to nodes 1 and 2, which both feed node 3.
    cudaGraphAddNode(&bodyNodes[0], bodyGraph, NULL, 0, &params[0]);
    cudaGraphAddNode(&bodyNodes[1], bodyGraph, &bodyNodes[0], 1, &params[1]);
    cudaGraphAddNode(&bodyNodes[2], bodyGraph, &bodyNodes[0], 1, &params[2]);
    cudaGraphAddNode(&bodyNodes[3], bodyGraph, &bodyNodes[1], 2, &params[3]);

    return graph;
}
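
The function above only builds the graph. To run it, the graph still has to be instantiated and launched. A minimal sketch follows, using the hypothetical helper name runGraphOnce and assuming the three-argument form of cudaGraphInstantiate found in CUDA 12.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: instantiates a graph, launches it once on a
// temporary stream, waits for completion, and releases the resources.
cudaError_t runGraphOnce(cudaGraph_t graph)
{
    cudaGraphExec_t graphExec;
    cudaError_t err = cudaGraphInstantiate(&graphExec, graph, 0);
    if (err != cudaSuccess) return err;

    cudaStream_t stream;
    err = cudaStreamCreate(&stream);
    if (err == cudaSuccess) {
        // One launch covers the whole graph, including any conditional
        // bodies selected on the device.
        err = cudaGraphLaunch(graphExec, stream);
        if (err == cudaSuccess) {
            err = cudaStreamSynchronize(stream);
        }
        cudaStreamDestroy(stream);
    }

    cudaGraphExecDestroy(graphExec);
    return err;
}
```

In real use the executable graph would normally be kept and launched many times, since instantiation is the expensive step.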

Conclusion

Conditional nodes bring dynamic control flow to CUDA Graphs without CPU intervention. Keeping decisions on the GPU frees CPU resources, avoids the overhead of extra graph launches, and lets more complex workflows live in a single graph. With IF, WHILE, and SWITCH nodes, developers can express branching and looping directly in the graph. For more information and complete examples, visit the CUDA samples repository and engage with the NVIDIA Developer CUDA forums.