0.0 The GPU Driver
After graduating from school, I landed my first job as a system software engineer working on an OpenGL GPU driver. However, my understanding of driver programs was quite limited at the time. In college, GPUs and driver software were not a focus of computer science studies, so all I knew was that a driver is a piece of software that acts as a bridge between hardware and applications. I had no idea how it worked or what kind of data it transferred to the hardware. It wasn't until I joined the team and started working on the project that I gained a deeper understanding of how GPU drivers operate and their crucial role in the graphics rendering process.
As graphics engineers, we write programs to draw pixels on screens. Although drawing may seem instantaneous, a lot happens behind the scenes. GPU drivers act as intermediaries between graphics applications (such as video games), the operating system, and the GPU hardware. Understanding what drivers do and how the rendering pipeline works helps us make better use of WebGPU and the other graphics APIs designed to control GPU hardware.
A GPU driver is not a single program but rather a package of several programs that serve different functions. Some handle video compression and decompression, while others perform general computing tasks. However, the core function of a GPU driver is to enable real-time 3D graphics rendering.
It may be confusing that GPU drivers are sometimes referred to by the names of graphics APIs, such as OpenGL or DirectX. This is because behind the implementations there are API specifications that have been agreed upon by many companies. For example, DirectX, the API of choice for the majority of games on Windows platforms, is defined by Microsoft. OpenGL, on the other hand, is defined by the Khronos Group, a consortium of representatives from major companies involved in 3D graphics. The implementations of these APIs are typically the responsibility of GPU vendors such as NVIDIA, AMD, and Intel, and are released and installed on our computers in the form of drivers.
For many years, OpenGL and DirectX were the primary choices for real-time 3D graphics. In recent years, however, new APIs have emerged: Vulkan across several platforms, DirectX 12 on Windows, and Metal on Apple platforms. Although its name suggests a successor within the DirectX family, DirectX 12 is a new design that diverges so significantly from DirectX 11 that it is effectively a distinct API.
The emergence of new graphics standards reflects a shift in the mindset of API design. Older generation APIs strove to be generic and to do the heavy lifting for all use cases, but as a result, they became slow and cumbersome. In contrast, newer APIs are designed to be lightweight and put more responsibility on developers for fine-tuning and performance optimization. While this makes graphics development with the new APIs more challenging, it also allows for greater control and improved performance. Development efficiency and ergonomics are now provided by high-level graphics middleware, such as game engines, that targets specific use cases. For instance, if you are developing a video game, it is generally not recommended to start from scratch using a low-level API, but rather to choose a suitable game engine.
The development of 3D graphics for the web has followed a similar path to other graphics technologies. WebGL and WebGL2 were the first web 3D APIs, sharing their roots with OpenGL ES 2 and 3, respectively. OpenGL ES is itself a simplified variant of OpenGL, designed for efficiency on less capable devices and without the requirement for backward compatibility. Unlike other graphics APIs, these web APIs are not implemented by hardware vendors in drivers but are instead provided by web browsers. The web browser translates these APIs to one of the native APIs. For example, in Chrome, there is a subsystem called ANGLE that implements the WebGL APIs. On Windows, ANGLE typically realizes them on top of DirectX.
Until very recently, WebGL and WebGL2 were the most widely used 3D graphics APIs for the web. However, WebGPU, the web counterpart of the new generation of 3D APIs, follows the same principles of lightweight design and is positioned to succeed WebGL. As a result, now is an excellent time to start learning this new standard.
Now that we have a broad understanding of the landscape of graphics APIs, what exactly does a graphics driver do?
As we know, code on a typical operating system runs in one of two modes: kernel mode and user mode. Kernel mode is privileged and handles the most essential functions the operating system needs in order to operate. User mode, in contrast, is less privileged, and most applications run in this mode. There is only one operating system kernel running, and it serves the many applications operating in user mode.
When it comes to GPU drivers, the architecture is similar. Some components of a GPU driver run in kernel mode, where they are responsible for scheduling draw commands to the GPU hardware, allocating or sending resources, and performing synchronization between the CPU and the GPU. There is only one instance of the kernel mode GPU driver. Other components, implemented as dynamically linked libraries, run in user mode as part of an application and provide the high-level interfaces for graphics functions, with OpenGL and DirectX being the most famous ones. We will dive deeper into the architecture of GPU drivers in a later section.
Now let's look at how user mode drivers and kernel mode drivers work together with the operating system to enable the graphics subsystem to function properly.
In a running system, every application that needs to perform graphics tasks has its own instance of the user mode driver loaded. Hence, there are multiple instances of the user mode drivers. In contrast, the kernel mode driver is loaded only once at boot time and is shared by the entire system, including both user applications and system components such as the desktop.
Graphics drivers serve three important roles: First, they act as real-time compilers, translating developers' API calls into machine code that the GPU hardware can understand. Second, they serve as resource managers, allocating and releasing GPU memory as needed. Finally, they act as schedulers, sending work to the GPU and providing the necessary infrastructure for synchronization between the CPU and GPU. Let's look at each role in detail.
You may wonder why a graphics application needs a real-time compiler, one that generates machine code while the program is running, when a CPU program does not seem to require one. There are two reasons.
First, there is no standard instruction set for GPU hardware. On the CPU side, architectures such as x86 and ARM have stable, publicly documented instruction sets, so a program built for a specific CPU architecture should run on all CPUs of that architecture, regardless of the hardware manufacturer. On the GPU side, however, NVIDIA's instructions are not compatible with AMD's, so the machine code must be generated for the specific hardware in use.
Second, the primary function of GPU hardware is rendering, but exactly what to render cannot be determined at compile time. For example, consider an app that displays a clock. The current time determines what is displayed on the user interface, and based on this time, the CPU code generates a series of GPU drawing commands, which are then sent to the GPU for rendering. Since the GPU drawing commands depend on the current time, they must be generated on the fly during execution.
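To make the two reasons concrete, here is a minimal sketch of runtime command translation. It is a toy model, not how any real driver works: the vendor names, the opcode values, and the functions `translate` and `clockHandCommands` are all invented for illustration. It shows the same API-level commands being encoded differently for two imaginary GPUs, and commands that can only be produced once the current time is known.

```typescript
// A toy model of runtime command translation. The opcodes and vendor names
// below are invented for illustration; real GPU encodings are vendor-specific
// and far more complex.

type DrawCommand =
  | { kind: "setColor"; r: number; g: number; b: number }
  | { kind: "drawLine"; x0: number; y0: number; x1: number; y1: number };

type Vendor = "vendorA" | "vendorB";

// Hypothetical per-vendor opcode tables: the same API-level command maps to
// a different numeric encoding on each imaginary GPU.
const OPCODES: Record<Vendor, Record<DrawCommand["kind"], number>> = {
  vendorA: { setColor: 0x01, drawLine: 0x02 },
  vendorB: { setColor: 0xa0, drawLine: 0xb0 },
};

// The "compiler" step: API-level commands in, vendor-specific words out.
function translate(commands: DrawCommand[], vendor: Vendor): number[] {
  const words: number[] = [];
  for (const cmd of commands) {
    words.push(OPCODES[vendor][cmd.kind]);
    if (cmd.kind === "setColor") {
      words.push(cmd.r, cmd.g, cmd.b);
    } else {
      words.push(cmd.x0, cmd.y0, cmd.x1, cmd.y1);
    }
  }
  return words;
}

// The clock example: the commands depend on the time, so they can only be
// produced while the program is running.
function clockHandCommands(minutes: number): DrawCommand[] {
  const angle = (minutes / 60) * 2 * Math.PI;
  return [
    { kind: "setColor", r: 255, g: 255, b: 255 },
    {
      kind: "drawLine",
      x0: 0,
      y0: 0,
      x1: Math.round(100 * Math.sin(angle)),
      y1: Math.round(100 * Math.cos(angle)),
    },
  ];
}

// Same commands, two different encodings.
const quarterPast = clockHandCommands(15);
console.log(translate(quarterPast, "vendorA"));
console.log(translate(quarterPast, "vendorB"));
```

Calling `clockHandCommands` with a different minute value yields a different line, and neither the commands nor their encoding could have been fixed when the application was built.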
On the resource management side, a GPU driver is responsible for allocating resources in GPU memory. These resources can include textures, buffers containing geometry, or shader programs. Shader programs run on the GPU and are responsible for outputting pixels to the screen. Writing shader programs is a key aspect of GPU programming, and we will explore this topic in more detail in later chapters.
Because GPU memory is limited, the driver must manage it efficiently. For example, it may temporarily swap unused GPU resources into CPU memory to prioritize other resource allocations, allowing the system to run more graphics applications than the raw GPU memory would otherwise permit. Additionally, the data transfer speed between CPU and GPU memory is relatively slow, so the driver must be optimized to reduce data transfers and maximize performance.
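The swapping idea can be sketched with a toy memory manager. Everything here is an assumption made for illustration: the class `ToyGpuMemoryManager`, the abstract size units, and the least-recently-used eviction policy. Real drivers use their own, more sophisticated strategies.

```typescript
// A toy GPU memory manager. Sizes are in arbitrary units, and the
// least-recently-used eviction policy is an assumption made for illustration.

type Residency = "gpu" | "cpu";

interface Resource {
  id: string;
  size: number;
  residency: Residency;
  lastUsed: number; // logical timestamp of the most recent use
}

class ToyGpuMemoryManager {
  private resources: Resource[] = [];
  private clock = 0;

  constructor(private readonly gpuCapacity: number) {}

  private find(id: string): Resource | undefined {
    for (const r of this.resources) {
      if (r.id === id) return r;
    }
    return undefined;
  }

  private gpuUnitsInUse(): number {
    let total = 0;
    for (const r of this.resources) {
      if (r.residency === "gpu") total += r.size;
    }
    return total;
  }

  // Swap out least-recently-used GPU residents until `needed` units fit.
  private makeRoom(needed: number, keepId?: string): void {
    while (this.gpuUnitsInUse() + needed > this.gpuCapacity) {
      let victim: Resource | undefined;
      for (const r of this.resources) {
        if (r.residency !== "gpu" || r.id === keepId) continue;
        if (victim === undefined || r.lastUsed < victim.lastUsed) victim = r;
      }
      if (victim === undefined) throw new Error("out of GPU memory");
      victim.residency = "cpu"; // swapped out to CPU memory
    }
  }

  allocate(id: string, size: number): void {
    if (this.find(id) !== undefined) throw new Error("duplicate resource " + id);
    if (size > this.gpuCapacity) throw new Error("resource larger than GPU memory");
    this.makeRoom(size);
    this.resources.push({ id, size, residency: "gpu", lastUsed: ++this.clock });
  }

  // Mark a resource as used; swap it back in first if it was evicted.
  use(id: string): void {
    const r = this.find(id);
    if (r === undefined) throw new Error("unknown resource " + id);
    if (r.residency === "cpu") {
      this.makeRoom(r.size, id);
      r.residency = "gpu";
    }
    r.lastUsed = ++this.clock;
  }

  residencyOf(id: string): Residency | undefined {
    const r = this.find(id);
    return r === undefined ? undefined : r.residency;
  }
}
```

With a capacity of 100 units, allocating resources of 60, 30, and then 40 units forces one earlier resource out to CPU memory. Each swap stands for a transfer across the slow CPU-GPU link, which is why a real driver works hard to avoid them.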
The third role of a GPU driver is as a job scheduler and synchronization infrastructure.
All user applications send their draw commands and resource upload requests to the kernel mode driver. In this context, "uploading" refers to copying data from CPU memory to GPU memory, while the opposite process, copying data from GPU memory to CPU memory, is referred to as "downloading."
The kernel mode driver is responsible for sending these requests to the GPU hardware for execution and notifying the system when a request is complete. This process is similar to the JavaScript event loop, with each request acting like an asynchronous function call.
However, scheduling is not as simple as it may seem. Since the kernel mode driver is shared by the entire system, the workload must be managed carefully and fairly. If the GPU becomes stuck on a long-running job, it can cause the whole system to hang, much like how a long-running JavaScript function can freeze a web browser.
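One simple way to picture fair scheduling is a round-robin queue that serves applications in turn. The sketch below is only an illustration under that assumption; the names `GpuJob` and `ToyRoundRobinScheduler` are hypothetical, and real kernel mode drivers use more elaborate policies such as priorities, preemption, and timeouts.

```typescript
// A toy round-robin scheduler. It only illustrates fairness between
// applications; it is not modeled on any particular driver.

interface GpuJob {
  app: string;   // which application submitted the job
  label: string; // human-readable description
}

class ToyRoundRobinScheduler {
  // One FIFO queue per application, in order of first submission.
  private queues: { app: string; jobs: GpuJob[] }[] = [];
  private next = 0; // index of the queue to serve next

  submit(job: GpuJob): void {
    for (const q of this.queues) {
      if (q.app === job.app) {
        q.jobs.push(job);
        return;
      }
    }
    this.queues.push({ app: job.app, jobs: [job] });
  }

  // Pick the next job, visiting applications in turn so that one busy
  // application cannot starve the others.
  dispatch(): GpuJob | undefined {
    for (let i = 0; i < this.queues.length; i++) {
      const index = (this.next + i) % this.queues.length;
      const q = this.queues[index];
      if (q === undefined) continue;
      const job = q.jobs.shift();
      if (job !== undefined) {
        this.next = (index + 1) % this.queues.length;
        return job;
      }
    }
    return undefined; // nothing pending
  }
}
```

If a game submits three jobs and the desktop then submits one, the desktop's job is dispatched second rather than last. Note that this sketch does nothing about a single job that never finishes, which is the harder problem described above.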
Beyond scheduling, a GPU driver also provides synchronization between the CPU and GPU to avoid data races. For example, if we are rendering a 3D scene and want to save the rendered result as an image, we need the driver to notify us when the rendering is complete before we save it.
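A common way to express this kind of notification is a fence: a counter that the CPU can check to learn whether a given piece of GPU work has finished. The following is a minimal, synchronous sketch of that idea. The class `ToyFence` and the function `saveImageIfReady` are hypothetical, and the "GPU" is simulated by a method call.

```typescript
// A toy fence: a monotonically increasing counter shared by the "CPU" side
// (which submits work) and the "GPU" side (which completes it). All names are
// hypothetical.

class ToyFence {
  private submitted = 0; // value assigned to the most recently submitted job
  private completed = 0; // highest value the simulated GPU has finished

  // CPU side: submit a job and receive the fence value that marks its end.
  submit(): number {
    return ++this.submitted;
  }

  // GPU side: report that the next job in submission order has finished.
  completeNext(): void {
    if (this.completed < this.submitted) this.completed++;
  }

  // CPU side: is the job tagged with `value` done, so its output is safe to read?
  isSignaled(value: number): boolean {
    return this.completed >= value;
  }
}

// Saving the image is only safe once the render job's fence value is reached.
function saveImageIfReady(fence: ToyFence, renderJob: number): string {
  return fence.isSignaled(renderJob)
    ? "saved"
    : "not ready: rendering still in flight";
}
```

Reading the render target before the fence is signaled would be exactly the data race the driver helps us avoid: the CPU could observe a half-finished image.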
As a concrete example, when a graphics application, such as a game, is launched, it asks the operating system to load the appropriate user mode driver and begins to interact with the GPU by calling the functions provided by the graphics API. The game allocates resources, such as buffers and textures, through the API and configures the rendering behavior by defining shader programs. During rendering, the game submits geometry data through the API. The user mode driver translates these function calls into low-level commands and data that the GPU hardware can understand and sends them to the kernel mode driver for job scheduling via system calls. The kernel mode driver responds to all GPU-related requests from the entire system, queues these requests for execution, and ensures that the queuing strategy is fair and efficient for the entire system.
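The flow just described can be sketched as a toy model with one shared kernel mode driver and one user mode driver per application. All class names, method names, and command strings below are hypothetical, and the "system call" is reduced to an ordinary method call.

```typescript
// A toy model of the user mode / kernel mode split. Names are hypothetical,
// and the strings stand in for hardware-specific command words.

interface Submission {
  app: string;
  commands: string[];
}

// One instance for the whole system.
class ToyKernelModeDriver {
  private queue: Submission[] = [];

  // Entry point that user mode drivers would reach through a system call.
  enqueue(submission: Submission): void {
    this.queue.push(submission);
  }

  // Hand the oldest pending submission to the (imaginary) hardware.
  executeNext(): Submission | undefined {
    return this.queue.shift();
  }

  pendingCount(): number {
    return this.queue.length;
  }
}

// One instance per application, loaded into that application's process.
class ToyUserModeDriver {
  private recorded: string[] = [];

  constructor(
    private readonly app: string,
    private readonly kernel: ToyKernelModeDriver
  ) {}

  // API call: translated into a low-level allocation command.
  createBuffer(label: string, size: number): void {
    this.recorded.push("ALLOC " + label + " " + size);
  }

  // API call: translated into a low-level draw command.
  draw(vertexCount: number): void {
    this.recorded.push("DRAW " + vertexCount);
  }

  // Package everything recorded so far and pass it to the kernel mode driver.
  submit(): void {
    this.kernel.enqueue({ app: this.app, commands: this.recorded });
    this.recorded = [];
  }
}

// Two applications, each with its own user mode driver, share one kernel driver.
const kernelDriver = new ToyKernelModeDriver();
const gameDriver = new ToyUserModeDriver("game", kernelDriver);
const desktopDriver = new ToyUserModeDriver("desktop", kernelDriver);

gameDriver.createBuffer("vertices", 1024);
gameDriver.draw(3);
gameDriver.submit();
desktopDriver.draw(6);
desktopDriver.submit();
```

Here the kernel mode driver serves submissions in plain first-in, first-out order to keep the sketch short. A real driver would apply a fairer and more efficient policy at this point.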
In summary, this section has provided an overview of the landscape of graphics APIs, the architecture of GPU drivers, and how user mode drivers, operating systems, and kernel mode drivers work together to control the GPU hardware. We have also discussed the three main roles that a GPU driver performs. However, the driver is just an assistant to the GPU, and the actual rendering is carried out by the hardware using a concept called the GPU pipeline. In the next section, we will explore this pipeline in greater detail, as it forms the basis of the design of graphics APIs.