Faster Rendering Using Hardware Acceleration

April 28, 2021 by Benjamin Schaaf

At Sublime HQ we like to put in extra effort for performance, which is why we use a fully custom UI framework and why we wrote our own git library. But if you open Sublime Text 3 on a 4k display, you may notice that it isn't quite keeping up. This is because rendering is done on the CPU, which doesn't scale well to higher resolutions. Back in 2018 we decided to fix this performance problem using hardware-accelerated rendering.

With Sublime Merge 2 and the upcoming Sublime Text 4 release, we now have fully hardware-accelerated rendering using OpenGL. This keeps both applications performant at resolutions as high as 8k. It is enabled by default on macOS and can optionally be enabled on Linux and Windows, either under the "Advanced" section of the preferences dialog or with the "hardware_acceleration" setting. Here I'll give a brief overview of how we achieved this and the choices we made along the way.

Choosing an API

Before we could start on an implementation we of course had to pick an API to use for controlling the GPU. We wanted a shared implementation for all the platforms we support, which immediately ruled out the Direct2D and Metal APIs. For flexibility and performance reasons we also didn't want to use a higher-level library like Skia, which we already make use of for CPU-based rendering. This left us with only two options: Vulkan and OpenGL.

Vulkan is the successor of OpenGL and comes with many performance advantages at the cost of some added complexity. Its design simplifies GPU drivers, leading to more stable operating systems and applications. It would be our API of choice had Apple not decided against supporting it on their platforms. We did evaluate MoltenVK, a Vulkan implementation built on top of Apple's Metal API; however, it doesn't support macOS 10.9, and it didn't seem stable enough at the time. Unfortunately, that left us no choice but OpenGL.

OpenGL is 28 years old and currently the only truly cross-platform GPU API. It's supported by practically every GPU under the sun, but its complexity and multitude of implementations make the drivers more bug-prone and inconsistent. However, since we only needed to render simple 2D geometry, we hoped the drivers wouldn't be much of an issue. Thankfully, this also happened to be the API I was already familiar with, so getting reacquainted with its intricacies wasn't too difficult.

We also had to choose which version of OpenGL to target. We went with the latest version supported by Apple: OpenGL 4.1, as this version is relatively recent but also supported by most hardware.
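Targeting a specific version also means confirming at startup that the driver actually provides it. Below is a minimal Python sketch of such a check. It assumes the version string has already been read with `glGetString(GL_VERSION)`. The helper names `parse_gl_version` and `meets_minimum` are hypothetical. The parsing follows the OpenGL specification's format: the string starts with `major.minor`, optionally followed by a release number and vendor-specific text. OpenGL ES prefixes it with `OpenGL ES `.

```python
import re
from typing import Tuple

# OpenGL 4.1 is the newest version Apple supports, so it is the minimum targeted here.
MIN_GL_VERSION: Tuple[int, int] = (4, 1)

# Matches "major.minor" at the start of the string, with an optional "OpenGL ES " prefix.
_VERSION_RE = re.compile(r"(?:OpenGL ES )?(\d+)\.(\d+)")


def parse_gl_version(version_string: str) -> Tuple[int, int]:
    """Return (major, minor) from a GL_VERSION string such as "4.6.0 Vendor 1.2"."""
    match = _VERSION_RE.match(version_string)
    if match is None:
        raise ValueError(f"unrecognised GL_VERSION string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string: str, minimum: Tuple[int, int] = MIN_GL_VERSION) -> bool:
    # Tuples compare element by element, so (4, 6) >= (4, 1) but (3, 3) < (4, 1).
    return parse_gl_version(version_string) >= minimum


# Example strings are illustrative, not captured from real drivers.
print(meets_minimum("4.6.0 Vendor 1.2"))                # True
print(meets_minimum("3.3 (Core Profile) Mesa 20.0.8"))  # False
```

If the check fails, the application can fall back to CPU rendering instead of creating a context it cannot use.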

Implementation

Owing to its history with DirectX, our UI framework was already well positioned for hardware-accelerated rendering. It had a rendering abstraction layer called a "render context". Most widgets only used the basic primitives provided by the render context, though some did their own rendering. The plan was to start simple on one platform (Linux) by implementing the render context's functions one by one, then move all the custom widget rendering into the render context, and finally port the rendering to the other platforms. The end goal was to reliably produce almost identical rendering results (within rounding error).
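The value of that layer is that widgets draw through one interface and never touch the backend directly. Here is a minimal Python sketch of the idea. `RenderContext`, `fill_rect`, `draw_text`, `RecordingContext` and `Label` are all hypothetical names. They are not Sublime HQ's actual API, which the post doesn't show.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float


class RenderContext(ABC):
    """Hypothetical backend-neutral drawing interface used by widgets."""

    @abstractmethod
    def fill_rect(self, rect: Rect, color: str) -> None: ...

    @abstractmethod
    def draw_text(self, x: float, y: float, text: str, color: str) -> None: ...


class RecordingContext(RenderContext):
    """Toy backend that only records calls. A CPU or OpenGL backend would
    implement the same methods with real drawing code."""

    def __init__(self) -> None:
        self.commands: List[Tuple] = []

    def fill_rect(self, rect: Rect, color: str) -> None:
        self.commands.append(("fill_rect", rect, color))

    def draw_text(self, x: float, y: float, text: str, color: str) -> None:
        self.commands.append(("draw_text", x, y, text, color))


class Label:
    """A widget that only uses render context primitives, so it does not
    know which backend ends up drawing it."""

    def __init__(self, rect: Rect, text: str) -> None:
        self.rect = rect
        self.text = text

    def render(self, ctx: RenderContext) -> None:
        ctx.fill_rect(self.rect, "#202020")
        ctx.draw_text(self.rect.x + 4, self.rect.y + 4, self.text, "#ffffff")


ctx = RecordingContext()
Label(Rect(0, 0, 100, 20), "Hello").render(ctx)
print([cmd[0] for cmd in ctx.commands])  # ['fill_rect', 'draw_text']
```

A widget written like `Label` needs no changes when a new backend is added. Widgets that render themselves bypass the interface, which is why moving that custom rendering into the render context was part of the plan.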

Initially, the biggest problems we had were related to performance. GPUs get their performance from doing work in parallel: whereas on a CPU you can easily render small parts one at a time, on a GPU you need to batch lots of small things together into a single render job. This is most apparent with text rendering, where batching glyphs together gives massive gains. It does mean that glyphs are mostly drawn out of order, which can easily result in odd rendering bugs if you're not careful. The batching logic is fairly complex, but most of it is contained inside the render context. You can see below the improvement from batching glyphs alone:

              No Batching   Batched x4   Batched x16   Batched x8192
Frame Time    52ms          17ms         8ms           3ms
Tests were done using an AMD RX560 on Linux at 1440p; the times are for the full frame render, not just the glyphs.
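Below is a minimal Python sketch of glyph batching. It assumes glyphs are drawn from a texture atlas and that each batch is limited to one atlas and a fixed number of glyphs. `Glyph`, `GlyphBatcher` and `submit` are hypothetical names. In an OpenGL backend, `submit` would correspond to uploading the vertex data and issuing one draw call. The loop at the end only counts draw calls for the batch sizes in the table above and says nothing about frame time.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vertex = Tuple[float, float, float, float]  # x, y, u, v


@dataclass(frozen=True)
class Glyph:
    x: float
    y: float
    w: float
    h: float
    atlas: int                             # id of the glyph atlas texture
    uv: Tuple[float, float, float, float]  # u0, v0, u1, v1 within the atlas


class GlyphBatcher:
    """Accumulates glyph quads and submits them in as few draw calls as possible."""

    def __init__(self, max_glyphs: int,
                 submit: Callable[[int, List[Vertex]], None]) -> None:
        self.max_glyphs = max_glyphs
        self.submit = submit  # stands in for "upload vertices + one draw call"
        self.vertices: List[Vertex] = []
        self.atlas: Optional[int] = None
        self.count = 0

    def add(self, g: Glyph) -> None:
        # A batch samples a single atlas and has a fixed capacity;
        # start a new batch when either limit is hit.
        if self.count and (g.atlas != self.atlas or self.count == self.max_glyphs):
            self.flush()
        self.atlas = g.atlas
        u0, v0, u1, v1 = g.uv
        x0, y0, x1, y1 = g.x, g.y, g.x + g.w, g.y + g.h
        # Each glyph quad becomes two triangles (six vertices).
        self.vertices += [(x0, y0, u0, v0), (x1, y0, u1, v0), (x1, y1, u1, v1),
                          (x0, y0, u0, v0), (x1, y1, u1, v1), (x0, y1, u0, v1)]
        self.count += 1

    def flush(self) -> None:
        # Also needs calling before drawing anything that overlaps pending glyphs.
        if self.count:
            self.submit(self.atlas, self.vertices)
            self.vertices = []
            self.count = 0


def count_draw_calls(num_glyphs: int, batch_size: int) -> int:
    calls: List[int] = []
    batcher = GlyphBatcher(batch_size, lambda atlas, verts: calls.append(len(verts)))
    for i in range(num_glyphs):
        batcher.add(Glyph(i * 8.0, 0.0, 8.0, 16.0, atlas=0, uv=(0.0, 0.0, 1.0, 1.0)))
    batcher.flush()
    return len(calls)


for size in (1, 4, 16, 8192):
    print(size, count_draw_calls(10_000, size))  # 10000, 2500, 625, 2 draw calls
```

The ordering hazard the post mentions is visible in `flush`. Glyphs wait in the batch until it is submitted. A primitive drawn in the meantime that should cover them would therefore end up underneath. Forcing a flush before overlapping draws is one simple, if conservative, way to keep the order correct.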

Similarly, many other rendering functions required slight alterations to work well with hardware-accelerated rendering. Notably, the gradients used for shadows, the squiggly underlines used for spell checking, and text fading had to be moved from custom render functions into the render context. Here's a demonstration of the full application being rendered:

After we had a fully working implementation for Linux, we began porting to macOS, which is where we encountered our first driver bug. Sadly, this turned out to be a trend. To date we've come across about eight separate driver bugs on different platforms and hardware, and have implemented various workarounds, feature blacklists, and in one case an OS version blacklist. These bugs are the most frustrating part of working with OpenGL, but in the end adding these workarounds still seems simpler than maintaining separate implementations using different APIs.
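Keeping such workarounds in one data-driven table is one way to stop them leaking into the rendering code. Below is a minimal Python sketch. It assumes the renderer string comes from `glGetString(GL_RENDERER)` and that the OS name and version are already known. `Workaround`, `disabled_features`, and every rule and feature name here are hypothetical, since the post doesn't describe the actual bugs. Disabling `hardware_acceleration` for a range of OS versions stands in for the OS version blacklist.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Workaround:
    feature: str                                # feature to disable when the rule matches
    renderer_contains: Optional[str] = None     # substring of GL_RENDERER, if restricted
    os_name: Optional[str] = None               # OS the rule applies to, if restricted
    os_below: Optional[Tuple[int, ...]] = None  # only OS versions strictly below this


# Illustrative rules only: the real bugs, GPUs and OS versions are not listed in the post.
WORKAROUNDS: List[Workaround] = [
    Workaround(feature="hypothetical_feature", renderer_contains="ExampleGPU"),
    Workaround(feature="hardware_acceleration", os_name="exampleos", os_below=(10, 2)),
]


def disabled_features(renderer: str, os_name: str,
                      os_version: Tuple[int, ...]) -> FrozenSet[str]:
    """Return the features to turn off for this driver/OS combination."""
    disabled = set()
    for rule in WORKAROUNDS:
        if rule.renderer_contains is not None and rule.renderer_contains not in renderer:
            continue
        if rule.os_name is not None and rule.os_name != os_name:
            continue
        if rule.os_below is not None and not os_version < rule.os_below:
            continue
        disabled.add(rule.feature)
    return frozenset(disabled)
```

Because every rule is data, adding a workaround for a newly found driver bug is a one-line change. The affected code paths only need to ask whether their feature is disabled.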

I'd like to mention RenderDoc as an invaluable tool for debugging on Linux and Windows.

End Result

The merge commit that introduced OpenGL came in at just under 9000 lines of code. After we fixed a long initial wave of bugs, it has been fairly stable since the release of Sublime Merge 2.

As you can see below, OpenGL rendering in its current state scales really well to higher resolutions. Even with a low-end dedicated GPU, we're now faster at 4k and 8k with hardware acceleration than at 1080p without it, and easily within the 16ms frame budget of a 60Hz monitor.

Hardware 1366x768 1080p 1440p 4k 8k
Ubuntu 20.04 CPU (2990wx) 5ms 6ms 17ms
Ubuntu 20.04 AMD RX560 3ms 3ms 3ms
macOS 11.1 CPU (5250U) 5ms 12ms 30ms
macOS 11.1 Intel HD 6000 5ms 9ms 18ms
Windows 10 CPU (9900k) 7ms 21ms
Windows 10 2080ti 3ms 3ms
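The 16ms figure comes from dividing one second by the refresh rate: 1000 / 60 ≈ 16.7ms per frame. The small Python check below compares the frame times in the table above against that budget. The values are copied row by row in the order they appear, without assigning them to specific resolution columns.

```python
REFRESH_RATE_HZ = 60
budget_ms = 1000 / REFRESH_RATE_HZ  # about 16.7 ms per frame at 60Hz

# Frame times (ms) from the table above, in the order they appear in each row.
frame_times = {
    "Ubuntu 20.04, CPU": [5, 6, 17],
    "Ubuntu 20.04, AMD RX560": [3, 3, 3],
    "macOS 11.1, CPU": [5, 12, 30],
    "macOS 11.1, Intel HD 6000": [5, 9, 18],
    "Windows 10, CPU": [7, 21],
    "Windows 10, 2080ti": [3, 3],
}

for config, times in frame_times.items():
    over_budget = [t for t in times if t > budget_ms]
    print(f"{config}: worst {max(times)}ms, over budget: {over_budget or 'none'}")
```

Among the GPU rows, only the integrated Intel HD 6000 goes over budget at its highest measured value. This matches the remark that performance is still left on the table for non-dedicated GPUs.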

Future Considerations

The current implementation still leaves a fair amount of performance on the table, especially on non-dedicated GPUs. Although it's not strictly required, it would be nice to do further optimizations for battery usage and older devices.

Apple deprecating OpenGL and improvements to MoltenVK make it clear that Vulkan support will at some point need to be added, though it's unclear how far away that is.

With Linux ARM support on the way, OpenGL is more important than ever, because those devices tend to have low-power CPUs. They also generally don't support recent versions of OpenGL, so the version requirement may need to be lowered in the future.