Joined: Oct 2020
stranger
OP Offline
Q: Is it possible to switch from HDR10 (R10G10B10A2 UNORM) to scRGB (R16G16B16A16 FP) or at least make this an option?

A number of UI effects are not blended properly in HDR mode (banding and color shifting) because of the lossy colorspace transformation and the 2-bit destination alpha.
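To put a number on the 2-bit alpha problem, here is a minimal Python sketch. It is my own illustration, not engine code, and the helper name `quantize_unorm` is hypothetical. It rounds a smooth UI fade to the values a 2-bit and an 8-bit UNORM alpha channel can actually store.

```python
def quantize_unorm(value, bits):
    """Round a [0, 1] value to the nearest level an N-bit UNORM channel can store."""
    levels = (1 << bits) - 1              # 3 steps for 2-bit, 255 for 8-bit
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * levels) / levels

# A smooth 0 -> 1 fade sampled in 5% steps, as a UI transition might produce.
fade = [i / 20 for i in range(21)]

# Collect the distinct alpha values that survive quantization.
alpha_2bit = sorted({quantize_unorm(a, 2) for a in fade})
alpha_8bit = sorted({quantize_unorm(a, 8) for a in fade})

print(len(alpha_2bit), "distinct 2-bit alpha levels:", alpha_2bit)
print(len(alpha_8bit), "distinct 8-bit alpha levels")
```

With 2 bits, only four alpha values survive (0, 1/3, 2/3, 1), so any fade or soft edge written to destination alpha collapses into hard steps.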

Background Info:

Many "HDR-as-an-afterthought" engines run into this problem because they render the UI in a separate pipeline that outputs to a traditional 8-bpc RenderTarget (critically, with more than 2 bits of alpha).

This means either transforming the game's HDR content (Rec 2020 w/ PQ gamma) to Rec 709 / Linear for alpha blending in the UI pipeline (HINT: tons of information is lost for anything that is not opaque), or transforming the UI output to Rec 2020 + PQ gamma and compositing the UI in HDR-space with only 2 bits of alpha precision in the HDR pipeline. It's ugly and complicated, and neither approach produces great results unless the UI is always opaque.
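As a rough illustration of why blending on PQ-encoded values goes wrong, here is a small Python sketch. The PQ constants are the ones published in SMPTE ST 2084; the function names and the 200 / 1000 nit example values are my own assumptions, chosen only to make the effect visible.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(nits):
    """Absolute luminance in nits (0..10000) -> PQ signal in [0, 1]."""
    y = (max(nits, 0.0) / 10000.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2

def pq_decode(signal):
    """PQ signal in [0, 1] -> absolute luminance in nits."""
    p = signal ** (1.0 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)

def blend(src, dst, alpha):
    """Standard 'source over' blend of two scalar values."""
    return src * alpha + dst * (1.0 - alpha)

# Hypothetical example: a 200 nit UI element at 50% opacity over a 1000 nit highlight.
ui_nits, scene_nits, alpha = 200.0, 1000.0, 0.5

# Physically correct: blend the light itself.
linear_result = blend(ui_nits, scene_nits, alpha)

# What a fixed-function blender does on an HDR10 target: blend the encoded values.
pq_result = pq_decode(blend(pq_encode(ui_nits), pq_encode(scene_nits), alpha))

print(f"blend in linear light:      {linear_result:.0f} nits")
print(f"blend on PQ-encoded values: {pq_result:.0f} nits")
```

The linear blend gives 600 nits, while the blend on PQ-encoded values decodes to something well below that, so semi-transparent UI over bright content comes out too dark. In a linear FP16 target the hardware blender operates on linear values directly, so the two cases coincide.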

There's a better option that will not harm the image quality of your UI in HDR:

scRGB (FP16)

Rendering in scRGB is great because it

  • future-proofs your render pipeline (when displays w/ more than 10 bits of precision enter the market, your engine automagically benefits)
  • uses Rec 709 color primaries, so color grading is no more complicated than SDR rendering
  • is linear for both RGB and Alpha (HDR10 is only linear in alpha).


Any part of your render pipeline that was never specifically tooled for HDR10's wacky PQ (SMPTE 2084) curve or absurdly wide color gamut comes out looking as expected (no gamma-induced banding, no oversaturation, correct alpha transparency) when the swapchain is FP16 and using scRGB colorspace.
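The oversaturation point can be sketched numerically. The matrix below is the Rec 709 to Rec 2020 conversion for linear RGB from ITU-R BT.2087 (rounded to four decimals); the function name and the choice of a pure red UI color are assumptions made for illustration.

```python
# Rec.709 -> Rec.2020 primaries conversion for linear RGB (ITU-R BT.2087, rounded).
REC709_TO_REC2020 = (
    (0.6274, 0.3293, 0.0433),
    (0.0691, 0.9195, 0.0114),
    (0.0164, 0.0880, 0.8956),
)

def rec709_to_rec2020(rgb):
    """Re-express a linear Rec.709 color in Rec.2020 primaries."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in REC709_TO_REC2020)

# A UI element authored as pure Rec.709 red.
ui_red = (1.0, 0.0, 0.0)

# Correct handling: convert primaries before writing to a Rec.2020 target.
converted = rec709_to_rec2020(ui_red)

# Untooled pass: the same numbers get written unchanged and are then
# interpreted as Rec.2020, i.e. as the far more saturated Rec.2020 red primary.
unconverted = ui_red

print("Rec.709 red expressed in Rec.2020:", converted)
print("Same value written without conversion:", unconverted)
```

A pass that skips the conversion ends up showing the Rec 2020 red primary instead of the intended color. With an scRGB swapchain the values stay in Rec 709 primaries end to end, so the untooled pass needs no conversion at all.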

Realistically... if your engine is running in HDR, your user almost certainly has VRAM / memory bandwidth to spare and I think most drawbacks to using FP16 rather than 10:10:10:2 are negligible.

Contemporary AMD / NV GPUs have lossless framebuffer color compression that significantly closes the gap in terms of memory bandwidth requirements. The only hit you take for FP16 HDR is 2x the VRAM for storage (64 bits per pixel vs. 32 for 10:10:10:2), and you do not need the full FP16 treatment for all intermediate render passes (at minimum, the only surface that requires 16 bits per color channel is the swapchain's backbuffer). That said, going beyond HDR10 precision in the intermediate passes goes a long way toward killing banding (which you have a ton of!!); there are diminishing returns though, so this would be a great place to add a performance / quality option for the user.
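For a sense of scale, per-surface storage follows directly from the bits per pixel of each format: R10G10B10A2 packs into 32 bits and R16G16B16A16 into 64. A quick Python sketch, assuming a hypothetical 3840x2160 backbuffer and ignoring driver padding and compression metadata:

```python
def surface_bytes(width, height, bits_per_pixel):
    """Uncompressed size of one render surface in bytes."""
    return width * height * bits_per_pixel // 8

WIDTH, HEIGHT = 3840, 2160  # hypothetical 4K backbuffer

hdr10_bytes = surface_bytes(WIDTH, HEIGHT, 32)  # R10G10B10A2 UNORM
scrgb_bytes = surface_bytes(WIDTH, HEIGHT, 64)  # R16G16B16A16 FP

print(f"R10G10B10A2:  {hdr10_bytes / 2**20:.1f} MiB per surface")
print(f"R16G16B16A16: {scrgb_bytes / 2**20:.1f} MiB per surface")
print(f"ratio: {scrgb_bytes / hdr10_bytes:.0f}x")
```

Multiply by the number of buffers in the swapchain (and by any intermediate targets promoted to FP16) to get the total cost of the switch.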


P.S. If you want to extend HDR to work in Vulkan, the consensus among all of the AAA games I have reverse engineered to date is to use a DXGI SwapChain + D3D11 Immediate Context to perform Vk / DXGI interop (copy the Vk swapchain's backbuffer to DXGI for Presentation).

It sounds weird, but if you match up the SwapChain image format and dimensions there is actually no overhead introduced by doing this (under the hood it's just a shared surface, regardless of which API is touching it). Just make sure you have cleanly released all references to said buffer after Present: there's one implicitly held by the DWM until VBLANK, and the D3D11 Immediate Context may hold one unless you call ClearState (...) and Flush (...) prior to Present. You get finer-grained control over pretty much everything (latency, scaling, colorspace, G-Sync, ...) if you use IDXGISwapChain*::Present (...) to present final frames for scanout, so this interop is a blessing in disguise for HDR users.


I am available for technical discussion and would love to hear back from an engine dev. on this if there are any questions; the best way to reach me is by creating a forum account on https://special-k.info and private messaging Kaldaien.

Also, on the topic of SwapChains... your choice of DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL is curious. That produces sub-par performance, because it requires the swapchain to preserve the contents of old frames even after Present (...). The memory contents are preserved until the swapchain sequence (e.g. 0,1,2, 0,1,2, ...) reuses the index (see IDXGISwapChain3::GetCurrentBackBufferIndex (...)).

DXGI_SWAP_EFFECT_FLIP_DISCARD is quicker: it tells the presentation layer you do not care what happens to the buffer's memory after it has been scanned out. Prefer Discard on Windows 10 machines for better performance.

