I worked on this technique a year or two ago and haven't touched it for a while, but since AMD recently published their MLAA source code I was curious to see how the two compare, and the results seem interesting enough. My implementation is, at the moment, twice as fast as AMD's at higher resolutions (e.g. 1920x1080) or on slower GPUs (check the profiling results so far: SMLAA_benchmarks.txt, and feel free to send me yours).
However, the main goal of my 'Symmetrical MLAA' wasn't performance but to be minimally invasive: to reduce damage to the original image as much as possible and to always err on the side of caution. It is based on the same idea as MLAA but takes a slightly different approach (it could also be called 'Restrained MLAA' or 'Restricted MLAA', but I like 'Symmetrical' more because of the shape used, read on).
Even if you don't like the idea of a 'restrained' MLAA (maybe you like your AA extra-strong!), my demo should also be an interesting example of how to improve the performance of the existing AMD MLAA (or a similar technique).
Check the full description of the technique below, and here's the demo with the source code (I've just added my technique to AMD's demo, plus a couple of things like the ability to use any image as input instead of just the demo 3D scene):
Download SMLAA_0.9.7z : check the new, updated version at the end of the post
(I'll add more in the future)
How SMLAA works
Unlike in the original MLAA [Reshetov 2009] approach, I don't use the L shape as the basis of the edge processing. This is because it does not guarantee preservation of the average amount of color in the image and will, in some cases, slightly enlarge one surface (background or foreground) at the expense of the other or incorrectly blur unwanted areas ('synthetic test' screenshots above demonstrate this).
Instead, I use the Z shape (image left/above shows all possible Z shape variations that are looked for) as the basis of the detection and handling of aliasing artifacts (it should be noted that this is not the same as the Z shape from [Reshetov 2009]). This shape, being symmetric with regard to the edge, has some advantages when compared to the L shape:
- It preserves the average image color: the amount of color blurred from one side of the line always equals the amount blurred from the other. It also does not significantly blur the ’lone’ one-pixel-wide lines that end with an edge.
- It is more stable with regard to dynamic changes, as it is symmetric and the line lengths used are shorter. (Line length changes commonly occur due to movement-induced shape changes, or the edge being intersected by another scene object.)
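To make the color-preservation point concrete, here is a minimal 1D sketch in Python. It is not the demo's shader code: the linear coverage weights, the equal run length on both sides of the step, and the function names are all assumptions made for illustration. It blends one row of pixels around a step symmetrically and, for contrast, on one side only.

```python
def blend_weights(n):
    """Assumed coverage-style weights for n pixels, nearest the step first.

    The weight falls linearly from just under 0.5 at the step to near 0
    at the far end of the run.
    """
    return [0.5 * (1.0 - (k + 0.5) / n) for k in range(n)]


def symmetric_step_blend(a, b, n):
    """Blend the row [a]*n + [b]*n on BOTH sides of the step in its middle."""
    w = blend_weights(n)
    # Left pixels hold colour a and receive some b; index 0 is farthest away.
    left = [a * (1.0 - w[k]) + b * w[k] for k in reversed(range(n))]
    # Right pixels hold colour b and receive the same amounts of a.
    right = [b * (1.0 - w[k]) + a * w[k] for k in range(n)]
    return left + right


def one_sided_blend(a, b, n):
    """For contrast only: blur the pixels on one side of the step."""
    w = blend_weights(n)
    left = [a * (1.0 - w[k]) + b * w[k] for k in reversed(range(n))]
    return left + [b] * n


original = [1.0] * 4 + [0.0] * 4
symmetric = symmetric_step_blend(1.0, 0.0, 4)
one_sided = one_sided_blend(1.0, 0.0, 4)
print(sum(original), sum(symmetric), sum(one_sided))
```

With the symmetric blend, whatever one side loses the other side gains, so the row total (and therefore the average) is unchanged; the one-sided variant lowers the total.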
It is a multi-pass effect, and a description of each step follows. Note that this is a general description of the algorithm and is slightly different from the actual implementation; in the demo some of the presented steps require multiple render passes, or are reordered for performance reasons.
Step 1: Edge detection
For this demo the edge detection is modified to be the same as in AMD's MLAA demo: it is based on the render target (screen) alpha value, which stores the previously generated luminance of the texture (so both do exactly the same edge detection, for comparison reasons). However, in a real game engine this probably would not be the way to do things (it requires an extra channel, aliasing is often caused by lighting differences so just the diffuse texture color wouldn't cut it, etc.). I'll write a follow-up on this in the future, explaining the edge detection in my original SMLAA implementation, which is based on per-channel color difference and (optionally) more expensive depth (and normal, if available) comparisons, extracting as much quality as possible.
The main difference from AMD's demo is that this step creates the texture containing the edges used later and, at the same time, creates the stencil mask used to optimize some of the following steps. The edge detection cost is therefore paid only once, so the edge detection algorithm can be more expensive (i.e. of higher quality) and the generated info reused in later steps.
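A CPU-side sketch of this step in Python might look as follows. The bit layout (EDGE_LEFT, EDGE_TOP), the threshold value and the choice to mask both pixels touching an edge are assumptions for illustration, not what the demo's shaders necessarily do.

```python
EDGE_LEFT = 1  # assumed bit: edge between a pixel and its left neighbour
EDGE_TOP = 2   # assumed bit: edge between a pixel and the pixel above it


def detect_edges(luma, threshold=0.1):
    """Return (edges, mask) for a 2D list of luminance values.

    edges[y][x] holds the edge bits of pixel (x, y); mask[y][x] is True
    for every pixel that touches at least one edge, playing the role of
    the stencil mask that lets later passes skip untouched pixels.
    """
    h, w = len(luma), len(luma[0])
    edges = [[0] * w for _ in range(h)]
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            bits = 0
            if x > 0 and abs(luma[y][x] - luma[y][x - 1]) > threshold:
                bits |= EDGE_LEFT
                mask[y][x - 1] = True  # the neighbour shares this edge
            if y > 0 and abs(luma[y][x] - luma[y - 1][x]) > threshold:
                bits |= EDGE_TOP
                mask[y - 1][x] = True
            if bits:
                mask[y][x] = True
            edges[y][x] = bits
    return edges, mask


edges, mask = detect_edges([[0.0, 0.0, 1.0],
                            [0.0, 0.0, 1.0]])
```

Both outputs come from one pass over the image, which is the point made above: the detection cost is paid once and everything after it reads the small edge map and the mask.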
Step 2: Z shape detection and measuring line lengths for Z-s
The only complex shape that we are concerned with is the Z shape and its four rotations. This shape represents the point at which the rasterizer, rendering a triangle edge (or a line), ’steps’ from one pixel to the next on the ’slower’ axis. I found that the best compromise for detecting this type of aliasing shape, while excluding false positives, is an algorithm that requires certain edges to exist and certain edges to be absent from the pixel neighborhood (for more details, see the enclosed shader file; sorry, but it's a bit complicated to explain here). This detection algorithm is simple enough to be executed in a pixel shader, uses the previously created edge texture (which is in the DXGI_FORMAT_R8_ format, thus small and requiring little bandwidth) as its only input, and benefits from the stencil mask.
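The exact rule lives in the shader file and is not reproduced here. Purely to illustrate the "some edges required, some edges forbidden" idea, here is a Python sketch for one orientation of the shape; the edge bit layout and the particular required/forbidden pattern are assumptions, not the demo's actual test.

```python
EDGE_LEFT = 1  # assumed bit: edge between a pixel and its left neighbour
EDGE_TOP = 2   # assumed bit: edge between a pixel and the pixel above it


def has_edge(edges, x, y, bit):
    """True if (x, y) is inside the map and has the given edge bit set."""
    h, w = len(edges), len(edges[0])
    return 0 <= x < w and 0 <= y < h and bool(edges[y][x] & bit)


def is_horizontal_z(edges, x, y):
    """Hypothetical test for one Z orientation, stepping at pixel (x, y).

    The horizontal edge runs along the top of row y to the left of x,
    drops down the left side of (x, y), and continues along the top of
    row y + 1 from x onwards.
    """
    required = (has_edge(edges, x - 1, y, EDGE_TOP)       # upper run
                and has_edge(edges, x, y, EDGE_LEFT)      # vertical riser
                and has_edge(edges, x, y + 1, EDGE_TOP))  # lower run
    forbidden = (has_edge(edges, x, y, EDGE_TOP)          # upper run goes on
                 or has_edge(edges, x - 1, y + 1, EDGE_TOP))  # lower run starts early
    return required and not forbidden


# Edge map of a single step: upper run at x < 2, riser at x == 2, lower run after.
step = [[0, 0, 0, 0],
        [2, 2, 1, 0],
        [0, 0, 2, 2]]
print(is_horizontal_z(step, 2, 1))
```

The forbidden edges are what reject T-junctions and crossings, where an L-shaped or more permissive match would still fire. The other rotations would follow the same pattern with the axes swapped or mirrored.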
In this DirectX11 demo, a pixel shader pass is used to detect Z shapes, which are then stored in an AppendStructuredBuffer and subsequently processed in a number of compute shader steps, resulting in a Z-shape blur map buffer. This could be implemented in DirectX10.1 as well, but then it becomes slightly more complicated.
Then, to blur the Z shape edge correctly, we need to know the length of the rasterized line segment to the left and right (or up and down) of each shape centre. This can be determined in a number of ways: using a dedicated compute shader which iterates over all Z shapes (as in this demo, using a ConsumeStructuredBuffer); using a recursive doubling technique as presented in [Hensley et al. 2005] on older (Shader Model 3) hardware; or directly on the CPU if the hardware architecture allows for it. The maximum line length can be limited to a certain value: a bigger value gives better smoothing of long near-horizontal and near-vertical edges but is more expensive. AMD's demo uses a max line length of 16 (I think); mine is limited to 24 at the moment, but that's easy to tweak by modifying MAX_BLEND_LINE_LENGTH in SMLAA.hlsl.
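The simplest of these options, a plain walk along the edge, can be sketched in Python as below. The edge bit layout and the helper name run_length are assumptions; only the MAX_BLEND_LINE_LENGTH name and its value of 24 come from the text above.

```python
MAX_BLEND_LINE_LENGTH = 24  # name and value as mentioned for SMLAA.hlsl

EDGE_TOP = 2  # assumed bit: edge between a pixel and the pixel above it


def run_length(edges, x, y, dx, max_len=MAX_BLEND_LINE_LENGTH):
    """Count consecutive pixels in row y that have a top edge.

    Starts at (x, y) and steps by dx (+1 for right, -1 for left) until
    the edge stops, the image border is reached, or max_len is hit.
    """
    w = len(edges[0])
    n = 0
    while n < max_len and 0 <= x < w and edges[y][x] & EDGE_TOP:
        n += 1
        x += dx
    return n


# For a step at pixel (2, 1): upper run to the left, lower run to the right.
step = [[0, 0, 0, 0],
        [2, 2, 1, 0],
        [0, 0, 2, 2]]
left = run_length(step, 1, 1, -1)
right = run_length(step, 2, 2, +1)
print(left, right)
```

Capping the walk bounds the per-shape cost, which is the trade-off described above: a larger cap smooths long, shallow edges better but costs more iterations.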
All this data is saved into a 'blur map' (also in the DXGI_FORMAT_R8_ format; check out the SMLAA demo's 'blurmap' display modes).
Step 3: Simple shapes
Edges not covered by Z shapes are additionally antialiased using a simple 3x3 smart blur filter that applies blur to a pixel if it is surrounded by two or more edges. Since this step can change the average image color by reducing the intensity of ’lone’ pixels caused by alpha-tested drawing, odd lighting cases, particles or other noisy effects, the amount of blurring is relatively small and can be tuned based on the scenario.
All data is again added to the 'blur map' (in the actual implementation this step is combined into step 4).
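A greyscale Python sketch of such a filter is shown below. The edge bit layout, the plain 3x3 mean and the default strength of 0.25 are assumptions chosen for illustration; the demo's actual kernel and blur amount may differ.

```python
EDGE_LEFT = 1  # assumed bit: edge between a pixel and its left neighbour
EDGE_TOP = 2   # assumed bit: edge between a pixel and the pixel above it


def edge_count(edges, x, y):
    """Number of the four sides of pixel (x, y) that carry an edge."""
    h, w = len(edges), len(edges[0])
    n = 0
    if edges[y][x] & EDGE_LEFT:
        n += 1
    if edges[y][x] & EDGE_TOP:
        n += 1
    if x + 1 < w and edges[y][x + 1] & EDGE_LEFT:  # right side
        n += 1
    if y + 1 < h and edges[y + 1][x] & EDGE_TOP:   # bottom side
        n += 1
    return n


def smart_blur_pixel(img, edges, x, y, strength=0.25):
    """Blend a pixel towards its 3x3 mean, but only if it has >= 2 edges."""
    if edge_count(edges, x, y) < 2:
        return img[y][x]
    h, w = len(img), len(img[0])
    neighbours = [img[ny][nx]
                  for ny in range(max(0, y - 1), min(h, y + 2))
                  for nx in range(max(0, x - 1), min(w, x + 2))]
    mean = sum(neighbours) / len(neighbours)
    return img[y][x] * (1.0 - strength) + mean * strength


# A lone bright pixel: edges on all four of its sides.
img = [[0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]
edges = [[0, 0, 0],
         [0, 3, 1],
         [0, 2, 0]]
print(smart_blur_pixel(img, edges, 1, 1))
```

The lone pixel loses some intensity here, which is exactly the average-color change mentioned above and the reason to keep the strength small.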
Step 4: Apply the blurmaps
Blurmaps are then applied to the main render target containing the image being processed. In the demo, the final blur areas are copied into a temporary texture (as this step requires both reading from and writing to the main render target, which isn't supported, at least on PCs) and then applied onto the main render target. This step is, again, greatly optimized using the stencil mask generated during the edge detection step.
1.) It should be fairly easy to optimize AMD's demo using the stencil masking as well.
2.) My original SMLAA algorithm uses an adaptive quality system that lowers/raises the edge detection thresholds, aiming to keep the number of edges in a 'sane' range. This prevents an unusually noisy image/rendering from costing too much and also increases quality across various lighting and other scenarios.
3.) In this example/demo, the only modification of AMD's original algorithm was to switch to using a _SRGB offscreen texture instead of the linear one; this fixes precision-induced banding issues that are not obvious in the demo (unless you look very hard at the dark area below the tank and toggle the effect on/off) but are obvious with a darker image. I noticed it when using other input images.
To make this really obvious in the original demo, either compare the OFF/ON (Show MLAA checkbox) screenshots in Photoshop or make the image darker (for example, by adding the "Output.Diffuse.rgb *= 0.1;" line to Scene.hlsl after line 67).
This _SRGB mod also fixes the blending issue, which the original demo has, that overbrightens AA areas (although I'm not sure it's still 100% correct, need to verify all this).
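The precision argument in note 3 can be checked with a little arithmetic. The Python sketch below assumes an 8-bit-per-channel target and uses the standard sRGB transfer function; the helper names and the chosen dark range (linear values up to 0.01) are made up for the example. It counts how many distinct 8-bit codes are available for that range with linear storage versus sRGB-encoded storage.

```python
def linear_to_srgb(c):
    """Standard sRGB transfer function for a linear value in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def distinct_codes(encode, lo, hi, samples=10000):
    """Count distinct 8-bit codes produced for linear values in [lo, hi]."""
    codes = set()
    for i in range(samples + 1):
        value = lo + (hi - lo) * i / samples
        codes.add(round(encode(value) * 255.0))
    return len(codes)


# Darkest 1% of the linear range, stored linearly vs. sRGB-encoded.
linear_levels = distinct_codes(lambda c: c, 0.0, 0.01)
srgb_levels = distinct_codes(linear_to_srgb, 0.0, 0.01)
print(linear_levels, srgb_levels)
```

Linear storage leaves only a handful of codes for the darkest part of the image, while the sRGB encoding spends several times as many there, which is consistent with the banding being most visible in dark areas.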
Update: I've updated the code to slightly improve performance and, more importantly, to fix a bug that caused the effect to not be applied in certain cases (such as some thin 1-pixel lines). I haven't updated the screenshots above yet, though! I've also cleaned up and reorganized the source code and added more useful comments.
Get the new source code and .exe from here: Download SMLAA_0.99.7z