Mad Vertex's hideout

19 Feb 2012

Symmetrical Morphological Anti-Aliasing

I worked on this technique a year or two ago and haven't touched it for a while, but since AMD recently published their MLAA source code I was curious to see how the two compare, and the results seem interesting enough. My implementation is, at the moment, twice as fast as AMD's at higher resolutions (e.g., 1920x1080) or on slower GPUs (check the profiling results so far: SMLAA_benchmarks.txt and feel free to send me yours).

However, the main goal of my 'Symmetrical MLAA' wasn't performance but to be minimally invasive: to reduce damage to the original image as much as possible and to always err on the side of caution. It is based on the same idea as MLAA but takes a slightly different approach (it could also be called 'Restrained MLAA' or 'Restricted MLAA', but I like 'Symmetrical' more because of the shape used; read on).

Even if you don't like the idea of a 'restrained' MLAA (maybe you like your AA extra-strong!), my demo should also be an interesting example of how to improve the performance of the existing AMD MLAA (or a similar technique).

Check the full description of the technique below, and here's the demo with the source code (I've just added my technique to AMD's demo, plus a couple of things such as the ability to use any image as input instead of just the demo 3D scene):

Download SMLAA_0.9.7z [edit]: check the new, updated version at the end of the post

 

Some screenshots

(I'll add more in the future)

 

How SMLAA works

Unlike in the original MLAA [Reshetov 2009] approach, I don't use the L shape as the basis of the edge processing. This is because it does not guarantee preservation of the average amount of color in the image and will, in some cases, slightly enlarge one surface (background or foreground) at the expense of the other or incorrectly blur unwanted areas ('synthetic test' screenshots above demonstrate this).

Instead, I use the Z shape (image left/above shows all possible Z shape variations that are looked for) as the basis of the detection and handling of aliasing artifacts (it should be noted that this is not the same as the Z shape from [Reshetov 2009]). This shape, being symmetric with regard to the edge, has some advantages when compared to the L shape:

  • It preserves the average image color: the amount of color blurred from one side of the line always equals the amount blurred from the other. It also does not significantly blur 'lone' one-pixel-wide lines that end with an edge.
  • It is more stable with regard to dynamic changes, as it is symmetric and the line lengths used are shorter. (Line length changes commonly occur due to movement-induced shape changes, or the edge being intersected by another scene object.)
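
To make the first point concrete, here is a minimal sketch (CPU-side Python, not the demo's shader code) of blending one pixel row across a Z-shape step with mirrored weights. The function name `blend_across_step`, the linear weight ramp and the single-channel float row are all assumptions made for illustration; the only thing taken from the text is the idea that whatever is added on one side of the step is removed on the other.

```python
def blend_across_step(row, step, half_len):
    """Blend one pixel row across a Z-shape step using mirrored weights.

    row[:step] holds one surface colour and row[step:] the other
    (single-channel floats). half_len is how many pixels are touched
    on each side of the step. All of this is a simplified assumption.
    """
    half_len = min(half_len, step, len(row) - step)
    out = list(row)
    a = row[step - 1]  # colour just left of the step
    b = row[step]      # colour just right of the step
    for i in range(half_len):
        # Assumed linear ramp: strongest next to the step, fading outwards.
        w = 0.5 * (1.0 - (i + 0.5) / half_len)
        out[step - 1 - i] = a + w * (b - a)  # left side moves toward b
        out[step + i] = b + w * (a - b)      # right side moves toward a by the same amount
    return out


row = [0.0] * 6 + [1.0] * 6
blended = blend_across_step(row, step=6, half_len=4)
```

Because every left-hand change `w * (b - a)` is paired with a right-hand change `w * (a - b)`, the sum of the row (and so its average) is unchanged, which is the property an asymmetric L-shaped blend does not guarantee.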

It is a multi-pass effect, and a description of each step follows. Note that this is a general description of the algorithm and is slightly different from the actual implementation; in the demo some of the presented steps require multiple render passes, or are reordered for performance reasons.

Step 1: Edge detection


For this demo the edge detection is modified to be the same as in AMD's MLAA demo: it is based on the render/screen alpha value, which stores the previously generated luminance of the texture (so both do exactly the same edge detection, for comparison reasons). However, in a real game engine this probably would not be the way to do things (it requires an extra channel, aliasing is often caused by lighting differences so just the diffuse texture color wouldn't cut it, etc.), so I'll write a follow-up on this in the future, explaining the edge detection in my original SMLAA implementation, which is based on per-channel color difference and (optionally) more expensive depth (and normal, if available) comparisons, extracting as much quality as possible.

The main difference from AMD's demo is that this step creates the texture containing the edges used later and, at the same time, creates the stencil mask used to optimize some of the following steps. The edge detection cost is therefore paid only once, so the edge detection algorithm can be more expensive (i.e. of higher quality) and the generated info is reused in the later steps.
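
As a rough CPU-side illustration of this step (not the demo's HLSL), the sketch below thresholds luminance differences and produces both an edge map and a boolean mask standing in for the stencil. The bit layout (`RIGHT_EDGE`, `BOTTOM_EDGE`), the threshold value and the function name `detect_edges` are assumptions for the example only.

```python
RIGHT_EDGE = 1   # assumed bit: edge between a pixel and its right neighbour
BOTTOM_EDGE = 2  # assumed bit: edge between a pixel and the pixel below


def detect_edges(luma, threshold=0.1):
    """Return (edges, mask) for a 2D list of luminance values.

    edges[y][x] holds the edge bits for that pixel; mask[y][x] is True for
    every pixel touching an edge, playing the role of the stencil mask.
    """
    h, w = len(luma), len(luma[0])
    edges = [[0] * w for _ in range(h)]
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x + 1 < w and abs(luma[y][x] - luma[y][x + 1]) > threshold:
                edges[y][x] |= RIGHT_EDGE
                mask[y][x] = mask[y][x + 1] = True
            if y + 1 < h and abs(luma[y][x] - luma[y + 1][x]) > threshold:
                edges[y][x] |= BOTTOM_EDGE
                mask[y][x] = mask[y + 1][x] = True
    return edges, mask


luma = [[0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0]]
edges, mask = detect_edges(luma)
```

Later passes would then skip every pixel whose mask entry is False, which is where the single up-front detection cost pays off.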

Step 2: Z shape detection and measuring line lengths for Z-s

The only complex shape that we are concerned with is the Z shape and its four rotations. This shape represents the point at which the rasterizer, rendering a triangle edge (or a line), 'steps' from one pixel to the next on the 'slower' axis. I found that the best compromise for detecting this type of aliasing shape, while excluding false positives, is an algorithm that requires certain edges to exist and certain edges to be absent from the pixel neighborhood (for more details, see the enclosed shader file; sorry, but it's a bit complicated to explain here). This detection algorithm is simple enough to be executed in a pixel shader, uses the previously created edge texture (which is in the DXGI_FORMAT_R8_ format, thus small and requiring little bandwidth) as its only input, and benefits from the stencil mask.

In this DirectX11 demo, a pixel shader pass is used to detect Z shapes, which are then stored in an AppendStructuredBuffer and subsequently processed in a number of compute shader steps, resulting in a Z-shape blur map buffer. This could be implemented in DirectX10.1 as well, but then it becomes slightly more complicated.

Then, to be able to correctly blur the Z shape edge, we need to know the length of the rasterized line segment to the left and right (or up and down) from each shape centre. This can be determined in a number of ways: using a specific compute shader which iterates over all Z shapes (as in this demo, using a ConsumeStructuredBuffer); using a recursive doubling technique as presented in [Hensley et al. 2005] on older (Shader Model 3) hardware; or directly on the CPU if the hardware architecture allows for it. The maximum line length can be limited to a certain value: a bigger value gives better smoothing of long near-horizontal and near-vertical edges but is more expensive. AMD's demo uses a max line length of 16 (I think); mine is limited to 24 at the moment, but that's easy to tweak by modifying MAX_BLEND_LINE_LENGTH in SMLAA.hlsl.
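
The simplest of these options, a plain linear scan per Z shape, can be sketched on the CPU as follows. Only the constant name MAX_BLEND_LINE_LENGTH and its value of 24 come from the text; the function `run_length`, the boolean edge rows and the way the two arms are laid out are assumptions for illustration.

```python
MAX_BLEND_LINE_LENGTH = 24  # value mentioned in the post; the rest is a sketch


def run_length(edge_row, start, step, max_len=MAX_BLEND_LINE_LENGTH):
    """Count consecutive edge pixels from `start`, walking by `step` (+1 or -1).

    edge_row is a sequence of booleans marking where the horizontal edge
    exists. The scan stops at the first gap, at the row border, or once
    max_len pixels have been counted.
    """
    n = 0
    x = start
    while n < max_len and 0 <= x < len(edge_row) and edge_row[x]:
        n += 1
        x += step
    return n


# Assumed layout: the left arm of the Z lies on one row, the right arm on
# the adjacent row, and the step sits between x = 4 and x = 5.
upper = [True] * 5 + [False] * 5
lower = [False] * 5 + [True] * 4 + [False]
left_len = run_length(upper, 4, -1)
right_len = run_length(lower, 5, +1)
```

Raising the cap lets long, nearly axis-aligned edges blend over more pixels at the price of longer scans, which is the trade-off described above.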

All this data is saved into a 'blur map' (also DXGI_FORMAT_R8_ format, check out the SMLAA demo's 'blurmap' display modes).

Step 3: Simple shapes

In addition to Z shapes, edges not covered by them are additionally antialiased using a simple 3x3 smart blur filter that applies blur to a pixel if it is surrounded by two or more edges. Since this step can cause change in the average image color by reducing the intensity of 'lone' pixels caused by alpha-tested drawing, odd lighting cases, particles or other noisy effects, the amount of blurring is relatively small and can be tuned based on the scenario.

All data is again added to the 'blur map' (in the actual implementation this step is combined into step 4).
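
A minimal sketch of such a filter is shown below, assuming a single-channel image and a precomputed per-pixel count of surrounding edges. The name `smart_blur`, the box average and the default `amount` of 0.25 are illustrative assumptions, not values from the demo.

```python
def smart_blur(img, edge_count, amount=0.25):
    """Pull pixels with two or more surrounding edges toward their 3x3 average.

    img and edge_count are 2D lists of the same size. `amount` is the
    (deliberately small) blend strength.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # write to a copy, read from the original
    for y in range(h):
        for x in range(w):
            if edge_count[y][x] < 2:
                continue
            total, n = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        n += 1
            out[y][x] = img[y][x] + amount * (total / n - img[y][x])
    return out


img = [[0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0]]
edge_count = [[0, 1, 0],
              [1, 4, 1],
              [0, 1, 0]]
blurred = smart_blur(img, edge_count)
```

The lone bright pixel in the middle loses some intensity while its neighbours are untouched, which is exactly the average-color change the text warns about and the reason for keeping `amount` small.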

Step 4: Apply the blurmaps

 
Blurmaps are then applied to the main render target containing the image being processed. In the demo, the final blur areas are copied into a temporary texture (as this step requires reading from the main render target and writing to it at the same time, which isn't supported, at least on PCs) and then applied onto the main render target. This step is, again, greatly optimized using the stencil mask generated during the edge detection step.

Notes:

1.) It should be fairly easy to optimize AMD's demo using the stencil masking as well.

2.) My original SMLAA algorithm uses an adaptive quality system that raises or lowers the edge detection thresholds, aiming to keep the number of edges in a 'sane' range. This prevents an unusually noisy image/rendering from costing too much and also increases quality across various lighting and other scenarios.

3.) In this example/demo, the only modification of AMD's original algorithm was to switch to using a _SRGB offscreen texture instead of the linear one; this fixes precision-induced banding issues that are not obvious in the demo (unless you look very hard at the dark area below the tank and toggle the effect on/off) but are obvious with a darker image - I noticed it when using other input images.
To make this really obvious in the original demo, either compare the OFF/ON (Show MLAA checkbox) screenshots in Photoshop or make the image darker (for example, by adding the line "Output.Diffuse.rgb *= 0.1;" to Scene.hlsl after line 67).
This _SRGB mod also fixes the blending issue in the original demo that overbrightens AA areas (although I'm not sure it's 100% correct yet; I need to verify all this).
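
The banding part of note 3 can be illustrated with the standard sRGB transfer function. The sketch below counts how many distinct 8-bit codes are available for the darkest 1% of linear intensities when values are stored linearly versus sRGB-encoded; the helper `distinct_codes` and the 1% cut-off are assumptions chosen for the example, not something taken from the demo.

```python
def linear_to_srgb(c):
    """Standard sRGB encoding of a linear value in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def srgb_to_linear(s):
    """Inverse of linear_to_srgb."""
    if s <= 0.04045:
        return s / 12.92
    return ((s + 0.055) / 1.055) ** 2.4


def distinct_codes(max_linear, encode, samples=10000):
    """Count the distinct 8-bit codes used for linear values in [0, max_linear]."""
    return len({round(encode(max_linear * i / samples) * 255)
                for i in range(samples + 1)})


linear_codes = distinct_codes(0.01, lambda c: c)
srgb_codes = distinct_codes(0.01, linear_to_srgb)
```

With 8 bits per channel the sRGB-encoded texture has several times more codes to spend on the dark range than the linear one, which is consistent with banding showing up mainly in dark images.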

 

[edit]: I've updated the code to slightly improve performance and, more importantly, fixed a bug that caused the effect to not be applied in certain cases (such as some thin 1-pixel lines). I haven't updated the screenshots above yet, though! I've also cleaned up and reorganized the source code and added more useful comments.

Get the new source code and .exe from here: Download SMLAA_0.99.7z

 

Filed under: Uncategorized No Comments
30 Oct 2010

GPU-based water simulator thingie

This is something I played with two years ago - I posted it as a demo on gamedev.net forums (link) but later never had time to get back to it. My plan was to write a detailed article, but as it seems that will never happen and people frequently ask me for more info, I've decided to simply put it all up here with just a brief overview and full project source code.

The goal of this whole thing was to render realistic flowing water for big terrain areas on DirectX9 generation hardware.

The basic idea was to precalculate water flow over a static terrain represented by a heightfield and then use this data in the realtime application (game, simulation, etc) to do a simpler and cheaper localised wave simulation and rendering.

The process is thus split into two stages:

  1. Waterflow simulation (editor) stage
  2. Realtime 3D visualisation stage

The demo project contains both stages (modes), which can be toggled using the F5 key. See the Readme.txt contained in the archive for more details.

Waterflow simulation (editor) stage

This is the 'editor' stage (it would go into a tools pipeline / editor in the case of a game engine): it takes a heightmap as input and, in my case, a list of springs which add water to the simulation. It outputs the state of the simulation, usually once it has stabilised. This simulation process can take tens of minutes or hours, depending on the terrain size and other parameters. Once the user is happy with the way water is flowing across the terrain, they can save a 'snapshot' of the simulation state, which exports it in the format used by the next, realtime renderer stage.

One good example of a similar algorithm that can be used for this, with more realistic simulation and terrain erosion, is described in "Interactive Terrain Modeling Using Hydraulic Erosion" (link). I am not going to explain my version as it is pretty similar, but feel free to dive into (pun not intended) the source code and ask any questions.
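
For readers who want a feel for what such a simulation does, here is a deliberately tiny one-dimensional stand-in. It is neither the algorithm from the linked paper nor the one in the RiverSim source: the function `flow_step`, the `rate` constant and the rule of moving water toward the neighbour with the lower surface are assumptions chosen only to show water from a spring settling downhill while the total amount is conserved.

```python
def flow_step(terrain, water, springs=None, rate=0.25):
    """Advance a 1D heightfield water simulation by one step.

    terrain and water are lists of heights per cell. springs maps a cell
    index to the amount of water added there each step (hypothetical
    stand-in for the editor's spring list).
    """
    n = len(terrain)
    delta = [0.0] * n
    for i in range(n - 1):
        diff = (terrain[i] + water[i]) - (terrain[i + 1] + water[i + 1])
        if diff > 0:
            # A cell gives at most half its water per neighbour, so it never goes negative.
            moved = min(0.5 * water[i], rate * diff)
            delta[i] -= moved
            delta[i + 1] += moved
        else:
            moved = min(0.5 * water[i + 1], rate * -diff)
            delta[i + 1] -= moved
            delta[i] += moved
    new_water = [w + d for w, d in zip(water, delta)]
    for index, amount in (springs or {}).items():
        new_water[index] += amount
    return new_water


terrain = [3.0, 2.0, 1.0, 0.0]
water = [1.0, 0.0, 0.0, 0.0]
for _ in range(200):
    water = flow_step(terrain, water)
```

After enough steps the water has pooled in the lowest cell, which corresponds to the 'stabilised' state that the editor stage would snapshot; a real 2D version would also record per-cell velocities for the renderer.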

Here is a short video of the algorithm in action:

Realtime 3D stage

The renderer stage will not further modify the base water flow, but will use simpler surface wave simulation and effects to provide the illusion of moving water. This is enough for most visualisation or game purposes and is fast and scalable.

A couple of effects are used to provide the appearance of moving water:

  • Simple localised surface wave simulation that affects normal map and displaces water vertices to a certain degree and can bounce from river banks or other objects. I think I based my algorithm on this article and with a little bit of tweaking made it work on the GPU. This simulation is then moved using the velocity map from the simulation stage to add the realistic water flow effect.

    This simulation will only be performed on a rectangular block representing the area around the observer. When the observer moves, the simulation area is updated accordingly. Since this is a relatively fast simulation, a newly added area will quickly stabilise into the regular wave pattern for the represented area, so few or no visible artifacts will be induced (unless the observer moves too fast).

    To allow for a distance based level of detail and add more wave frequencies, multiple simulation layers are run in a cascaded fashion, with each cascade covering the smaller one and its surroundings with the observer near the center.

    Water perturbation is added to the simulation cascades in real time based on the first stage (flow) simulation state, in areas of high velocity deltas. It will also be added for any other input such as wind or floating/splashing objects.

  • One additional channel in the simulation texture is used to store a quantity representing the amount of foam, which is propagated in parallel with the simple wave simulation, also using the velocity map from the flow simulation stage. This adds to the appearance of faster moving and/or splashing water areas.

    The foam is rendered using a tiled foam texture mapped with UVs that move along the velocity map direction. To prevent UV stretching (as the velocities differ between areas), three overlapping layers (with slightly different UV scales and offsets) are continuously blended in such a way that the blending is always done between two textures while the third one is not visible and can have its UVs reset to prevent stretching. This is further augmented by a noise-based blending mask to hide tiling details and blending artifacts.

    The same (or a very similar) technique for achieving flowing normal and colour maps was recently presented at SIGGRAPH 2010 in 'Water Flow in Portal 2'.

  • Finally, high frequency waves are added to areas of higher water velocity by simply adding an animated normal 'noise' wave map on top, using the same technique of moving UVs as for displaying the foam texture.
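
The three-layer blending used for the foam (and reused for the noise waves) can be sketched as below. The triangular weight function, the one-third phase offsets and the names `layer_weight` and `foam_layers` are assumptions; the text only states that two layers are blended at any time while the third is invisible and gets its UVs reset.

```python
def layer_weight(phase):
    """Blend weight for a layer at `phase` in [0, 1).

    Zero for the first third of the cycle (which includes the moment of the
    UV reset at phase 0), then a triangle peaking at phase 2/3.
    """
    return max(0.0, 1.0 - abs(3.0 * phase - 2.0))


def foam_layers(time, period=1.0):
    """Return [(phase, weight)] for three layers offset by a third of a cycle.

    A layer's UV offset would be velocity * phase * period, so it snaps
    back to zero each time the phase wraps around.
    """
    layers = []
    for k in range(3):
        phase = (time / period + k / 3.0) % 1.0
        layers.append((phase, layer_weight(phase)))
    return layers


layers = foam_layers(0.1)
```

At any time the three weights sum to one and at least one of them is zero, so a layer is never visible at the instant its UVs jump back to their starting offset.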

There you go! Of course, there are many things that can be improved, and the two stages could also be combined to get a completely simulated realtime flow effect. This would enable you to do something like Ubisoft's upcoming 'From Dust' game, which looks amazing indeed!

The project with the full source code (DirectX9, C++, VisualStudio 2010) can be downloaded from here.

Just the binaries can be downloaded from here.

Datasets can be downloaded from:
hetch_half_dataset (lo-res - good for playing with simulator as it's quick)

hetch_full_dataset (hi-res - looks much better)

4k_x_2k_dataset (big one, pretty unfinished, I could've added more rivers)

If you wish to use your own heightmap, there's an explanation of how to set up a project in the readme file, and you'll need this to convert from a 16-bit grayscale .tiff heightmap into the .tbmp format used by RiverSim. Have fun, and I'll be glad to answer any questions!

11 Jul 2010

Oh no, another terrain rendering paper!

My paper, Continuous Distance-Dependent Level of Detail for Rendering Heightmaps, was recently published in the Journal of Graphics, GPU, and Game Tools - it took a while but it's finally done.

A slightly updated version can be downloaded from here - demo, code and data download links are in the paper.

Abstract:

This paper presents a technique for GPU-based rendering of heightmap terrains, which is a refinement of several existing methods with some new ideas. It is similar to the terrain clipmap approaches [Tanner et al. 98, Losasso 04], as it draws the terrain directly from the source heightmap data. However, instead of using a set of regular nested grids, it is structured around a quadtree of regular grids, more similar to [Ulrich 02], which provides it with better level-of-detail distribution. The algorithm's main improvement over previous techniques is that the LOD function is the same across the whole rendered mesh and is based on the precise three-dimensional distance between the observer and the terrain. To accomplish this, a novel technique for handling transition between LOD levels is used, which gives smooth and accurate results. For these reasons the system is more predictable and reliable, with better screen-triangle distribution, cleaner transitions between levels, and no need for stitching meshes. This also simplifies integration with other LOD systems that are common in games and simulation applications. With regard to the performance, it remains favourable compared to similar GPU-based approaches and works on all graphics hardware supporting Shader Model 3.0 and above. Demo and complete source code is available online under a free software license.
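
The paper and its source code are the reference for how the transitions actually work. As a rough illustration of the general idea only, the sketch below derives a morph factor from the 3D distance to the observer and uses it to slide the odd grid vertices of a finer level onto the vertices of the next coarser level. The names `morph_factor` and `morph_vertex` and the linear ramp between `morph_start` and `morph_end` are assumptions for this example.

```python
def morph_factor(distance, morph_start, morph_end):
    """0 below morph_start, 1 above morph_end, linear in between (assumed ramp)."""
    t = (distance - morph_start) / (morph_end - morph_start)
    return min(1.0, max(0.0, t))


def morph_vertex(ix, iz, k):
    """Move a grid vertex toward the coarser grid as k goes from 0 to 1.

    Vertices with even indices also exist in the coarser level and stay
    put; odd ones slide onto their even neighbour.
    """
    return (ix - (ix % 2) * k, iz - (iz % 2) * k)


k = morph_factor(15.0, morph_start=10.0, morph_end=20.0)
half_morphed = morph_vertex(3, 4, k)
```

When k reaches 1 the finer mesh has the same shape as the coarser one, so swapping levels at that distance causes no visible pop and needs no stitching geometry.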

I'm currently working on a DirectX11 CDLOD demo, and when that's done I'll try out what I wanted to play with for a while now - the hardware tessellation. We'll see how it goes; I'll post the results here.

Let me know if you find this paper (and code) useful and if you have any questions or suggestions!

[edit] Adding the download links here... [\edit]

Binaries and a small example dataset
http://www.vertexasylum.com/downloads/cdlod/binaries_tools_testdata.exe

Complete source code
http://www.vertexasylum.com/downloads/cdlod/source.zip

Example datasets
http://www.vertexasylum.com/downloads/cdlod/dataset_califmtns.exe,
http://www.vertexasylum.com/downloads/cdlod/dataset_hawaii.exe,
http://www.vertexasylum.com/downloads/cdlod/dataset_puget.exe

Example animations
http://www.vertexasylum.com/downloads/cdlod/cdlod_calif.wmv,
http://www.vertexasylum.com/downloads/cdlod/cdlod_hawaii.wmv,
http://www.vertexasylum.com/downloads/cdlod/cdlod_params.wmv

9 May 2010

Update on AdVantage Terrain library

Since I haven't been updating my terrain library (AdVantage Terrain - silly name, isn't it?) for a while now and I'm probably not about to do so in the future, I've decided to close the web page and move everything here.

But not all is lost! In the meantime I've been working on a paper that describes an updated/modified version of the algorithm that inherits the quadtree organisation, but works directly on heightmaps (kind of like 'Geometry Clipmaps', but with more correct LOD distribution and smoother transitions between levels) - I'll update this web page with all the details as soon as it gets published. The paper will come with the full free-to-use source code and everything, so it practically makes AdVantage terrain library obsolete except in some rare scenarios.

24 Mar 2010

Post numero uno

Right.. a blog. Never had one before. Let's see how it works out!