Chapter 2
Real-time texturing methods

The goal of WebGL Earth is to display a virtual globe and to be able to zoom down to street level. We need to be able to display maps with up to 23 zoom levels — 2^23 × 2^23 tiles at the most detailed zoom level. With tiles being 256px × 256px and having a 32-bit depth, we would have to be able to store more than 4 exabytes¹ of data directly in video memory. That is impossible on today’s mainstream hardware, which usually has 512 megabytes or 1 gigabyte of video memory.

Therefore, we had to implement some form of tile management system that would take care of dynamic tile streaming, caching, and real-time lookup in a GLSL shader program during rendering.

2.1 Existing solutions

There are several existing solutions, but WebGL Earth is targeted to run on mainstream or even low-end hardware such as laptops, tablets or mobile phones.

Nearly the whole project has to be implemented in JavaScript (which has poor performance compared to compiled languages such as C++). However, WebGL enables us to perform low-level, high-performance operations in shaders that are compiled and run directly on the graphics card. We should try to move as much computation as possible from the CPU to the GPU in order to maximize performance, although overly complex or excessively diverging shaders would become the bottleneck of the whole application.

In order to further optimize performance, we should move as much computation as possible from the fragment shader to the vertex shader (as we already suggested in section 1.3). To be able to do this, we have to subdivide the geometry so that the edges of polygons are aligned with the edges of tiles (more details in section 3.4).

2.1.1 Virtual Texturing

Virtual texturing (as described by Mittring [MG08]) is a method of large-texture management that, as its name indicates, adopts the fundamental ideas of memory virtualization from computer operating systems.

In 2005, id Software announced a new technology called MegaTextures, but the implementation details were unknown. In 2009, it was confirmed that a newer, revised version of the technology called “MegaTextures v2” is essentially virtual texturing. [Wav09, pp. 3–21] [May10, p. 14]

In implementations of virtual texturing, the whole texture is divided into pages (we can use tiles directly as pages, because the 256 × 256px division provides enough granularity). There is a large buffer in video memory that serves as a page buffer and into which the needed tiles are placed. The application also has to maintain a page table (or lookup table) that provides information (in constant time) about whether the page with the given coordinates is available in the buffer.
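The constant-time lookup can be sketched in a few lines of JavaScript; the class and method names below are illustrative, not taken from any actual implementation:

```javascript
// Minimal page-table sketch: maps "zoom/x/y" page keys to slot indices
// in the page buffer. All names here are illustrative.
class PageTable {
  constructor() {
    this.slots = new Map(); // key "z/x/y" -> slot index in the buffer
  }
  key(z, x, y) {
    return z + '/' + x + '/' + y;
  }
  // Constant-time lookup: returns the slot index, or -1 if not buffered.
  lookup(z, x, y) {
    const slot = this.slots.get(this.key(z, x, y));
    return slot === undefined ? -1 : slot;
  }
  insert(z, x, y, slot) {
    this.slots.set(this.key(z, x, y), slot);
  }
}

const table = new PageTable();
table.insert(8, 79, 230, 0);
console.log(table.lookup(8, 79, 230)); // 0
console.log(table.lookup(8, 0, 0));    // -1 (not in the buffer)
```

On the GPU this map becomes a texture indexed by page coordinates, which is exactly where the storage problem discussed below comes from.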


Figure 2.1: Principle of the Page Table

In order to display a tile that is currently not present in the buffer, we have to find a “slot” to buffer it into. In case there are no empty slots left, an existing tile has to be discarded from the buffer. Least Recently Used (LRU) is the most commonly used algorithm for the “victim selection”.
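The LRU victim selection can be sketched as follows (a minimal illustration under assumed names, not the actual implementation):

```javascript
// LRU victim selection sketch: slots are kept in use order; the least
// recently used slot is evicted when no empty slot is left.
class LruSlots {
  constructor(capacity) {
    this.capacity = capacity;
    this.order = new Map(); // slot id -> tile key, in use order
  }
  // Mark a slot as used: move it to the most-recently-used position.
  touch(slot, tileKey) {
    this.order.delete(slot);
    this.order.set(slot, tileKey);
  }
  // Pick a slot for a new tile: an empty one if available, else the LRU.
  victim() {
    if (this.order.size < this.capacity) return this.order.size;
    return this.order.keys().next().value; // least recently used slot
  }
}

const lru = new LruSlots(2);
lru.touch(lru.victim(), 'a'); // tile 'a' goes into slot 0
lru.touch(lru.victim(), 'b'); // tile 'b' goes into slot 1
lru.touch(0, 'a');            // 'a' used again; slot 1 is now the LRU
console.log(lru.victim());    // 1
```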

Virtual texturing implementations also usually employ a so-called feedback buffer into which the scene is rendered with a special shader program applied. The feedback buffer is then transferred to RAM and analyzed. It contains information about which tiles should optimally be in the buffer at the given moment.

An implementation of virtual texturing called “MegaTextures in WebGL” and its description [Van09] was the initially proposed solution to the texturing problem. However, after deeper analysis, we found that it was not suitable for our needs.

The main disadvantage of virtual texturing is its low scalability — the space complexity of the page table is linear in the number of pages in the whole texture. We need to be able to display tiles from zoom levels up to 23; in other words, there are 2^23 × 2^23 = 2^46 tiles on the most detailed level. The necessary page table would be too large to be stored in either the graphics or the operating memory of a mainstream computer.

Knowing that the needed tiles usually come from a relatively small, contiguous area of the texture (there is no need to display two different parts of the world at once), we could maintain a quadtree instead of the page table, which would allow us to save a lot of memory. When some larger area of the map is not in the buffer, we can mark the whole area as unavailable by not partitioning the quadtree any further. This would effectively solve the page table storage problem, but it would require more complex updating, during which we would potentially have to create and destroy whole subtrees. The tile lookup operations executed in the vertex shader would also become quite complex.
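A quadtree lookup of this kind could look roughly as follows (a sketch under assumed names; the real structure would also need the subtree create/destroy logic mentioned above):

```javascript
// Quadtree availability sketch: an unpartitioned node covers its whole
// area with a single answer (-1 = unavailable). Names are illustrative.
class QuadNode {
  constructor() {
    this.slot = -1;       // buffer slot for this area, -1 = unavailable
    this.children = null; // null = not partitioned any further
  }
}

// Walk from the root towards tile (x, y) at the given zoom level,
// stopping early when the tree is not partitioned any further.
function lookup(root, zoom, x, y) {
  let node = root;
  for (let level = zoom - 1; level >= 0; level--) {
    if (node.children === null) return node.slot;
    const cx = (x >> level) & 1; // which quadrant at this depth
    const cy = (y >> level) & 1;
    node = node.children[cy * 2 + cx];
  }
  return node.slot;
}

const root = new QuadNode();
console.log(lookup(root, 23, 12345, 6789)); // -1: whole map unavailable
```

An empty root answers for the whole map at once, which is exactly the memory saving over a flat page table.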

However, virtual texturing is not suitable for our needs mainly because the only way of storing the page table (or the page tree) is in video memory as a simple texture – it is too much data to be transferred from the regular operating memory every time a video frame is rendered. Because we want to perform the tile lookup operations per-vertex for performance reasons, we would need Vertex Texture Fetch support (access to texture buffers from vertex shaders), which is not declared mandatory by the GLSL ES 1.0 specification [Khr09, p. 113].

2.1.2 ClipMapping

This technique, originally described by C. Tanner [TMJ98], is based on observations about mipmapping. A MipMap, as defined by L. Williams [Wil83], is a series of images with progressively reduced resolution, down to 1 × 1px, forming a so-called mipmap pyramid. Modern graphics cards are able to generate the mipmap pyramid “on-the-fly” while buffering the texture data. When geometry with a mipmapped texture is rendered, texels (texture elements) from the appropriate levels of the pyramid are chosen depending on several factors, the most important being the camera angle relative to the surface, the camera distance, and the texture filtering used. This effectively reduces the number of texels that need to be read from video memory, increasing performance and helping to reduce aliasing artifacts at the same time.

The calculations ensuring appropriate mipmap level selection during rendering are designed in such a way that the final texel-to-pixel mapping ratio is as close to 1:1 as possible [Bar07]. We can easily deduce that, from each mipmap level, we need at most as many texels as there are pixels in the viewport where the scene is rendered.
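This selection can be illustrated with a simplified model (ignoring anisotropy and per-axis differences): if one screen pixel covers d × d texels of the base level, the level closest to a 1:1 ratio is approximately log2(d).

```javascript
// Simplified mipmap level selection: one screen pixel covering
// d x d base-level texels maps to mip level ~log2(d), clamped to the
// valid range. This is a model for illustration, not the exact
// hardware computation.
function mipLevel(texelsPerPixel, maxLevel) {
  const level = Math.log2(texelsPerPixel);
  return Math.min(maxLevel, Math.max(0, Math.round(level)));
}

console.log(mipLevel(1, 8));    // 0: texture shown at native resolution
console.log(mipLevel(4, 8));    // 2: each pixel covers 4x4 texels
console.log(mipLevel(1024, 8)); // 8: clamped to the coarsest level
```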

If the “clipmapped” texture is mapped onto the geometry continuously, we can find a certain “point of interest” (the ClipCenter) which defines what data will be needed from each mipmap level in the currently rendered frame (this subset is called the ClipLevel) (figure 2.2).


Figure 2.2: Area covered by the ClipMap as a subset of MipMap

It is very important to update the ClipMap after every ClipCenter change, because there is no guarantee that the whole area is still appropriately covered. Usually, the ClipCenter moves by just a fraction of the ClipLevel size. If we set the wrapping mode of the textures to gl.REPEAT, the buffer can be addressed in a “loop” (so-called toroidal addressing). With this in mind, we can simply overwrite the tiles that are no longer needed with the newly needed ones. This way we avoid unnecessary data transfers.
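Toroidal addressing boils down to modulo arithmetic: in an n × n ClipLevel, the tile with pyramid coordinates (x, y) always occupies slot (x mod n, y mod n), so moving the window only requires re-uploading the tiles whose slots change. A sketch (function names are illustrative):

```javascript
// Toroidal slot addressing: tile (x, y) always lands in slot
// (x mod n, y mod n), so the buffer contents never have to be shifted.
function slotOf(x, y, n) {
  return [((x % n) + n) % n, ((y % n) + n) % n];
}

// When the visible window moves from oldOffset to newOffset, only the
// tiles that were not covered by the old window need to be uploaded.
function tilesToUpdate(oldOffset, newOffset, n) {
  const updates = [];
  for (let y = newOffset[1]; y < newOffset[1] + n; y++) {
    for (let x = newOffset[0]; x < newOffset[0] + n; x++) {
      const inOld = x >= oldOffset[0] && x < oldOffset[0] + n &&
                    y >= oldOffset[1] && y < oldOffset[1] + n;
      if (!inOld) updates.push([x, y]); // goes into slotOf(x, y, n)
    }
  }
  return updates;
}

console.log(slotOf(9, 10, 8));                        // [1, 2]
console.log(tilesToUpdate([0, 0], [1, 0], 4).length); // 4: one new column
```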

There is no built-in support for ClipMaps in mainstream hardware, so we have to maintain each ClipLevel as an independent WebGL texture (including its whole mipmap pyramid) and ensure the ClipMap’s consistency ourselves.

When rendering ClipMapped geometry, we have to choose the proper ClipLevel for each segment. Antonio Seoane et al. [Seo+07] describe two different approaches to hardware-independent ClipMap rendering:

  1. For each segment of the geometry (each tile) apply the finest available texture.
  2. For each ClipLevel, draw all geometry segments that are covered by this level but not any finer one.

Both approaches require the geometry to be divided into a large number of small, completely independent segments. This would result in a higher CPU load (larger data structure overhead, frequent vertex buffer switching), which conflicts with our intention to move as much computation as possible from JavaScript to the GPU. The demand for more complex data structure management on the CPU side is the main reason why ClipMapping (in its original form) is unsuitable for WebGL Earth.

Google Earth uses a patented [Tan03] method based on ClipMapping called “Universal Texture”, but it is not optimal for use with tile-based data sets. [Bar07]

2.2 Our solutions

JavaScript and WebGL form an environment with specific attributes. The most important is the aforementioned performance, but there are also other factors. One of them is portability – the WebGL standard (together with GLSL ES 1.0) is designed to be implementable on almost any modern piece of hardware. Because of this, a lot of features are not mandatory to implement, and other operations are restricted.

The most important limitations are:

  1. Vertex Texture Fetch (VTF) – some platforms may not be able to provide access to texture samplers from vertex shaders. VTF capability can be determined from the special constant gl_MaxVertexTextureImageUnits, which is available in both vertex and fragment shaders. [Khr09, p. 61] We should avoid relying on VTF altogether.
  2. Loops in shaders – we have no guarantee that shaders containing while loops will get compiled at all. If we really have to create a loop in a shader, we should always use the for statement, although it has additional restrictions: the loop index has to be a single local variable with constant increment and constant boundaries, and it is not allowed to change inside the body of the loop. [Khr09, pp. 108–109]
  3. Cross-origin data access – it would sometimes be useful to have access to the raw data of framebuffers via gl.readPixels (the feedback buffer for virtual textures) or to the data of image elements (useful for determining terrain height or for dividing tiles into smaller units). However, both are restricted to “prevent information leakage” whenever any content from a domain different from the document itself has been loaded into the corresponding canvas or image element. [Khr11, section 4.2]

Mainly because of these environment properties, existing solutions for handling and managing very large textures are not suitable for WebGL Earth. We therefore designed new, derived methods that better suit our needs.

2.2.1 TileBuffer with binary lookup

The main problem of virtual texturing (section 2.1.1) is, as we have described before, its low scalability. Instead of having an enormous lookup table where most of the space is empty, we tried the reverse approach: we introduced metadata that describe the content of each “slot” of the buffer (in the context of this method called the TileBuffer).

Figure 2.3: Example of metadata for an 8 × 8 TileBuffer

  Slot ID   Zoom level   Tile coordinates
  0         8            [79; 230]
  1         -1           empty
  2         14           [6396; 2436]
  ...
  63        7            [8; 45]

The metadata are small enough to be repeatedly sent to the vertex shader as a uniform array (storage qualifiers are described in section 1.3). By going through the metadata, the shader determines in which slot (if any) the needed tile is stored. Simply iterating over the array has linear complexity. We can, however, reduce the complexity of the tile lookup operation to logarithmic: if we keep the metadata sorted by zoom level and tile coordinates, we can perform the more efficient binary search.
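The lookup can be sketched as follows; on the GPU the same loop would run with a fixed iteration count (log2 of the buffer size), but it is shown here in JavaScript for clarity (entry layout and names are illustrative):

```javascript
// Binary lookup sketch over TileBuffer metadata kept sorted by the
// combined key (zoom, x, y). Each entry: { slot, zoom, x, y }.
function compareTiles(a, b) {
  return (a.zoom - b.zoom) || (a.x - b.x) || (a.y - b.y);
}

// Returns the slot holding the tile, or -1 when it is not buffered.
function findSlot(metadata, tile) {
  let lo = 0, hi = metadata.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const cmp = compareTiles(metadata[mid], tile);
    if (cmp === 0) return metadata[mid].slot;
    if (cmp < 0) lo = mid + 1; else hi = mid - 1;
  }
  return -1;
}

const meta = [
  { slot: 63, zoom: 7, x: 8, y: 45 },
  { slot: 0, zoom: 8, x: 79, y: 230 },
  { slot: 2, zoom: 14, x: 6396, y: 2436 },
]; // already sorted by (zoom, x, y)
console.log(findSlot(meta, { zoom: 8, x: 79, y: 230 })); // 0
console.log(findSlot(meta, { zoom: 8, x: 79, y: 231 })); // -1
```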

Updating the TileBuffer is identical to updating the virtual texture’s buffer (including the LRU structure for the “victim selection”). Every time we send a tile into the TileBuffer, we also appropriately update the metadata.

The most noticeable drawback of this approach, inherited from virtual texturing, is visible “artifacts” on tile edges. They appear because unrelated tiles are often stored next to each other in the TileBuffer. If we use linear filtering, each resulting pixel is obtained by sampling the nearest texels and linearly interpolating between them, and the texture sampling units perform this interpolation even across tile edges. Although the portion of the neighboring “foreign” texel in the final pixel is always less than one half, it can be very visually distracting (figure 2.4).


Figure 2.4: Visible artifacts on tile edges

This issue can be partially solved by performing strict “clamping” in the fragment shader, but then we lose the benefits of linear filtering. Although the absence of filtering on the tile edges is not noticeable during texturing, it became a serious problem when we used this method for heightmap-based 3D terrain, where it caused visible “jumps” and “holes” in the geometry.
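The clamping keeps the sampling coordinate at least half a texel away from the tile edge, so linear filtering can never blend in a neighboring slot. A sketch of the computation, in JavaScript rather than GLSL for brevity (names are illustrative):

```javascript
// Clamp a texture coordinate half a texel inside the tile's area in the
// TileBuffer, so linear filtering never samples a neighbouring tile.
function clampToTile(u, tileOrigin, tileSize, bufferTexels) {
  const halfTexel = 0.5 / bufferTexels;
  const min = tileOrigin + halfTexel;
  const max = tileOrigin + tileSize - halfTexel;
  return Math.min(max, Math.max(min, u));
}

// 8 x 8 tiles of 256px each: every tile spans 1/8 of the 2048-texel
// buffer. A coordinate exactly on the edge gets pulled half a texel in.
const u = clampToTile(0.25, 0.125, 0.125, 2048);
console.log(u); // 0.249755859375
```

Coordinates already inside the safe range pass through unchanged, so the artifact only trades filtering quality exactly at the edges.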

This method proved to be usable for the specified problem, but the number of instructions executed in the vertex shader increases with the TileBuffer size. With 8 × 8 tiles, the rendering already takes too long on low-end devices, and on some graphics cards the shader program does not even link, because forced loop unrolling makes the code too long.

2.2.2 ClipStack

ClipMapping (section 2.1.2) proved to be an interesting method of very large texture management. By introducing several modifications, improvements and simplifications, we get a new method suitable for implementation in JavaScript and WebGL, which we call the ClipStack.

We previously stated that the regular ClipMap is too CPU-demanding. The ClipStack changes this by storing all geometry in one large immutable vertex buffer residing in graphics memory and by moving the ClipLevel selection to the vertex shader.


Figure 2.5: ClipStack

A ClipStack is a collection of ClipLevels (figure 2.5). Each ClipLevel represents a continuous subset of one level of the tile pyramid and contains additional information about the location of this subset within the whole level (the offset), as well as metadata describing which parts of the underlying buffer are valid and which are yet to be filled with data from the tile server.

When rendering geometry textured with the ClipStack, we send the metadata and offsets to the shader program. The vertex shader calculates the coordinates of the optimal tile to map onto the vertex. If this tile is not available in the appropriate ClipLevel, which is determined by simply looking into the metadata, the shader looks for a fallback tile.

Although all ClipLevels have the same size, each covers twice as large an area as the previous one. It would be an unnecessary waste of computational power to fall back too many times – the shader would get too complex, too much data would have to be streamed, and individual executions would potentially diverge too much. We therefore limit the number of active ClipLevels actually used during the rendering of a single frame to a constant value. We use 3 active ClipLevels with a resolution of 2048 × 2048px (8 × 8 tiles, 256 × 256px each) in our implementation. This proved to be enough to cover a very large area, making it really hard to see “uncovered” geometry even in viewports with resolutions close to 1920 × 1080px.
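The bounded fallback can be sketched as follows — try the optimal level first, then at most two coarser active levels, and finally give up to the ultimate fallback described below. Shown in JavaScript instead of GLSL; the availability check is a placeholder for the real metadata lookup:

```javascript
// ClipLevel selection sketch with bounded fallback. `available` stands
// in for the per-level metadata validity check; names are illustrative.
const ACTIVE_LEVELS = 3; // as in our implementation

function selectLevel(optimalLevel, x, y, finestActive, available) {
  const coarsest = finestActive - ACTIVE_LEVELS + 1;
  // Clamp the optimal level into the active range, then walk coarser.
  let level = Math.max(coarsest, Math.min(finestActive, optimalLevel));
  for (; level >= coarsest; level--) {
    const shift = finestActive - level; // tile coords halve per level
    if (available(level, x >> shift, y >> shift)) return level;
  }
  return -1; // -1 stands for the ultimate fallback level
}

// Example: levels 12-13 are still streaming in; only level 11 has data.
const lvl = selectLevel(13, 100, 200, 13, (l) => l === 11);
console.log(lvl); // 11
```

Bounding the loop to a constant number of iterations also keeps the shader within the GLSL ES loop restrictions noted in section 2.2.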

We can also reduce memory usage by creating a limited number of ClipBuffers (greater than or equal to the number of active ClipLevels) and “sliding” them so that the active levels always have a buffer assigned.

In some situations, the appropriate data may be missing from all active ClipLevels. In such cases, we do not want the geometry to be transparent or single-colored. We therefore define ClipLevelN, a special, reduced level (without metadata or offset information) that covers the whole planet and serves as the “ultimate fallback”.

¹ (2^23 × 2^23) × (256 × 256) × 4 B = 2^64 B = 16 × 2^60 B