Auxiliary surface compression
Most lossless image compression on Intel hardware, be that CCS, MCS, or HiZ, works by way of some chunk of auxiliary data (often a surface) which is used together with the main surface to provide compression. Even though this means more memory is allocated, the scheme allows us to reduce our overall memory bandwidth since the auxiliary data is much smaller than the main surface.
The simplest example of this is single-sample fast clears (isl_aux_usage::ISL_AUX_USAGE_CCS_D) on Ivy Bridge through Broadwell and later. For this scheme, the auxiliary surface stores a single bit for each cache-line pair in the main surface. If that bit is set, then the entire cache-line pair contains only the clear color as provided in the RENDER_SURFACE_STATE for the image. If the bit is unset, then the pair is not clear and you should look at the main surface. Since a cache line is 64B, a cache-line pair is 128B (1024 bits) covered by one auxiliary bit, which yields a scale-down factor of 1:1024.
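The 1:1024 ratio can be checked with a little arithmetic. The sketch below is only an illustration: `ccs_d_aux_size_B` is a hypothetical helper, not an ISL function, and it ignores the tiling and alignment padding that a real CCS allocation needs.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES      64u
#define CACHE_LINE_PAIR_BYTES (2u * CACHE_LINE_BYTES) /* 128 B = 1024 bits */

/* Hypothetical helper (not part of ISL): number of auxiliary bytes needed
 * when one bit covers one cache-line pair of the main surface.  Rounds up
 * so that a trailing partial pair and a trailing partial byte are covered. */
static inline uint64_t
ccs_d_aux_size_B(uint64_t main_size_B)
{
   uint64_t pairs =
      (main_size_B + CACHE_LINE_PAIR_BYTES - 1) / CACHE_LINE_PAIR_BYTES;
   return (pairs + 7) / 8;
}
```

Under these assumptions a 1 MiB main surface needs 1 KiB of auxiliary data.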
Even the simple fast-clear scheme saves us bandwidth in two places. The first is when we go to clear the surface. If we’re doing a full-surface clear or clearing to the same color that was used to clear before, we don’t have to touch the main surface at all. All we have to do is record the clear color and smash the aux data to 0xff. The hardware then knows to ignore whatever is in the main surface and look at the clear color instead. The second is when we go to render. Say we’re doing some color blending. Instead of the blend unit having to read back actual surface contents to blend with, it looks at the clear bit and blends with the clear color recorded with the surface state instead. Depending on the geometry and cache utilization, this can save as much as one whole read of the surface worth of bandwidth.
The difficulty with a scheme like this comes when we want to do something else with that surface. What happens if the sampler doesn’t support this fast-clear scheme (it doesn’t on IVB)? In that case, we have to do a resolve where we run a special pipeline that reads the auxiliary data and applies it to the main surface. In the case of fast clears, this means that, for every 1 bit in the auxiliary surface, the corresponding pair of cache lines in the main surface gets filled with the clear color. At the end of the resolve operation, the main surface contents are the actual contents of the surface.
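As a rough CPU-side model of what such a resolve does, the sketch below walks the auxiliary bits and fills each flagged cache-line pair with the clear color. Everything here is an assumption made for illustration: the name `toy_fast_clear_resolve`, the LSB-first bit order, the linear layout, and the 32-bit pixel size. On real hardware the resolve is a special pipeline operation and the layouts are defined by the PRMs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAIR_BYTES 128u /* two 64 B cache lines */

/* Toy model of a fast-clear resolve (hypothetical, not an ISL function).
 * Assumes bit i of `aux` (LSB first within each byte) covers bytes
 * [i * 128, (i + 1) * 128) of `main_buf` and that pixels are 32 bits. */
static void
toy_fast_clear_resolve(uint8_t *main_buf, size_t main_size_B,
                       uint8_t *aux, uint32_t clear_color)
{
   size_t num_pairs = main_size_B / PAIR_BYTES;

   for (size_t i = 0; i < num_pairs; i++) {
      uint8_t mask = (uint8_t)(1u << (i % 8));
      if (!(aux[i / 8] & mask))
         continue; /* bit unset: the main surface already holds the data */

      /* Bit set: write the clear color over the whole cache-line pair. */
      uint8_t *dst = main_buf + i * PAIR_BYTES;
      for (size_t px = 0; px < PAIR_BYTES / sizeof(clear_color); px++)
         memcpy(dst + px * sizeof(clear_color), &clear_color,
                sizeof(clear_color));

      /* Assumption: the resolve also resets the bit, so the block now
       * reads as "look at the main surface". */
      aux[i / 8] &= (uint8_t)~mask;
   }
}
```

After the call, every block reads the same with or without the auxiliary data, which is the property a unit without fast-clear support (such as the IVB sampler) depends on.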
Types of surface compression
Intel hardware has several different compression schemes that all work along similar lines:
-
enum
isl_aux_usage
¶ Enumerates the different forms of auxiliary surface compression
Values:
-
enumerator
ISL_AUX_USAGE_NONE
¶ No Auxiliary surface is used
-
enumerator
ISL_AUX_USAGE_HIZ
¶ Hierarchical depth compression
First introduced on Iron Lake, this compression scheme compresses depth surfaces by storing alternate forms of the depth value in a HiZ surface. Possible (not all) compressed forms include:
An uncompressed “look at the main surface” value
A special value indicating that the main surface data should be ignored and considered to contain the clear value.
The depth for the entire main-surface block as a plane equation
The minimum/maximum depth for the main-surface block
This last form (the minimum/maximum range) isn’t helpful for getting exact depth values but can still substantially accelerate depth testing if the specified range is sufficiently small.
-
enumerator
ISL_AUX_USAGE_MCS
¶ Multisampled color compression
Introduced on Ivy Bridge, this compression scheme compresses multisampled color surfaces by storing a mapping from samples to planes in the MCS surface, allowing for de-duplication of identical samples. The MCS value of all 1’s is reserved to indicate that the pixel contains the clear color. Exact details about the data stored in the MCS and how it maps samples to slices is documented in the PRMs.
- Invariant
isl_surf::samples > 1
-
enumerator
ISL_AUX_USAGE_CCS_D
¶ Single-sampled fast-clear-only color compression
Introduced on Ivy Bridge, this compression scheme compresses single-sampled color surfaces by storing a bit for each cache line pair in the main surface in the CCS which indicates that the corresponding pair of cache lines in the main surface only contains the clear color. On Skylake, this is increased to two bits per cache line pair with 0x0 meaning resolved and 0x3 meaning clear.
- Invariant
The surface is a color surface
- Invariant
isl_surf::samples == 1
-
enumerator
ISL_AUX_USAGE_CCS_E
¶ Single-sample lossless color compression
Introduced on Skylake, this compression scheme compresses single-sampled color surfaces by storing a 2-bit value for each cache line pair in the main surface which says how the corresponding pair of cache lines in the main surface are to be interpreted. Valid CCS values include:
0x0: Indicates that the corresponding pair of cache lines in the main surface contain valid color data
0x1: Indicates that the corresponding pair of cache lines in the main surface contain compressed color data. Typically, the compressed data fits in one of the two cache lines.
0x3: Indicates that the corresponding pair of cache lines in the main surface should be ignored. Those cache lines should be considered to contain the clear color.
Starting with Tigerlake, each CCS value is 4 bits per cache line pair in the main surface.
- Invariant
The surface is a color surface
- Invariant
isl_surf::samples == 1
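To make the pre-Tigerlake 2-bit CCS_E encoding concrete, here is a minimal sketch that names the three documented values and pulls one value out of a packed byte array. The enum names and `toy_ccs_e_get` are hypothetical, and the packing order (four values per byte, least-significant bits first) is an assumption; the real layout is documented in the PRMs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the 2-bit CCS_E values listed above. */
enum toy_ccs_e_value {
   TOY_CCS_E_UNCOMPRESSED = 0x0, /* main surface holds valid color data */
   TOY_CCS_E_COMPRESSED   = 0x1, /* main surface holds compressed data */
   TOY_CCS_E_CLEAR        = 0x3, /* ignore main surface, use clear color */
};

/* Return the 2-bit value for cache-line pair `pair`.
 * Assumption: four values per byte, least-significant bits first. */
static inline unsigned
toy_ccs_e_get(const uint8_t *ccs, size_t pair)
{
   return (ccs[pair / 4] >> ((pair % 4) * 2)) & 0x3u;
}
```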
-
enumerator
ISL_AUX_USAGE_GFX12_CCS_E
¶ Single-sample lossless color compression on Tigerlake
This is identical to ISL_AUX_USAGE_CCS_E except it also encodes the Tigerlake quirk about regular render writes possibly fast-clearing blocks in the surface.
- Invariant
The surface is a color surface
- Invariant
isl_surf::samples == 1
-
enumerator
ISL_AUX_USAGE_MC
¶ Media color compression
Used by the media engine on Tigerlake and above. This compression form is typically not produced by 3D drivers but they need to be able to consume it in order to get end-to-end compression when the image comes from media decode.
- Invariant
The surface is a color surface
- Invariant
isl_surf::samples == 1
-
enumerator
ISL_AUX_USAGE_HIZ_CCS_WT
¶ Combined HiZ+CCS in write-through mode
In this mode, introduced on Tigerlake, the HiZ and CCS surfaces act as a single fused compression surface where resolves (but not ambiguates) operate on both surfaces at the same time. In this mode, the HiZ surface operates in write-through mode where it is only used for accelerating depth testing and not for actual compression. The CCS-compressed surface contains valid data at all times.
- Invariant
The surface is a depth surface
- Invariant
isl_surf::samples == 1
-
enumerator
ISL_AUX_USAGE_HIZ_CCS
¶ Combined HiZ+CCS without write-through
In this mode, introduced on Tigerlake, the HiZ and CCS surfaces act as a single fused compression surface where resolves (but not ambiguates) operate on both surfaces at the same time. In this mode, full HiZ compression is enabled and the CCS-compressed main surface may not contain valid data. The only way to read the surface outside of the depth hardware is to do a full resolve which resolves both HiZ and CCS so the surface is in the pass-through state.
- Invariant
The surface is a depth surface
-
enumerator
ISL_AUX_USAGE_MCS_CCS
¶ Combined MCS+CCS without write-through
In this mode, introduced on Tigerlake, we have fused MCS+CCS compression where the MCS is used for fast-clears and “identical samples” compression just like on Gfx7-11 but each plane is then CCS compressed.
- Invariant
The surface is a color surface
- Invariant
isl_surf::samples > 1
-
enumerator
ISL_AUX_USAGE_STC_CCS
¶ Stencil compression
Introduced on Tigerlake, this is similar to CCS_E only used to compress stencil surfaces.
- Invariant
The surface is a stencil surface
- Invariant
isl_surf::samples == 1
-
bool
isl_aux_usage_has_fast_clears
(enum isl_aux_usage usage)¶
-
bool
isl_aux_usage_has_compression
(enum isl_aux_usage usage)¶
-
static inline bool
isl_aux_usage_has_hiz
(enum isl_aux_usage usage)¶
-
static inline bool
isl_aux_usage_has_mcs
(enum isl_aux_usage usage)¶
-
static inline bool
isl_aux_usage_has_ccs
(enum isl_aux_usage usage)¶
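The three `static inline` predicates above are listed without descriptions. Going by the enumerator names and descriptions, a plausible classification looks like the sketch below. It uses a local mirror enum with a `sketch_` prefix so that it compiles on its own; the grouping, and in particular counting the media compression usage as CCS-based, is an assumption rather than a statement of what ISL implements.

```c
#include <stdbool.h>

/* Local mirror of isl_aux_usage, for illustration only. */
enum sketch_aux_usage {
   SKETCH_AUX_NONE,
   SKETCH_AUX_HIZ,
   SKETCH_AUX_MCS,
   SKETCH_AUX_CCS_D,
   SKETCH_AUX_CCS_E,
   SKETCH_AUX_GFX12_CCS_E,
   SKETCH_AUX_MC,
   SKETCH_AUX_HIZ_CCS_WT,
   SKETCH_AUX_HIZ_CCS,
   SKETCH_AUX_MCS_CCS,
   SKETCH_AUX_STC_CCS,
};

/* Usages that involve a HiZ surface. */
static inline bool
sketch_aux_usage_has_hiz(enum sketch_aux_usage u)
{
   return u == SKETCH_AUX_HIZ ||
          u == SKETCH_AUX_HIZ_CCS_WT ||
          u == SKETCH_AUX_HIZ_CCS;
}

/* Usages that involve an MCS. */
static inline bool
sketch_aux_usage_has_mcs(enum sketch_aux_usage u)
{
   return u == SKETCH_AUX_MCS || u == SKETCH_AUX_MCS_CCS;
}

/* Usages that involve CCS data.  Treating MC as CCS-based is an assumption. */
static inline bool
sketch_aux_usage_has_ccs(enum sketch_aux_usage u)
{
   return u == SKETCH_AUX_CCS_D ||
          u == SKETCH_AUX_CCS_E ||
          u == SKETCH_AUX_GFX12_CCS_E ||
          u == SKETCH_AUX_MC ||
          u == SKETCH_AUX_HIZ_CCS_WT ||
          u == SKETCH_AUX_HIZ_CCS ||
          u == SKETCH_AUX_MCS_CCS ||
          u == SKETCH_AUX_STC_CCS;
}
```

The fused Tigerlake modes answer true to two predicates at once, which is why these are separate checks rather than a single category value.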
Creating auxiliary surfaces
Each type of data compression requires some type of auxiliary data on the side. For most, this involves a second auxiliary surface. ISL provides helpers for creating each of these types of surfaces:
-
bool
isl_surf_get_hiz_surf
(const struct isl_device *dev, const struct isl_surf *surf, struct isl_surf *hiz_surf)¶ Constructs a HiZ surface for the given main surface.
- Parameters
surf – [in] The main surface
hiz_surf – [out] The HiZ surface to populate on success
- Returns
false if the main surface cannot support HiZ.
-
bool
isl_surf_get_mcs_surf
(const struct isl_device *dev, const struct isl_surf *surf, struct isl_surf *mcs_surf)¶ Constructs a MCS for the given main surface.
- Parameters
surf – [in] The main surface
mcs_surf – [out] The MCS to populate on success
- Returns
false if the main surface cannot support MCS.
-
bool
isl_surf_supports_ccs
(const struct isl_device *dev, const struct isl_surf *surf, const struct isl_surf *hiz_or_mcs_surf)¶
- Parameters
surf – [in] The main surface
hiz_or_mcs_surf – [in] HiZ or MCS surface associated with the main surface
- Returns
true if the given surface supports CCS.
-
bool
isl_surf_get_ccs_surf
(const struct isl_device *dev, const struct isl_surf *surf, const struct isl_surf *hiz_or_mcs_surf, struct isl_surf *ccs_surf, uint32_t row_pitch_B)¶ Constructs a CCS for the given main surface.
Note
Starting with Tigerlake, the CCS is no longer really a surface. It’s not laid out as an independent surface and isn’t referenced by RENDER_SURFACE_STATE::"Auxiliary Surface Base Address" like other auxiliary compression surfaces. It’s a blob of memory that’s a 1:256 scale-down from the main surface and is attached side-band via a second set of page tables.
In spite of this, it’s sometimes useful to think of it as being a linear buffer-like surface, at least for the purposes of allocation. When invoked on Tigerlake or later, this function still works and produces such a linear surface.
- Parameters
surf – [in] The main surface
hiz_or_mcs_surf – [in] HiZ or MCS surface associated with the main surface
ccs_surf – [out] The CCS to populate on success
row_pitch_B – The row pitch for the CCS in bytes or 0 if ISL should calculate the row pitch.
- Returns
false if the main surface cannot support CCS.
Compression state tracking
All of the Intel auxiliary surface compression schemes share a common concept of a main surface which may or may not contain correct up-to-date data and some auxiliary data which says how to interpret it. The main surface is divided into blocks of some fixed size and some smaller block in the auxiliary data controls how that main surface block is to be interpreted. We then have to do resolves depending on the different HW units which need to interact with a given surface.
To help drivers keep track of what all is going on and when resolves need to be inserted, ISL provides a finite state machine which tracks the current state of the main surface and auxiliary data and their relationship to each other. The states are encoded with the isl_aux_state enum. ISL also provides helper functions for operating the state machine and determining what aux op (if any) is required to get to the right state for a given operation.
-
enum
isl_aux_state
¶ Enum for keeping track of the state of an auxiliary compressed surface.
For any given auxiliary surface compression format (HiZ, CCS, or MCS), any given slice (lod + array layer) can be in one of the seven states described by this enum. Drawing with or without aux enabled may implicitly cause the surface to transition between these states. There are also four types of auxiliary compression operations which cause an explicit transition which are described by the isl_aux_op enum below.
Not all operations are valid or useful in all states. The diagram below contains a complete description of the states and all valid and useful transitions except clear.
  Draw w/ Aux
  +----------+
  |          |
  |       +-------------+    Draw w/ Aux     +-------------+
  +------>| Compressed  |<-------------------|    Clear    |
          |  w/ Clear   |----->----+         |             |
          +-------------+          |         +-------------+
                 |  /|\            |            |   |
                 |   |             |            |   |
                 |   |             +------<-----+   |  Draw w/
                 |   |             |                | Clear Only
                 |   |      Full   |                |   +----------+
         Partial |   |     Resolve |               \|/  |          |
         Resolve |   |             |         +-------------+       |
                 |   |             |         |   Partial   |<------+
                 |   |             |         |    Clear    |<----------+
                 |   |             |         +-------------+           |
                 |   |             |                |                  |
                 |   |             +------>---------+  Full            |
                 |   |                              | Resolve          |
  Draw w/ aux    |   |   Partial Fast Clear         |                  |
  +----------+   |   +--------------------------+   |                  |
  |          |  \|/                             |  \|/                 |
  |       +-------------+    Full Resolve    +-------------+           |
  +------>| Compressed  |------------------->|  Resolved   |           |
          |  w/o Clear  |<-------------------|             |           |
          +-------------+    Draw w/ Aux     +-------------+           |
                /|\                             |   |                  |
                 |  Draw                        |   |  Draw            |
                 | w/ Aux                       |   | w/o Aux          |
                 |            Ambiguate         |   |                  |
                 |   +--------------------------+   |                  |
  Draw w/o Aux   |   |                              |   Draw w/o Aux   |
  +----------+   |   |                              |   +----------+   |
  |          |   |  \|/                            \|/  |          |   |
  |       +-------------+     Ambiguate      +-------------+       |   |
  +------>|    Pass-    |<-------------------|     Aux     |<------+   |
  +------>|   through   |                    |   Invalid   |           |
  |       +-------------+                    +-------------+           |
  |          |   |                                                     |
  +----------+   +-----------------------------------------------------+
    Draw w/                       Partial Fast Clear
   Clear Only
While the above general theory applies to all forms of auxiliary compression on Intel hardware, not all states and operations are available on all compression types. However, each of the auxiliary states and operations can be fairly easily mapped onto the above diagram:
HiZ: Hierarchical depth compression is capable of being in any of the states above. Hardware provides three HiZ operations: “Depth Clear”, “Depth Resolve”, and “HiZ Resolve” which map to “Fast Clear”, “Full Resolve”, and “Ambiguate” respectively. The hardware provides no HiZ partial resolve operation so the only way to get into the “Compressed w/o Clear” state is to render with HiZ when the surface is in the resolved or pass-through states.
MCS: Multisample compression is technically capable of being in any of the states above except that most of them aren’t useful. Both the render engine and the sampler support MCS compression and, apart from clear color, MCS is format-unaware so we leave the surface compressed 100% of the time. The hardware provides no MCS operations.
CCS_D: Single-sample fast-clears (also called CCS_D in ISL) are one of the simplest forms of compression since they don’t do anything beyond clear color tracking. They really only support three of the seven states: Clear, Partial Clear, and Pass-through. The only CCS_D operation is “Resolve” which maps to a full resolve followed by an ambiguate.
CCS_E: Single-sample render target compression (also called CCS_E in ISL) is capable of being in almost all of the above states. The only exception is that it does not have separate resolved and pass-through states. Instead, the CCS_E full resolve operation does both a resolve and an ambiguate so it goes directly into the pass-through state. CCS_E also provides fast clear and partial resolve operations which work as described above.
Note
The state machine above isn’t quite correct for CCS on TGL. There is a HW bug (or feature, depending on who you ask) which can cause blocks to enter the fast-clear state as a side-effect of a regular draw call. This means that a draw in the resolved or compressed without clear states takes you to the compressed with clear state, not the compressed without clear state.
Values:
-
enumerator
ISL_AUX_STATE_CLEAR
¶ Clear
In this state, each block in the auxiliary surface contains a magic value that indicates that the block is in the clear state. If a block is in the clear state, its values in the primary surface are ignored and the color of the samples in the block is taken from either the RENDER_SURFACE_STATE packet for color or 3DSTATE_CLEAR_PARAMS for depth. Since neither the primary surface nor the auxiliary surface contains the clear value, the surface can be cleared to a different color by simply changing the clear color without modifying either surface.
-
enumerator
ISL_AUX_STATE_PARTIAL_CLEAR
¶ Partial Clear
In this state, each block in the auxiliary surface contains either the magic clear or pass-through value. See Clear and Pass-through for more details.
-
enumerator
ISL_AUX_STATE_COMPRESSED_CLEAR
¶ Compressed with clear color
In this state, neither the auxiliary surface nor the primary surface has a complete representation of the data. Instead, both surfaces must be used together or else rendering corruption may occur. Depending on the auxiliary compression format and the data, any given block in the primary surface may contain all, some, or none of the data required to reconstruct the actual sample values. Blocks may also be in the clear state (see Clear) and have their value taken from outside the surface.
-
enumerator
ISL_AUX_STATE_COMPRESSED_NO_CLEAR
¶ Compressed without clear color
This state is identical to the state above except that no blocks are in the clear state. In this state, all of the data required to reconstruct the final sample values is contained in the auxiliary and primary surface and the clear value is not considered.
-
enumerator
ISL_AUX_STATE_RESOLVED
¶ Resolved
In this state, the primary surface contains 100% of the data. The auxiliary surface is also valid so the surface can be validly used with or without aux enabled. The auxiliary surface may, however, contain non-trivial data and any update to the primary surface with aux disabled will cause the two to get out of sync.
-
enumerator
ISL_AUX_STATE_PASS_THROUGH
¶ Pass-through
In this state, the primary surface contains 100% of the data and every block in the auxiliary surface contains a magic value which indicates that the auxiliary surface should be ignored and only the primary surface should be considered. In this mode, the primary surface can safely be written with ISL_AUX_USAGE_NONE or by something that ignores compression such as the blit/copy engine or a CPU map and it will stay in the pass-through state. Writing to a surface in pass-through mode with aux enabled may cause the auxiliary to be updated to contain non-trivial data and it will no longer be in the pass-through state. Likely, it will end up compressed, with or without clear color.
-
enumerator
ISL_AUX_STATE_AUX_INVALID
¶ Aux Invalid
In this state, the primary surface contains 100% of the data and the auxiliary surface is completely bogus. Any attempt to use the auxiliary surface is liable to result in rendering corruption. The only thing that one can do to re-enable aux once this state is reached is to use an ambiguate pass to transition into the pass-through state.
-
static inline bool
isl_aux_state_has_valid_primary
(enum isl_aux_state state)¶
-
static inline bool
isl_aux_state_has_valid_aux
(enum isl_aux_state state)¶
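The two predicates above can be read straight off the state descriptions: only Resolved, Pass-through, and Aux Invalid say that the primary surface contains 100% of the data, and only Aux Invalid says the auxiliary surface is bogus. The sketch below encodes that reading with a local mirror enum; the `sketch_` names are hypothetical and this is not claimed to be ISL's implementation.

```c
#include <stdbool.h>

/* Local mirror of isl_aux_state, for illustration only. */
enum sketch_aux_state {
   SKETCH_STATE_CLEAR,
   SKETCH_STATE_PARTIAL_CLEAR,
   SKETCH_STATE_COMPRESSED_CLEAR,
   SKETCH_STATE_COMPRESSED_NO_CLEAR,
   SKETCH_STATE_RESOLVED,
   SKETCH_STATE_PASS_THROUGH,
   SKETCH_STATE_AUX_INVALID,
};

/* True when the primary surface alone holds all of the data. */
static inline bool
sketch_state_has_valid_primary(enum sketch_aux_state s)
{
   return s == SKETCH_STATE_RESOLVED ||
          s == SKETCH_STATE_PASS_THROUGH ||
          s == SKETCH_STATE_AUX_INVALID;
}

/* True when the auxiliary data can safely be consumed by hardware. */
static inline bool
sketch_state_has_valid_aux(enum sketch_aux_state s)
{
   return s != SKETCH_STATE_AUX_INVALID;
}
```

A driver would typically consult the first predicate before handing a surface to something that ignores compression, such as a CPU map, and the second before enabling aux on an access.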
-
enum
isl_aux_op
¶ Enum describing explicit aux transition operations
These operations are used to transition from one isl_aux_state to another. Even though a draw does transition the state machine, it’s not included in this enum as it’s something of a special case.
Values:
-
enumerator
ISL_AUX_OP_NONE
¶ Do nothing
-
enumerator
ISL_AUX_OP_FAST_CLEAR
¶ Fast Clear
This operation writes the magic “clear” value to the auxiliary surface. This operation will safely transition any slice of a surface from any state to the clear state so long as the entire slice is fast cleared at once. A fast clear that only covers part of a slice of a surface is called a partial fast clear.
-
enumerator
ISL_AUX_OP_FULL_RESOLVE
¶ Full Resolve
This operation combines the auxiliary surface data with the primary surface data and writes the result to the primary. For HiZ, the docs call this a depth resolve. For CCS, the hardware full resolve operation does both a full resolve and an ambiguate so it actually takes you all the way to the pass-through state.
-
enumerator
ISL_AUX_OP_PARTIAL_RESOLVE
¶ Partial Resolve
This operation considers blocks which are in the “clear” state and writes the clear value directly into the primary or auxiliary surface. Once this operation completes, the surface is still compressed but no longer references the clear color. This operation is only available for CCS_E.
-
enumerator
ISL_AUX_OP_AMBIGUATE
¶ Ambiguate
This operation throws away the current auxiliary data and replaces it with the magic pass-through value. If an ambiguate operation is performed when the primary surface does not contain 100% of the data, data will be lost. This operation is only implemented in hardware for depth where it is called a HiZ resolve.
-
enum isl_aux_op
isl_aux_prepare_access
(enum isl_aux_state initial_state, enum isl_aux_usage usage, bool fast_clear_supported)¶ Return an isl_aux_op needed to enable an access to occur in an isl_aux_state suitable for the isl_aux_usage.
- Invariant
initial_state is possible with an isl_aux_usage compatible with the given usage. Two usages are compatible if it’s possible to switch between them (e.g. CCS_E <-> CCS_D).
- Invariant
fast_clear_supported is false if the aux doesn’t support fast clears.
Note
If the access will invalidate the main surface, this function should not be called and the isl_aux_op of NONE should be used instead. Otherwise, an extra (but still lossless) ambiguate may occur.
-
enum isl_aux_state
isl_aux_state_transition_aux_op
(enum isl_aux_state initial_state, enum isl_aux_usage usage, enum isl_aux_op op)¶ Return the isl_aux_state entered after performing an isl_aux_op.
- Invariant
initial_state is possible with the given usage.
- Invariant
op is possible with the given usage.
- Invariant
op must not cause HW to read from an invalid aux.
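The explicit-operation edges of the state machine can be sketched as a small transition function. This is a simplified model built from the operation descriptions above, not ISL's code: the `fsm_` names are hypothetical, the `full_resolve_ambiguates` flag stands in for the usage parameter (true models CCS, where a full resolve lands in pass-through), and sending Clear and Partial Clear to "compressed without clear" on a partial resolve is an extension of the single edge the diagram draws.

```c
#include <stdbool.h>

enum fsm_state {
   FSM_CLEAR,
   FSM_PARTIAL_CLEAR,
   FSM_COMPRESSED_CLEAR,
   FSM_COMPRESSED_NO_CLEAR,
   FSM_RESOLVED,
   FSM_PASS_THROUGH,
   FSM_AUX_INVALID,
};

enum fsm_op {
   FSM_OP_NONE,
   FSM_OP_FAST_CLEAR,
   FSM_OP_FULL_RESOLVE,
   FSM_OP_PARTIAL_RESOLVE,
   FSM_OP_AMBIGUATE,
};

/* Simplified model of the state reached after an explicit aux op. */
static enum fsm_state
fsm_apply_op(enum fsm_state s, enum fsm_op op, bool full_resolve_ambiguates)
{
   /* Resolves read the aux data, so they are not allowed on an invalid
    * aux; this sketch just leaves the state unchanged in that case. */
   bool reads_aux = op == FSM_OP_FULL_RESOLVE || op == FSM_OP_PARTIAL_RESOLVE;
   if (s == FSM_AUX_INVALID && reads_aux)
      return s;

   switch (op) {
   case FSM_OP_NONE:
      return s;
   case FSM_OP_FAST_CLEAR:
      return FSM_CLEAR; /* assumes the whole slice is cleared at once */
   case FSM_OP_FULL_RESOLVE:
      if (s == FSM_RESOLVED || s == FSM_PASS_THROUGH)
         return s; /* primary already holds all of the data */
      return full_resolve_ambiguates ? FSM_PASS_THROUGH : FSM_RESOLVED;
   case FSM_OP_PARTIAL_RESOLVE:
      if (s == FSM_CLEAR || s == FSM_PARTIAL_CLEAR ||
          s == FSM_COMPRESSED_CLEAR)
         return FSM_COMPRESSED_NO_CLEAR;
      return s; /* no clear blocks to eliminate */
   case FSM_OP_AMBIGUATE:
      return FSM_PASS_THROUGH; /* loses data if the primary was not valid */
   }
   return s;
}
```

For example, a compressed-with-clear CCS_E slice that needs to be read by a unit that does not understand the clear color would take a partial resolve and end up compressed without clear.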
-
enum isl_aux_state
isl_aux_state_transition_write
(enum isl_aux_state initial_state, enum isl_aux_usage usage, bool full_surface)¶ Return the isl_aux_state entered after performing a write.
- Invariant
if usage is not ISL_AUX_USAGE_NONE, then initial_state is possible with the given usage.
- Invariant
usage can be ISL_AUX_USAGE_NONE iff:
the main surface is valid, or
the main surface is being invalidated/replaced.
Note
full_surface should be true if the write covers the entire slice. Setting it to false in this case will still result in a correct (but imprecise) aux state.
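The draw edges of the diagram can be modeled in the same spirit. The sketch below covers only the generic "Draw w/ Aux" and "Draw w/o Aux" edges for a compressing aux usage. It is a toy under stated assumptions: the `wr_` names are hypothetical, a boolean replaces the usage parameter, the `full_surface` refinement and the clear-only (CCS_D) edges are left out, Partial Clear is assumed to behave like Clear under a draw with aux, and the Tigerlake quirk described in the note above is not modeled.

```c
#include <stdbool.h>

enum wr_state {
   WR_CLEAR,
   WR_PARTIAL_CLEAR,
   WR_COMPRESSED_CLEAR,
   WR_COMPRESSED_NO_CLEAR,
   WR_RESOLVED,
   WR_PASS_THROUGH,
   WR_AUX_INVALID,
};

/* Simplified model of the state reached after a write (draw). */
static enum wr_state
wr_after_write(enum wr_state s, bool aux_enabled)
{
   if (!aux_enabled) {
      /* Only pass-through survives a write that bypasses the aux data.
       * The caller must guarantee the primary is valid or being replaced. */
      return s == WR_PASS_THROUGH ? WR_PASS_THROUGH : WR_AUX_INVALID;
   }

   switch (s) {
   case WR_CLEAR:
   case WR_PARTIAL_CLEAR:
   case WR_COMPRESSED_CLEAR:
      return WR_COMPRESSED_CLEAR; /* some blocks may still be clear */
   case WR_COMPRESSED_NO_CLEAR:
   case WR_RESOLVED:
   case WR_PASS_THROUGH:
      return WR_COMPRESSED_NO_CLEAR;
   case WR_AUX_INVALID:
      return WR_AUX_INVALID; /* not allowed: an ambiguate is needed first */
   }
   return s;
}
```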