17. Texture cache (ICACHE)

17.1 ICACHE introduction

The texture cache (ICACHE) is introduced on the AXI read-only texture port of the GPU to improve performance when reading texture data from internal and external memories.

The texture cache is an assembly of the ICACHE (a peripheral with AHB ports) and an AXI-to-AHB bus bridge connected to the ICACHE AHB slave port, which provides an AXI interface on the texture cache slave port.

The following sections only describe the AHB ICACHE peripheral itself.

Specific features, such as hit-under-miss and the critical-word-first refill policy, result in close to zero-wait-state performance in most use cases.

17.2 ICACHE main features

The main features of ICACHE are listed below:

17.3 ICACHE implementation

Table 131. ICACHE features
| Feature | ICACHE |
|---|---|
| Number of ways | 4 |
| Cache size | 16 Kbytes |
| Cache line width | 32 bytes |
| Number of regions to remap | 0 |
| Data size of AHB slave interface | 64 bits |
| Data size of AHB fast master1 interface | 64 bits |
| Data size of AHB slow master2 interface | 0 |

17.4 ICACHE functional description

The purpose of the texture cache is to cache GPU read accesses to texture data. As such, the ICACHE manages only cacheable read transactions, and does not manage cacheable write transactions.

The noncacheable transactions (both read and write) bypass the ICACHE.

For error management, if a cacheable write transaction is presented (which happens only in case of incorrect software programming), the ICACHE sets an error flag and, if enabled, raises an interrupt to the processor.

17.4.1 ICACHE block diagram

Figure 138. ICACHE block diagram

ICACHE block diagram showing internal components and external interfaces.

The diagram illustrates the internal architecture of the ICACHE block. At the top, an external 'Configuration slave port' connects via an 'AHB' interface to the 'Configuration interface'. This interface contains four configuration registers: 'Region 0 cfg', 'Region 1 cfg', 'Region 2 cfg', and 'Region 3 cfg', as well as 'Hit monitor', 'Miss monitor', 'Control', and 'Status' blocks. On the left, a 'Texture read interface' from a 'GPU' connects through an 'AXI-to-AHB bridge' to a 'Read slave port' on the 'Read port interface'. An 'icache_it' signal is also shown on this side. The central 'Cache control logic' block contains a 'Cache FSM' and a 'pLRU-t' unit. It is connected to the 'Read port interface', the 'Master port interface', and the 'Cache memory port'. The 'Master port interface' connects to 'Master1 port' on the 'Main AHB'. The 'Cache memory port' leads to 'Cache TAG memories' and 'Cache data memories', both of which are organized into 'n ways'. The entire internal structure is labeled 'ICACHE' at the bottom left. A reference code 'MSv69745V2' is located at the bottom right of the diagram.

17.4.2 ICACHE reset and clocks

The ICACHE is clocked on the texture AHB bus clock.

When the ICACHE reset signal is released, a cache invalidate procedure is automatically launched, making the ICACHE busy (ICACHE_SR = 0x0000 0001).

When this procedure is finished:

Note: When disabled, the ICACHE is bypassed: slave input requests are forwarded to the master port.

17.4.3 ICACHE TAG memory

The ICACHE TAG memory contains:

There is one valid bit per cache line (per way).

The valid bit is set when a cache line is refilled (after a miss).

Valid bits are reset in any of the below cases:

When a cacheable transaction is received at the execution input port, its AHB address (HADDR_in) is split into the following fields (see Table 132 for B and W definitions):

The following table gives a summary of the ICACHE main parameters for TAG memory dimensioning. Figure 139 shows the functional view of TAG and data memories, for an n-way set associative ICACHE.

Table 132. TAG memory dimensioning parameters for n-way set associative operating mode (default)

| Parameter | Value | Example |
|---|---|---|
| Cache size | S Kbytes = s bytes (s = 1024 x S) | 8 Kbytes = 8192 bytes |
| Cache number of ways | n | 2 |
| Cache line size | L-byte = l-bit (l = 8 x L) | 16-byte = 128-bit |
| Number of cache lines | LpW = s / (n x L) lines / way | 256 lines / way |
| Address byte offset size | B = log2(L) bit | 4-bit |
| Address way index size | W = log2(LpW) bit | 8-bit |
| TAG address size | T = (32 - W - B) bit | 20-bit |
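The dimensioning formulas of Table 132 and the TAG / index / offset split of HADDR_in can be checked with a short host-side sketch. The type and function names below (icache_geometry_t, icache_geometry, icache_split) are hypothetical helpers introduced only for illustration; they are not part of any vendor header. The sketch assumes power-of-two sizes and the field order shown in Figure 139 (TAG in the upper bits, then index, then byte offset).

```c
#include <stdint.h>

/* Hypothetical helper types for illustration only. */
typedef struct {
    uint32_t offset_bits; /* B = log2(L)      */
    uint32_t index_bits;  /* W = log2(LpW)    */
    uint32_t tag_bits;    /* T = 32 - W - B   */
} icache_geometry_t;

typedef struct {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
} icache_addr_fields_t;

/* Integer log2, valid for exact powers of two. */
static inline uint32_t log2_u32(uint32_t v)
{
    uint32_t r = 0u;
    while (v > 1u) {
        v >>= 1;
        r++;
    }
    return r;
}

/* Applies the Table 132 formulas: LpW = s / (n x L), B, W and T. */
static inline icache_geometry_t icache_geometry(uint32_t size_bytes,
                                                uint32_t ways,
                                                uint32_t line_bytes)
{
    icache_geometry_t g;
    uint32_t lines_per_way = size_bytes / (ways * line_bytes);

    g.offset_bits = log2_u32(line_bytes);
    g.index_bits  = log2_u32(lines_per_way);
    g.tag_bits    = 32u - g.index_bits - g.offset_bits;
    return g;
}

/* Splits a 32-bit address into TAG, index and byte offset fields. */
static inline icache_addr_fields_t icache_split(uint32_t haddr,
                                                icache_geometry_t g)
{
    icache_addr_fields_t f;

    f.offset = haddr & ((1u << g.offset_bits) - 1u);
    f.index  = (haddr >> g.offset_bits) & ((1u << g.index_bits) - 1u);
    f.tag    = haddr >> (g.offset_bits + g.index_bits);
    return f;
}
```

With the example column of Table 132 (8 Kbytes, 2 ways, 16-byte lines), icache_geometry() yields B = 4, W = 8 and T = 20. Passing ways = 1 reproduces the direct-mapped case.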

Figure 139. ICACHE TAG and data memories functional view

Functional view diagram of ICACHE TAG and data memories. The diagram shows the internal architecture of the cache. At the top, an AHB address (HADDR_in) is split into three fields: TAG (T-bit), Index (W-bit), and Offset (B-bit). The Index is used to address both the TAG memory and the Data memory. The TAG memory is an n-way set associative structure with 'n ways' and 'LpW lines / way'. It contains TAG_Way0 to TAG_Way(n-1). The Data memory is also an n-way set associative structure with 'n ways' and 'LpW lines / way', containing Data_Way0 to Data_Way(n-1). A pLRU-t block provides way selection for replacement. Comparison logic compares the TAG from the address with the TAGs from the selected ways in the TAG memory to generate 'Cache hit/miss' signals for Way(n-1) and Way0. The Data memory outputs data for the selected way.

17.4.4 Direct-mapped ICACHE (1-way cache)

The default configuration (at reset) is an n-way set associative cache (WAYSEL = 1 in ICACHE_CR), but the user can configure the ICACHE as direct mapped by writing WAYSEL = 0 (only possible when the cache is disabled, EN = 0 in ICACHE_CR).

The following table gives a summary of ICACHE main parameters for TAG memory when the direct-mapped cache operating mode is selected.

Table 133. TAG memory dimensioning parameters for direct-mapped cache mode

| Parameter | Value | Example |
|---|---|---|
| Cache size | S Kbytes = s bytes (s = 1024 x S) | 8 Kbytes = 8192 bytes |
| Cache number of ways | 1 | 1 |
| Cache line size | L-byte = l-bit (l = 8 x L) | 16-byte = 128-bit |
| Number of cache lines | LpW = s / L lines | 512 lines |
| Address byte offset size | B = log2(L) bit | 4-bit |
| Address way index size | W = log2(LpW) bit | 9-bit |
| TAG address size | T = (32 - W - B) bit | 19-bit |

All cache operations (such as read, refill, and invalidation) remain the same in the direct-mapped configuration. The only difference is the absence of a replacement algorithm in case of line eviction (as explained in Section 17.4.7): only one way (the unique one) is available for any data refill.
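The associativity switch described in this section can be sketched as follows. This is a minimal illustration, not vendor code: the function name icache_set_direct_mapped is hypothetical, the bit positions are taken from the ICACHE_CR description (EN = bit 0, WAYSEL = bit 2), and the caller is assumed to pass a pointer to ICACHE_CR.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICACHE_CR_EN     (1u << 0) /* cache enable                    */
#define ICACHE_CR_WAYSEL (1u << 2) /* 0: direct mapped, 1: n-way set  */

/*
 * Selects direct-mapped (true) or n-way set associative (false) mode.
 * WAYSEL is writable only while EN = 0, so the request is refused
 * (returns false, register untouched) when the cache is enabled.
 */
static inline bool icache_set_direct_mapped(volatile uint32_t *cr,
                                            bool direct_mapped)
{
    if ((*cr & ICACHE_CR_EN) != 0u) {
        return false;
    }
    if (direct_mapped) {
        *cr &= ~ICACHE_CR_WAYSEL;
    } else {
        *cr |= ICACHE_CR_WAYSEL;
    }
    return true;
}
```

Refusing the write in software, rather than relying on the hardware to ignore it, makes a misordered configuration sequence visible to the caller.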

17.4.5 ICACHE enable

To activate the ICACHE, the EN bit in ICACHE_CR must be set to 1.

When the ICACHE is disabled, it is bypassed and all transactions are copied from the slave to the master port in the same clock cycle.

It is recommended to initialize or modify the main memory content (region to be later cached) with the ICACHE disabled, and to enable the ICACHE only when this region remains unchanged (an enabled ICACHE detects cacheable write transactions as errors).

To ensure performance determinism, it is recommended to wait for the end of a potential cache invalidate procedure before enabling the ICACHE. This procedure occurs when the hardware reset signal is released, when CACHEINV is set, or when EN is cleared in ICACHE_CR. During the procedure, BUSYF is set in ICACHE_SR, and once finished, BUSYF is cleared and BSYENDF is set in the same register (raising the ICACHE interrupt if enabled on such a busy end condition).

The software must test the BUSYF and/or BSYENDF values before enabling the ICACHE. Otherwise, if the ICACHE is enabled before the end of an invalidate procedure, any cache access made while BUSYF = 1 is treated as noncacheable, and its performance depends on the main memory access time.

The ICACHE is, by default, disabled at boot.
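The recommended enable sequence (wait for the end of any invalidate procedure, then set EN) can be sketched as below. The struct layout follows the register offsets of Section 17.7; the names icache_regs_t and icache_enable, and the bounded polling count max_polls, are assumptions made for this example. The peripheral base address is not defined here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register block layout per Section 17.7 (hypothetical type name). */
typedef struct {
    volatile uint32_t CR;  /* 0x000 control          */
    volatile uint32_t SR;  /* 0x004 status           */
    volatile uint32_t IER; /* 0x008 interrupt enable */
    volatile uint32_t FCR; /* 0x00C flag clear       */
} icache_regs_t;

#define ICACHE_CR_EN    (1u << 0)
#define ICACHE_SR_BUSYF (1u << 0)

/*
 * Waits until BUSYF = 0, then sets EN.
 * Returns false (cache left disabled) if BUSYF is still set after
 * max_polls additional reads of ICACHE_SR.
 */
static inline bool icache_enable(icache_regs_t *regs, uint32_t max_polls)
{
    while ((regs->SR & ICACHE_SR_BUSYF) != 0u) {
        if (max_polls-- == 0u) {
            return false;
        }
    }
    regs->CR |= ICACHE_CR_EN;
    return true;
}
```

The poll bound is a defensive choice for this sketch; an application may instead rely on the BSYENDF interrupt.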

17.4.6 Cacheable and noncacheable traffic

The ICACHE is placed on the GPU texture bus, and thus caches all internal and external memory regions (addresses 0x0000 0000 to 0x3FFF FFFF and 0x6000 0000 to 0x9FFF FFFF of the memory map, respectively).

An incoming memory request to the ICACHE is defined as cacheable according to its AHB transaction memory lookup attribute, as shown in Table 134. This AHB attribute depends on the GPU setting for the addressed region.

Table 134. ICACHE cacheability for AHB transaction

| AHB lookup attribute | Cacheability |
|---|---|
| 1 | Cacheable |
| 0 | Noncacheable |

In the case of a noncacheable access (read or write), the ICACHE is bypassed. The AHB transaction is propagated unchanged to the master output port.

The bypass does not increase the latency of the access to the targeted memory.

In the case of a cacheable access, the ICACHE behaves as explained in Section 17.4.7.

17.4.7 Cacheable accesses

When the ICACHE receives a cacheable transaction from the GPU, it checks if the address requested is present in its TAG memory, and if the corresponding cache line is valid.

There are then three alternatives:

The critical-word-first policy ensures minimum wait cycles for the processor, since read data can be provided while the cache still performs a cache line refill (associated latency is the latency of fetching one word from the main memory).

The burst generated on the ICACHE master bus is WRAPw (w being the cache line width, in words).

The AHB transaction attributes are also propagated to the main AHB bus matrix on the master port.

This happens during cache-line refill. The ICACHE can provide the requested data as soon as data are available at its master interface, thus avoiding a miss (fetching data from the main memory).

In the case of cache refill (due to cache miss), the ICACHE selects which cache line is written with the refill data:

If the cache line where the refill data must be written is already valid, the targeted cache line must be invalidated first. This applies in both the direct-mapped and the n-way set associative cache modes.
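The effect of a critical-word-first refill using a wrapping burst can be illustrated with the generic AHB wrapping-burst address rule: the burst starts at the beat containing the requested address and wraps at the cache line boundary. The sketch below is a host-side model of that rule only, not a description of the ICACHE internal logic. The function name wrap_burst_addresses is hypothetical, and the beat size is a parameter (8 bytes would correspond to the 64-bit master interface of Table 131). Line and beat sizes are assumed to be powers of two.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fills out[] with the beat addresses of a wrapping burst that covers
 * one cache line, starting at the beat that contains req_addr.
 * Returns the number of beats, or 0 if out[] is too small.
 */
static inline size_t wrap_burst_addresses(uint32_t req_addr,
                                          uint32_t line_bytes,
                                          uint32_t beat_bytes,
                                          uint32_t *out,
                                          size_t out_len)
{
    size_t   beats     = line_bytes / beat_bytes;
    uint32_t line_base = req_addr & ~(line_bytes - 1u);
    uint32_t first     = req_addr & ~(beat_bytes - 1u);

    if (beats > out_len) {
        return 0u;
    }
    for (size_t i = 0u; i < beats; i++) {
        /* Offset inside the line, wrapped at the line boundary. */
        uint32_t off = (first - line_base + (uint32_t)i * beat_bytes)
                       & (line_bytes - 1u);
        out[i] = line_base + off;
    }
    return beats;
}
```

For a request at 0x2000 0018 with a 32-byte line and 8-byte beats, the model produces the beat order 0x2000 0018, 0x2000 0000, 0x2000 0008, 0x2000 0010: the requested data arrives in the first beat.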

17.4.8 ICACHE maintenance

The software can invalidate the whole content of the ICACHE by programming CACHEINV in the ICACHE_CR register.

When CACHEINV = 1, the ICACHE control logic sets the BUSYF flag in ICACHE_SR and launches the invalidate cache operation, resetting each TAG valid bit to 0 (one valid bit per cache line). CACHEINV is automatically cleared.

Once the invalidate operation is finished (all valid bits reset to 0), the ICACHE automatically clears BUSYF, and sets BSYENDF in the ICACHE_SR register.

If enabled on this flag condition (BSYENDIE = 1 in ICACHE_IER), the ICACHE interrupt is raised. Then, the (empty) cache is available again.
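The invalidate sequence above can be sketched in polling mode as follows. The names icache_regs_t and icache_invalidate and the poll bound max_polls are assumptions of this example; bit positions come from Section 17.7. A stale BSYENDF is cleared first so that the wait cannot complete on a flag left over from an earlier operation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register block layout per Section 17.7 (hypothetical type name). */
typedef struct {
    volatile uint32_t CR;  /* 0x000 control          */
    volatile uint32_t SR;  /* 0x004 status           */
    volatile uint32_t IER; /* 0x008 interrupt enable */
    volatile uint32_t FCR; /* 0x00C flag clear       */
} icache_regs_t;

#define ICACHE_CR_CACHEINV  (1u << 1)
#define ICACHE_SR_BSYENDF   (1u << 1)
#define ICACHE_FCR_CBSYENDF (1u << 1)

/*
 * Launches a full invalidate and waits for its completion.
 * Returns false if BSYENDF is not seen within max_polls extra reads.
 */
static inline bool icache_invalidate(icache_regs_t *regs, uint32_t max_polls)
{
    regs->FCR = ICACHE_FCR_CBSYENDF;  /* drop any stale busy-end flag  */
    regs->CR |= ICACHE_CR_CACHEINV;   /* hardware clears CACHEINV      */

    while ((regs->SR & ICACHE_SR_BSYENDF) == 0u) {
        if (max_polls-- == 0u) {
            return false;
        }
    }
    regs->FCR = ICACHE_FCR_CBSYENDF;  /* acknowledge the busy-end flag */
    return true;
}
```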

17.4.9 ICACHE performance monitoring

The ICACHE provides the following monitors for performance analysis:

It also takes into account all accesses whose address is present in the TAG memory or in the refill buffer (due to a previous miss, and whose data is coming, or is soon to come, from the cache master port) (see Section 17.4.7).

It also takes into account all accesses whose address is present neither in the TAG memory nor in the refill buffer.

Upon reaching their maximum values, these monitors do not wrap around.

Hit and miss monitors can be enabled and reset by software, allowing the analysis of specific pieces of code.

The software can perform the following tasks:

To reduce power consumption, these monitors are disabled (stopped) by default.
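A typical measurement (reset both monitors, enable them, run the workload, then read the counters) can be sketched as below. The helper names icache_monitors_restart and icache_hit_ratio are hypothetical; the bit positions and counter widths (32-bit HITMON, 16-bit MISSMON) come from Section 17.7. Because the monitors do not wrap, a MISSMON value of 0xFFFF may mean that the miss counter has reached its maximum, in which case the computed ratio is only an upper bound.

```c
#include <stdint.h>

/* Register block layout per Section 17.7 (hypothetical type name). */
typedef struct {
    volatile uint32_t CR;    /* 0x000 control      */
    volatile uint32_t SR;    /* 0x004 status       */
    volatile uint32_t IER;   /* 0x008 int. enable  */
    volatile uint32_t FCR;   /* 0x00C flag clear   */
    volatile uint32_t HMONR; /* 0x010 hit monitor  */
    volatile uint32_t MMONR; /* 0x014 miss monitor */
} icache_regs_t;

#define ICACHE_CR_HITMEN   (1u << 16)
#define ICACHE_CR_MISSMEN  (1u << 17)
#define ICACHE_CR_HITMRST  (1u << 18)
#define ICACHE_CR_MISSMRST (1u << 19)

/* Resets both monitors, releases the resets, then enables counting. */
static inline void icache_monitors_restart(icache_regs_t *regs)
{
    regs->CR |= (ICACHE_CR_HITMRST | ICACHE_CR_MISSMRST);
    regs->CR &= ~(ICACHE_CR_HITMRST | ICACHE_CR_MISSMRST);
    regs->CR |= (ICACHE_CR_HITMEN | ICACHE_CR_MISSMEN);
}

/* Returns hits / (hits + misses), or 0.0 when nothing was counted. */
static inline double icache_hit_ratio(const icache_regs_t *regs)
{
    uint32_t hits   = regs->HMONR;
    uint32_t misses = regs->MMONR & 0xFFFFu; /* MISSMON[15:0] */
    uint64_t total  = (uint64_t)hits + misses;

    return (total == 0u) ? 0.0 : (double)hits / (double)total;
}
```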

17.4.10 ICACHE boot

The ICACHE is disabled (EN = 0 in ICACHE_CR) at boot.

Once the boot is finished, the ICACHE can be enabled (software setting EN = 1 in ICACHE_CR).

17.5 ICACHE low-power modes

At device level, using the ICACHE reduces the power consumption by reading textures from the internal ICACHE most of the time, rather than from the larger and therefore more power-consuming main memories. This reduction is even greater if the cached main memories are external.

Applications with a lower performance profile (in terms of hit ratio) and stringent power consumption constraints may benefit from the lower power consumption of an ICACHE configured as direct mapped. This single-way cache configuration is obtained by programming WAYSEL = 0 in ICACHE_CR (see Figure 139). The power consumption is reduced by accessing, for each request, only the necessary cut of TAG and data memories. The cache effect still improves memory access performance, even if, for most texture accesses, it is less efficient than with an n-way set associative cache mode.

17.6 ICACHE error management and interrupts

If an unsupported cacheable write request is detected (functional error), the ICACHE generates an error by setting the ERRF flag in ICACHE_SR. An interrupt is generated if the corresponding interrupt enable bit is set (ERRIE = 1 in ICACHE_IER).

The other possible interrupt generation is at the end of a cache invalidation operation. When the cache-busy state is finished, the ICACHE sets the BSYENDF flag in ICACHE_SR. An interrupt is generated if the corresponding interrupt enable bit is set (BSYENDIE = 1 in ICACHE_IER).

All ICACHE interrupt sources raise a single interrupt signal, icache_it, and therefore share the same interrupt vector.

Table 135. ICACHE interrupts

| Interrupt vector | Interrupt event | Event flag | Enable control bit | Interrupt clear method |
|---|---|---|---|---|
| ICACHE | Functional error | ERRF in ICACHE_SR | ERRIE in ICACHE_IER | Set CERRF to 1 in ICACHE_FCR |
| ICACHE | End of busy state (invalidate finished) | BSYENDF in ICACHE_SR | BSYENDIE in ICACHE_IER | Set CBSYENDF to 1 in ICACHE_FCR |

The ICACHE also propagates all AHB bus errors (such as address decoding issues) from the master1 port back to the slave read port.
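Since both events share the icache_it signal, a handler has to read ICACHE_SR to identify the source and clear the matching flag through ICACHE_FCR, as listed in Table 135. The sketch below shows one possible structure. The function name icache_irq_service is hypothetical, and how it is attached to the interrupt vector is device specific and not shown.

```c
#include <stdint.h>

/* Register block layout per Section 17.7 (hypothetical type name). */
typedef struct {
    volatile uint32_t CR;  /* 0x000 control          */
    volatile uint32_t SR;  /* 0x004 status           */
    volatile uint32_t IER; /* 0x008 interrupt enable */
    volatile uint32_t FCR; /* 0x00C flag clear       */
} icache_regs_t;

#define ICACHE_SR_BSYENDF    (1u << 1)
#define ICACHE_SR_ERRF       (1u << 2)
#define ICACHE_IER_BSYENDIE  (1u << 1)
#define ICACHE_IER_ERRIE     (1u << 2)
#define ICACHE_FCR_CBSYENDF  (1u << 1)
#define ICACHE_FCR_CERRF     (1u << 2)

/*
 * Clears every flag that is both set and enabled as an interrupt source.
 * Returns the serviced flags as a mask of ICACHE_SR bits so that the
 * caller can react (for example, log a cacheable-write error).
 */
static inline uint32_t icache_irq_service(icache_regs_t *regs)
{
    uint32_t sr      = regs->SR;
    uint32_t ier     = regs->IER;
    uint32_t pending = 0u;
    uint32_t clear   = 0u;

    if (((sr & ICACHE_SR_ERRF) != 0u) && ((ier & ICACHE_IER_ERRIE) != 0u)) {
        pending |= ICACHE_SR_ERRF;
        clear   |= ICACHE_FCR_CERRF;
    }
    if (((sr & ICACHE_SR_BSYENDF) != 0u) &&
        ((ier & ICACHE_IER_BSYENDIE) != 0u)) {
        pending |= ICACHE_SR_BSYENDF;
        clear   |= ICACHE_FCR_CBSYENDF;
    }
    if (clear != 0u) {
        regs->FCR = clear; /* writing 0 to the other bits has no effect */
    }
    return pending;
}
```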

17.7 ICACHE registers

17.7.1 ICACHE control register (ICACHE_CR)

Address offset: 0x000

Reset value: 0x0000 0004

| Bits | 31:20 | 19 | 18 | 17 | 16 | 15:3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|
| Field | Res. | MISSMRST | HITMRST | MISSMEN | HITMEN | Res. | WAYSEL | CACHEINV | EN |
| Access | | rw | rw | rw | rw | | rw | w | rw |

Bits 31:20 Reserved, must be kept at reset value.

Bit 19 MISSMRST : miss monitor reset

0: release the cache miss monitor reset (needed to enable the counting)

1: reset cache miss monitor

Bit 18 HITMRST : hit monitor reset

0: release the cache hit monitor reset (needed to enable the counting)

1: reset cache hit monitor

Bit 17 MISSMEN : miss monitor enable

0: cache miss monitor switched off. Stopping the monitor does not reset it.

1: cache miss monitor enabled

Bit 16 HITMEN : hit monitor enable

0: cache hit monitor switched off. Stopping the monitor does not reset it.

1: cache hit monitor enabled

Bits 15:3 Reserved, must be kept at reset value.

Bit 2 WAYSEL : cache associativity mode selection

This bit selects the ICACHE set-associativity. It can be written by software only when the cache is disabled (EN = 0).

0: direct mapped cache (1-way cache)

1: n-way set associative cache (reset value)

Bit 1 CACHEINV : cache invalidation

Set by software and cleared by hardware when the BUSYF flag is set (during cache maintenance operation). Writing 0 has no effect.

0: no effect

1: invalidate entire cache (all cache lines valid bit = 0)

Bit 0 EN : enable

0: cache disabled

1: cache enabled

17.7.2 ICACHE status register (ICACHE_SR)

Address offset: 0x004

Reset value: 0x0000 0001

| Bits | 31:3 | 2 | 1 | 0 |
|---|---|---|---|---|
| Field | Res. | ERRF | BSYENDF | BUSYF |
| Access | | r | r | r |

Bits 31:3 Reserved, must be kept at reset value.

Bit 2 ERRF : cache error flag

0: no error

1: an error occurred during the operation (cacheable write)

Bit 1 BSYENDF : busy end flag

0: cache busy

1: full invalidate CACHEINV operation finished

Bit 0 BUSYF : busy flag

0: cache not busy on a CACHEINV operation

1: cache executing a full invalidate CACHEINV operation

17.7.3 ICACHE interrupt enable register (ICACHE_IER)

Address offset: 0x008

Reset value: 0x0000 0000

| Bits | 31:3 | 2 | 1 | 0 |
|---|---|---|---|---|
| Field | Res. | ERRIE | BSYENDIE | Res. |
| Access | | rw | rw | |

Bits 31:3 Reserved, must be kept at reset value.

Bit 2 ERRIE : interrupt enable on cache error

Set by software to enable an interrupt generation in case of cache functional error (cacheable write access)

0: interrupt disabled on error

1: interrupt enabled on error

Bit 1 BSYENDIE : interrupt enable on busy end

Set by software to enable an interrupt generation at the end of a cache invalidate operation.

0: interrupt disabled on busy end

1: interrupt enabled on busy end

Bit 0 Reserved, must be kept at reset value.

17.7.4 ICACHE flag clear register (ICACHE_FCR)

Address offset: 0x00C

Reset value: 0x0000 0000

| Bits | 31:3 | 2 | 1 | 0 |
|---|---|---|---|---|
| Field | Res. | CERRF | CBSYENDF | Res. |
| Access | | w | w | |

Bits 31:3 Reserved, must be kept at reset value.

Bit 2 CERRF : clear cache error flag

Set by software.

0: no effect

1: clears ERRF flag in ICACHE_SR

Bit 1 CBSYENDF : clear busy end flag

Set by software.

0: no effect

1: clears BSYENDF flag in ICACHE_SR.

Bit 0 Reserved, must be kept at reset value.

17.7.5 ICACHE hit monitor register (ICACHE_HMONR)

Address offset: 0x010

Reset value: 0x0000 0000

| Bits | 31:0 |
|---|---|
| Field | HITMON[31:0] |
| Access | r |

Bits 31:0 HITMON[31:0] : cache hit monitor counter

17.7.6 ICACHE miss monitor register (ICACHE_MMONR)

Address offset: 0x014

Reset value: 0x0000 0000

| Bits | 31:16 | 15:0 |
|---|---|---|
| Field | Res. | MISSMON[15:0] |
| Access | | r |

Bits 31:16 Reserved, must be kept at reset value.

Bits 15:0 MISSMON[15:0] : cache miss monitor counter

17.7.7 ICACHE register map

Table 136. ICACHE register map and reset values

| Offset | Register name | Bit fields (bits not listed are reserved) | Reset value |
|---|---|---|---|
| 0x000 | ICACHE_CR | 19: MISSMRST, 18: HITMRST, 17: MISSMEN, 16: HITMEN, 2: WAYSEL, 1: CACHEINV, 0: EN | 0x0000 0004 |
| 0x004 | ICACHE_SR | 2: ERRF, 1: BSYENDF, 0: BUSYF | 0x0000 0001 |
| 0x008 | ICACHE_IER | 2: ERRIE, 1: BSYENDIE | 0x0000 0000 |
| 0x00C | ICACHE_FCR | 2: CERRF, 1: CBSYENDF | 0x0000 0000 |
| 0x010 | ICACHE_HMONR | 31:0: HITMON[31:0] | 0x0000 0000 |
| 0x014 | ICACHE_MMONR | 15:0: MISSMON[15:0] | 0x0000 0000 |

Refer to Section 2.3: Memory organization for the register boundary addresses.
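For software that accesses these registers through a memory-mapped structure, the register map can be mirrored in C as sketched below. The type name icache_regs_t and the reset-value macro names are hypothetical and do not come from a vendor header; the offsets and reset values are those of Table 136. The peripheral base address is deliberately not defined here, since it must be taken from the memory organization section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register block mirroring Table 136 (hypothetical type name). */
typedef struct {
    volatile uint32_t CR;    /* 0x000 control register          */
    volatile uint32_t SR;    /* 0x004 status register           */
    volatile uint32_t IER;   /* 0x008 interrupt enable register */
    volatile uint32_t FCR;   /* 0x00C flag clear register       */
    volatile uint32_t HMONR; /* 0x010 hit monitor register      */
    volatile uint32_t MMONR; /* 0x014 miss monitor register     */
} icache_regs_t;

/* Reset values from Table 136 (hypothetical macro names). */
#define ICACHE_CR_RESET 0x00000004u /* WAYSEL = 1: n-way set associative */
#define ICACHE_SR_RESET 0x00000001u /* BUSYF = 1: invalidate after reset */

/* Compile-time checks that the layout matches the documented offsets. */
static_assert(offsetof(icache_regs_t, CR)    == 0x000, "CR offset");
static_assert(offsetof(icache_regs_t, SR)    == 0x004, "SR offset");
static_assert(offsetof(icache_regs_t, IER)   == 0x008, "IER offset");
static_assert(offsetof(icache_regs_t, FCR)   == 0x00C, "FCR offset");
static_assert(offsetof(icache_regs_t, HMONR) == 0x010, "HMONR offset");
static_assert(offsetof(icache_regs_t, MMONR) == 0x014, "MMONR offset");
```

The static assertions cost nothing at run time and catch an accidental reordering or padding change at build time.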