• Direct-mapped caches are the opposite extreme from fully
associative caches
• In a direct-mapped cache, each memory address can only be
stored in one location in the cache
• When a memory operation is sent to a direct-mapped cache, a
subset of the bits in the address is used to select the line in the cache that
may contain the address
• Another subset of the bits is used to select the byte
within a cache line that the address points to
• In general, the n lowest-order bits in the address are used
to determine the position of the address within its cache line, where n is the
base-2 logarithm of the number of bytes in the line
• The m next higher-order bits, where m is the base-2
logarithm of the number of lines in the cache, are used to select the line in
which the address may be stored
• The line number is calculated using the following function:

    i = j mod m

  where
    i = cache line number
    j = main memory block number (not the full address)
    m = number of lines in the cache
Each main memory address can be divided into three fields:
• The least significant w bits identify a unique word within a block
• The remaining s bits specify which block in memory. These are divided into two fields:
  – The least significant r bits of these s bits identify which line in the cache the block maps to
  – The most significant s–r bits (the tag) uniquely identify the block held in a line of the cache
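To make the field boundaries concrete, here is a minimal sketch in Python, assuming a 32-bit word address, 4 words per block (w = 2), and 1024 cache lines (r = 10); these widths are illustrative assumptions, not values from the text:

    # Split a word address into (tag, line, word) fields per the
    # s/r/w decomposition above. All field widths are hypothetical.
    w = 2                  # word-within-block bits (4 words per block)
    r = 10                 # line bits (1024 cache lines)
    s = 32 - w             # block-number bits of a 32-bit address

    def split_address(addr):
        word = addr & ((1 << w) - 1)    # least significant w bits
        block = addr >> w               # the s-bit block number j
        line = block & ((1 << r) - 1)   # low r bits of j, i.e. i = j mod m
        tag = block >> r                # most significant s - r bits
        return tag, line, word

    print(split_address(0x12345678))    # (tag, line, word) for a sample address

Note that taking the low r bits of the block number is exactly the mapping function i = j mod m, with m = 2^r lines.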
Direct Mapping Cache Line Table

    Cache line | Main memory blocks held
    0          | 0, m, 2m, 3m, …, 2^s – m
    1          | 1, m+1, 2m+1, …, 2^s – m + 1
    …          | …
    m–1        | m–1, 2m–1, 3m–1, …, 2^s – 1
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for a given block – if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high (thrashing)
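The thrashing case can be seen in a small simulation; the sketch below models a direct-mapped cache as an array of stored block numbers (the line count and access pattern are hypothetical):

    # Two blocks that map to the same line evict each other on every access.
    m = 256                      # number of cache lines (hypothetical)
    cache = [None] * m           # cache[i] = block number currently in line i

    misses = 0
    for _ in range(1000):
        for block in (0, m):     # both blocks map to line 0 (0 mod m == m mod m)
            line = block % m
            if cache[line] != block:
                misses += 1      # conflict miss: evict the other block
                cache[line] = block

    print(misses)                # 2000 - every one of the 2000 accesses misses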
Accessing cache
Total number of bits needed for a cache
• Assume the cache holds 2^n blocks (n bits of index) and each block holds 2^m words (m bits to select the word, plus 2 bits of byte offset)
• Size of the tag field is 32 – (n + m + 2)
• The total number of bits in a direct-mapped cache is
  2^n x (block size + tag size + valid field size)
Block size = 2^m words (2^(m+5) bits), plus 1 bit for the valid field, so the total number of bits in the cache is
  2^n x (2^m x 32 + (32 – n – m – 2) + 1)
  = 2^n x (2^m x 32 + 31 – n – m)
Example
How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
• 16 KB is 4K (2^12) words
• with a block size of 4 words (2^2), the cache contains 1024 (2^10) blocks
• each block has 4 x 32 = 128 bits of data plus a tag, which is 32 – 10 – 2 – 2 = 18 bits
• plus a valid bit
• total cache size = 2^10 x (4 x 32 + (32 – 10 – 2 – 2) + 1)
  = 2^10 x 147
  = 147 Kbits
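As a quick arithmetic check of the example (a sketch of the formula above; the function name is mine):

    # Total bits in a direct-mapped cache with 2^n blocks of 2^m 32-bit words,
    # a 32-bit address, and 2 bits of byte offset.
    def total_cache_bits(n, m, addr_bits=32):
        data_bits = (2 ** m) * 32
        tag_bits = addr_bits - n - m - 2
        return (2 ** n) * (data_bits + tag_bits + 1)   # +1 for the valid bit

    print(total_cache_bits(10, 2))          # 150528 bits
    print(total_cache_bits(10, 2) // 1024)  # 147 Kbits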
Cache Misses
A cache miss refers to a failed attempt to read or write a piece of data in
the cache, which results in a main memory access with much longer latency.
There are three kinds of cache misses: instruction read miss, data read miss,
and data write miss.
A cache read miss from
an instruction cache generally causes the most delay, because the processor, or
at least the thread of execution, has to wait (stall) until the instruction is
fetched from main memory.
A cache read miss from a
data cache usually causes less delay, because instructions not dependent on the
cache read can be issued and continue execution until the data is returned from
main memory, and the dependent instructions can resume execution.
A cache write miss to a
data cache generally causes the least delay, because the write can be queued
and there are few limitations on the execution of subsequent instructions. The
processor can continue until the queue is full.
Since
the processor then has to wait for the data to be fetched from the next cache
level or from main memory before it can continue to execute, cache misses
directly influence the performance of the application.
It
is hard to tell from just the number of misses if cache misses are causing
performance problems in an application. The same number of misses will cause a
much greater relative slowdown in a short-running application than in a
long-running one.
If we know the miss rate for reads in a cache memory, we can calculate the number of read-stall cycles as follows:

    Read-stall cycles = (Reads / Program) x Read miss rate x Read miss penalty
For writes, the expression is similar, except that the effect of the write buffer must be added in:

    Write-stall cycles = (Writes / Program) x Write miss rate x Write miss penalty + Write-buffer stalls
In many cache memory systems, the penalties are the same for a cache read or write. Here we can use a single miss rate and miss penalty:

    Memory-stall cycles = (Memory accesses / Program) x Miss rate x Miss penalty
Example:
Assume a cache miss rate of 5% (a hit rate of 95%), a cache memory with a
20 ns cycle time, and a main memory with a 150 ns cycle time. We can calculate
the average cycle time as

    (1 – 0.05) x 20 ns + 0.05 x 150 ns = 26.5 ns
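The same numbers in code (a sketch; the function names are mine, and the stall-cycle helper restates the combined formula above with hypothetical inputs):

    # Average cycle time: hits cost the cache cycle, misses cost main memory.
    def average_cycle_time(miss_rate, cache_ns, memory_ns):
        return (1 - miss_rate) * cache_ns + miss_rate * memory_ns

    print(average_cycle_time(0.05, 20, 150))   # 26.5 (ns)

    # Memory-stall cycles for a given number of memory accesses.
    def memory_stall_cycles(accesses, miss_rate, miss_penalty):
        return accesses * miss_rate * miss_penalty

    print(memory_stall_cycles(1_000_000, 0.05, 100))  # 5000000.0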
Write-Through
Although
write through minimizes the risk of data loss, every write operation must be
done twice, and this redundancy takes time. The active application program must
wait until each block of data has been written into both the main memory and
the cache before starting the next operation. The "data insurance"
therefore comes at the expense of system speed.
Write
through is the preferred method of data storage in applications where data loss
cannot be tolerated, such as banking and medical device control. In less
critical applications, and especially when data volume is large, an alternative
method called write back accelerates system performance because updates are
normally written exclusively to the cache, and are backed up in the main memory
only at specified intervals or under certain conditions.
Write Back
Write back is a storage method in which data is written into the cache every time a change occurs, but is written into the corresponding location in main memory only at specified intervals or under certain conditions.
When a data location is updated in write back mode, the
data in cache is called fresh, and the corresponding data in main memory, which
no longer matches the data in cache, is called stale. If a request for stale
data in main memory arrives from another application program, the cache
controller updates the data in main memory before the application accesses it.
Write back optimizes the system speed because it takes
less time to write data into cache alone, as compared with writing the same
data into both cache and main memory. However, this speed comes with the risk
of data loss in case of a crash or other adverse event.
Write back is the preferred method of data storage in
applications where occasional data loss events can be tolerated. In more
critical applications such as banking and medical device control, an
alternative method called write through practically eliminates the risk of data
loss because every update gets written into both the main memory and the cache.
In write through mode, the main memory data always stays fresh.
Ø Write-through – The information is written to both the block in the cache and to the block in the next lower level of the memory hierarchy
  · Write-through is always combined with a write buffer so write waits to lower-level memory can be eliminated (as long as the write buffer doesn’t fill)
Ø Write-back – The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  · Needs a dirty bit to keep track of whether the block is clean or dirty
Ø Pros and cons of each
  · Write-through: read misses don’t result in writes (so are simpler and cheaper)
  · Write-back: repeated writes require only one write to the lower level
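A minimal sketch contrasting the two policies on a single cache line with a dirty bit; the class and function names are hypothetical, not from the text:

    # Toy model of one cache line under write-through vs. write-back.
    class Line:
        def __init__(self):
            self.data = None
            self.dirty = False          # only meaningful for write-back

    def write_through(line, memory, addr, value):
        line.data = value               # update the cache...
        memory[addr] = value            # ...and the lower level on every write

    def write_back(line, value):
        line.data = value               # update only the cache
        line.dirty = True               # block no longer matches main memory

    def evict(line, memory, addr):
        if line.dirty:                  # write to memory only on replacement
            memory[addr] = line.data
            line.dirty = False

    memory, line = {}, Line()
    for v in (1, 2, 3):                 # three repeated writes to one block
        write_back(line, v)
    evict(line, memory, 0)
    print(memory)                       # {0: 3} - a single lower-level write

Note how write_through touches memory on every store, while write_back defers the write until eviction, which is exactly the repeated-writes advantage listed above.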