Tuesday, 10 December 2013

Memory Organization (Direct Mapped Cache/Accessing Cache/Caches Misses) by SONG CHONG HARN B031310216

Direct Mapped cache

• Direct-mapped caches are the opposite extreme from fully associative caches
• In a direct-mapped cache, each memory address can only be stored in one location in the cache
• When a memory operation is sent to a direct-mapped cache, a subset of the bits in the address is used to select the line in the cache that may contain the address
• Another subset of the bits is used to select the byte within a cache line that the address points to
• In general, the n lowest-order bits of the address are used to determine the position of the address within its cache line, where n is the base-2 logarithm of the number of bytes in the line
• The m next higher-order bits, where m is the base-2 logarithm of the number of lines in the cache, select the line in which the address may be stored (see the sketch below)
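As an illustrative sketch (not from the original notes), the offset/index/tag split described above comes down to shifts and masks. The constants N and M and the example address are my own assumptions:

#include <stdio.h>
#include <stdint.h>

/* Hypothetical geometry: 2^N bytes per line, 2^M lines in the cache. */
#define N 4   /* 16-byte lines  -> n lowest-order offset bits */
#define M 10  /* 1024 lines     -> m index bits               */

int main(void) {
    uint32_t addr = 0x12345678;  /* example address, made up */

    uint32_t offset = addr & ((1u << N) - 1);         /* byte within the line */
    uint32_t index  = (addr >> N) & ((1u << M) - 1);  /* which cache line     */
    uint32_t tag    = addr >> (N + M);                /* identifies the block */

    printf("offset=%u index=%u tag=0x%x\n",
           (unsigned)offset, (unsigned)index, (unsigned)tag);
    return 0;
}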


The cache line number is calculated using the following function:

    i = j modulo m

where
    i = cache line number
    j = main memory block number (not the full address)
    m = number of lines in the cache

For example, in a cache with m = 4 lines, memory block j = 13 maps to line 13 mod 4 = 1.

Each main memory address can be divided into three fields:
• The least significant w bits identify a unique word within a block
• The remaining s bits specify which block in memory. These are divided into two fields:
    – The least significant r bits of these s bits identify which line in the cache
    – The most significant s - r bits uniquely identify the block within a line of the cache (the tag)





Direct Mapping Cache Line Table

Cache line    Main memory blocks held
0             0, m, 2m, 3m, ..., 2^s - m
1             1, m+1, 2m+1, ..., 2^s - m + 1
...
m-1           m-1, 2m-1, 3m-1, ..., 2^s - 1
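As a small illustration (my own sketch, with made-up parameters), this loop prints one row of the table above for each cache line:

#include <stdio.h>

int main(void) {
    int m = 4;          /* number of lines in the cache (hypothetical)  */
    int nblocks = 32;   /* number of main memory blocks: 2^s with s = 5 */

    for (int line = 0; line < m; line++) {        /* one table row per line      */
        printf("line %d:", line);
        for (int j = line; j < nblocks; j += m)   /* blocks with j mod m == line */
            printf(" %d", j);
        printf("\n");
    }
    return 0;
}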












Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for a given block – if a program repeatedly accesses two blocks that map to the same line, the miss rate is very high (thrashing)


Accessing cache

Total number of bits needed for a cache

Assume a 32-bit address and a direct-mapped cache with 2^n blocks of 2^m words (2^(m+2) bytes) each. The size of the tag field is

    32 - (n + m + 2)

and the total number of bits in the direct-mapped cache is

    2^n x (block size + tag size + valid field size)

Since the block size is 2^m words (2^(m+5) bits) and the valid field is 1 bit, the total number of bits in the cache is

    2^n x (2^m x 32 + (32 - n - m - 2) + 1)
    = 2^n x (2^m x 32 + 31 - n - m)

Example
How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?

16 KB is 4K (2^12) words. With a block size of 4 words (2^2), there are 1024 (2^10) blocks. Each block has 4 x 32 = 128 bits of data, plus a tag of 32 - 10 - 2 - 2 = 18 bits, plus a valid bit. The total cache size is

    2^10 x (4 x 32 + (32 - 10 - 2 - 2) + 1)
    = 2^10 x 147
    = 147 Kbits (about 18.4 KB)
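The same arithmetic, sketched in C (the helper name total_cache_bits is invented; the formula is the one derived above):

#include <stdio.h>

/* Total bits in a direct-mapped cache with 2^n blocks of 2^m words each,
 * a 32-bit address, and one valid bit per block. */
static long total_cache_bits(int n, int m) {
    long blocks    = 1L << n;
    long data_bits = (1L << m) * 32;     /* 2^m words of 32 bits   */
    long tag_bits  = 32 - (n + m + 2);   /* remaining address bits */
    return blocks * (data_bits + tag_bits + 1);
}

int main(void) {
    /* The worked example: 16 KB of data, 4-word blocks -> n = 10, m = 2 */
    printf("%ld bits (%ld Kbits)\n",
           total_cache_bits(10, 2), total_cache_bits(10, 2) / 1024);
    return 0;
}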

 Cache Misses
A cache miss refers to a failed attempt to read or write a piece of data in the cache, which results in a main memory access with much longer latency. There are three kinds of cache misses: instruction read miss, data read miss, and data write miss.

A cache read miss from an instruction cache generally causes the most delay, because the processor, or at least the thread of execution, has to wait (stall) until the instruction is fetched from main memory.

A cache read miss from a data cache usually causes less delay, because instructions not dependent on the cache read can be issued and continue execution until the data is returned from main memory, and the dependent instructions can resume execution.

A cache write miss to a data cache generally causes the least delay, because the write can be queued and there are few limitations on the execution of subsequent instructions. The processor can continue until the queue is full.

Since the processor then has to wait for the data to be fetched from the next cache level or from main memory before it can continue to execute, cache misses directly influence the performance of the application.

It is hard to tell from just the number of misses if cache misses are causing performance problems in an application. The same number of misses will cause a much greater relative slowdown in a short-running application than in a long-running one.

If we know the miss rate for reads in a cache memory, we can calculate the number of read-stall cycles as follows:

    Read-stall cycles = (Reads / Program) x Read miss rate x Read miss penalty

For writes, the expression is similar, except that the effect of the write buffer must be added in:

    Write-stall cycles = (Writes / Program) x Write miss rate x Write miss penalty + Write buffer stalls

In many cache memory systems, the penalties are the same for a cache read or write. Here we can use a single miss rate and miss penalty:

    Memory-stall cycles = (Memory accesses / Program) x Miss rate x Miss penalty
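A quick numeric sketch of this single-miss-rate formula in C (all values invented for illustration):

#include <stdio.h>

int main(void) {
    /* Hypothetical program: 1,000,000 memory accesses, 4% miss rate,
     * 100-cycle miss penalty, using the combined formula above. */
    long   accesses     = 1000000;
    double miss_rate    = 0.04;
    int    miss_penalty = 100;

    double stall_cycles = accesses * miss_rate * miss_penalty;
    printf("memory-stall cycles: %.0f\n", stall_cycles);  /* 4000000 */
    return 0;
}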


Example:
Assume a cache miss rate of 5% (a hit rate of 95%), a cache memory cycle time of 20 ns, and a main memory cycle time of 150 ns. We can calculate the average cycle time as

    (1 - 0.05) x 20 ns + 0.05 x 150 ns = 26.5 ns
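The average-cycle-time calculation above, as a C sketch (the function name is invented):

#include <stdio.h>

/* Hits are served at cache speed, misses at main memory speed. */
static double avg_cycle_ns(double miss_rate, double cache_ns, double mem_ns) {
    return (1.0 - miss_rate) * cache_ns + miss_rate * mem_ns;
}

int main(void) {
    printf("%.1f ns\n", avg_cycle_ns(0.05, 20.0, 150.0));  /* prints 26.5 */
    return 0;
}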


Write-Through

Write through is a storage method in which data is written into the cache and the corresponding main memory location at the same time. The cached data allows for fast retrieval on demand, while the same data in main memory ensures that nothing will get lost if a crash, power failure, or other system disruption occurs.

Although write through minimizes the risk of data loss, every write operation must be done twice, and this redundancy takes time. The active application program must wait until each block of data has been written into both the main memory and the cache before starting the next operation. The "data insurance" therefore comes at the expense of system speed.

Write through is the preferred method of data storage in applications where data loss cannot be tolerated, such as banking and medical device control. In less critical applications, and especially when data volume is large, an alternative method called write back accelerates system performance because updates are normally written exclusively to the cache, and are backed up in the main memory only at specified intervals or under certain conditions.

Write Back

Write back is a storage method in which data is written into the cache every time a change occurs, but is written into the corresponding location in main memory only at specified intervals or under certain conditions.

When a data location is updated in write back mode, the data in cache is called fresh, and the corresponding data in main memory, which no longer matches the data in cache, is called stale. If a request for stale data in main memory arrives from another application program, the cache controller updates the data in main memory before the application accesses it.

Write back optimizes the system speed because it takes less time to write data into cache alone, as compared with writing the same data into both cache and main memory. However, this speed comes with the risk of data loss in case of a crash or other adverse event.

Write back is the preferred method of data storage in applications where occasional data loss events can be tolerated. In more critical applications such as banking and medical device control, an alternative method called write through practically eliminates the risk of data loss because every update gets written into both the main memory and the cache. In write through mode, the main memory data always stays fresh.

• Write-through – the information is written both to the block in the cache and to the block in the next lower level of the memory hierarchy
    – Write-through is always combined with a write buffer, so write waits to lower-level memory can be eliminated (as long as the write buffer doesn't fill)

• Write-back – the information is written only to the block in the cache; the modified cache block is written to main memory only when it is replaced
    – Needs a dirty bit to keep track of whether the block is clean or dirty

• Pros and cons of each (see the sketch after this list):
    – Write-through: read misses don't result in writes (so it is simpler and cheaper)
    – Write-back: repeated writes to a block require only one write to the lower level
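A rough illustration of the two policies in C (entirely my own sketch; every name here is hypothetical). Write-through pushes every store to memory immediately, while write-back only marks the line dirty and defers the memory write until eviction:

#include <stdbool.h>
#include <stdio.h>

/* One cache line with the bookkeeping described above. */
struct line {
    bool valid;
    bool dirty;      /* needed only by write-back */
    unsigned tag;
    unsigned data;
};

/* Write-through: update the cache line AND the lower level every time. */
void write_through(struct line *l, unsigned tag, unsigned value, unsigned *memory) {
    l->valid = true;
    l->tag = tag;
    l->data = value;
    *memory = value;             /* every store goes to memory (via a write buffer) */
}

/* Write-back: update only the cache line and mark it dirty;
 * memory is updated later, when the line is evicted. */
void write_back(struct line *l, unsigned tag, unsigned value) {
    l->valid = true;
    l->tag = tag;
    l->data = value;
    l->dirty = true;             /* memory is now stale */
}

void evict(struct line *l, unsigned *memory) {
    if (l->valid && l->dirty)    /* repeated writes cost one memory write here */
        *memory = l->data;
    l->valid = false;
    l->dirty = false;
}

int main(void) {
    unsigned mem = 0;
    struct line l = {0};
    write_back(&l, 0x1, 42);
    write_back(&l, 0x1, 43);     /* second write still touches only the cache */
    evict(&l, &mem);
    printf("memory after eviction: %u\n", mem);  /* 43 */
    return 0;
}

The dirty bit is what lets write-back turn repeated stores to the same block into a single memory write.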

Memory Organization (Memory Hierarchy/Cache Memory/Cache Analogy/Cache Example) by CHEAH SIEW KWAN B031310163

Memory Hierarchy

Memory is categorized into volatile and nonvolatile memories, with the former requiring constant power to maintain its stored data. Furthermore, a typical computer system provides a hierarchy of different types of memory for data storage.

The memory hierarchy can be exploited in two approaches:

Approach 1: Expose
Registers, main memory, and disk are each available as explicit storage alternatives.

Approach 2: Hide
The machine automatically assigns locations depending on runtime usage patterns, presenting a single kind of memory and a single address space.


Now, to exploit the memory hierarchy when CPU speed is dominated by memory performance, there are two techniques: caching and virtual memory. Caching makes slow main memory appear faster, and virtual memory makes small main memory appear bigger.

CACHE Memory

Cache memory is random access memory (RAM) that a computer microprocessor can access faster than regular RAM. A cache is a small high-speed memory that stores data from some frequently used addresses of main memory. Cache memory is also the level of the memory hierarchy closest to the CPU.

CACHE Analogy

You are writing notes on a notepad in a library.
-For your work, you need to bring a reference book to your table.
-You sit down and write.
-You suddenly stop writing and pick up a book on the table for reference.
-You do not return the book to the shelf immediately; you just leave it on the table.
-When you need another book to refer to, you go and fetch it.
-Soon, your table holds the few books you need to refer to.
-Now you can do your work smoothly without fetching reference books from the shelf.
-You see, the table you use is the cache for the library.

Cache Example

Here is an example of a cache.
Now, we need:
-8 blocks (1 word per block)
-direct mapped
-initial state: all blocks invalid (see the sketch below)
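A minimal sketch (my own, with a made-up reference stream) of such an 8-block, one-word-per-block direct-mapped cache, reporting a hit or miss for each access:

#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 8   /* 8 blocks, 1 word per block, direct mapped */

struct block { bool valid; unsigned tag; };

int main(void) {
    struct block cache[NBLOCKS] = {0};   /* initial state: all invalid */

    /* A hypothetical stream of word addresses. */
    unsigned refs[] = {22, 26, 22, 26, 16, 3, 16, 18};

    for (int i = 0; i < 8; i++) {
        unsigned addr  = refs[i];
        unsigned index = addr % NBLOCKS;   /* i = j modulo m    */
        unsigned tag   = addr / NBLOCKS;   /* higher-order bits */

        if (cache[index].valid && cache[index].tag == tag) {
            printf("addr %2u -> line %u: hit\n", addr, index);
        } else {
            printf("addr %2u -> line %u: miss\n", addr, index);
            cache[index].valid = true;     /* fill the line */
            cache[index].tag   = tag;
        }
    }
    return 0;
}

Running it shows, among other things, address 18 replacing address 26 in line 2, since both map to the same line.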









Tuesday, 3 December 2013

MIPS (Introduction to MIPS /Registers ) by CHONG KIT SHING B031310164



Introduction to MIPS
MIPS, or Microprocessor without Interlocked Pipeline Stages, is a Reduced Instruction Set Computer (RISC) instruction set architecture, meaning the CPU performs its operations on registers.
A MIPS processor is composed of an integer processing unit (the CPU) and a collection of coprocessors that perform subsidiary tasks.

Microprocessor Operations
Most processors repeat three basic steps to execute one machine instruction: fetch, decode, and execute. The process of completing these steps is called a machine cycle. Figure 1.1 shows the machine cycle.

Figure 1.1 A machine cycle

Memory Allocation
Systems based on MIPS processors typically divide memory into three parts:
I)    Text segment: holds the machine language code for the instructions in the source file (the user program).
II)   Data segment: holds the data the program operates on. It is divided into two parts:
      Static data – contains statically allocated data whose size does not change as the program accesses it.
      Dynamic data – sits on top of the static data and is allocated and deallocated as the program executes.
III)  Stack segment: resides at the top of the user address space. In a high-level language program, local variables and parameters are pushed and popped on the stack as the operating system expands and shrinks the stack segment toward the data segment.
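A tiny C sketch of where data lands in these segments (variable names invented; the comments mark the segment):

#include <stdlib.h>

int counter = 42;   /* static data: allocated for the program's whole life */

int main(void) {
    int local = 7;                          /* stack segment               */
    int *buf = malloc(100 * sizeof *buf);   /* dynamic data: grows/shrinks */
    /* the machine code of main itself lives in the text segment */
    free(buf);
    return local + counter;
}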



Data Alignment

Table 1.1 lists the data sizes that apply to MIPS chips.

Type          Size (binary)         Size (hexadecimal)
Byte          8 bits                2 hex digits
              e.g. 10100011         e.g. A3
Word          4 bytes, 32 bits      8 hex digits
              e.g. 10000000…        e.g. 12345ABC
Double Word   8 bytes, 64 bits      16 hex digits
              e.g. 10000000…        e.g. 123456789ABCDEF0

Table 1.1 MIPS Data Sizes


Table 1.2 shows data alignment in MIPS: one double word equals two words, four halfwords, or eight bytes.

+-----------------------------------------------+
|                  Double Word                  |
+-----------------------+-----------------------+
|          Word         |          Word         |
+-----------+-----------+-----------+-----------+
| HalfWord  | HalfWord  | HalfWord  | HalfWord  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|Byte |Byte |Byte |Byte |Byte |Byte |Byte |Byte |
+-----+-----+-----+-----+-----+-----+-----+-----+

Table 1.2 Data Alignment


Registers                                                                                                           
Registers are memory just like RAM, except that registers are much smaller and faster than RAM. In MIPS, the CPU can only do operations on registers and special immediate values.

MIPS processors have 32 general-purpose registers, numbered 0-31, and another 32 floating-point registers, but some of these are reserved; a fair number of registers are nonetheless available for your use. In addition, a separate register, the program counter, contains the memory address of the next instruction to be executed. As the processor executes an instruction, the program counter is incremented, and the next instruction is fetched, executed, and so on.
Registers all begin with a dollar symbol ($). The floating-point registers are named $f0, $f1, ..., $f31. The general-purpose registers have both names and numbers, and are listed in Table 1.3 below. When programming in MIPS assembly, it is usually best to use the register names.


Register Number   Register Name (Mnemonic)   Comments/Usage
$0                $zero                      Always zero
$1                $at                        Assembler temporary (reserved for assembler)
$2, $3            $v0, $v1                   First and second return values, respectively
$4, ..., $7       $a0, ..., $a3              First four arguments to functions
$8, ..., $15      $t0, ..., $t7              Temporary registers (not preserved across a function call)
$16, ..., $23     $s0, ..., $s7              Saved registers (preserved across a function call)
$24, $25          $t8, $t9                   More temporary registers
$26, $27          $k0, $k1                   Reserved for kernel (operating system)
$28               $gp                        Global pointer (preserved on call)
$29               $sp                        Stack pointer (preserved on call)
$30               $fp                        Frame pointer (preserved on call)
$31               $ra                        Return address (automatically used in some instructions)

Table 1.3 MIPS register conventions

 
Explanations:
-The zero register ($zero or $0) always contains a value of 0. It is built into the hardware and therefore cannot be modified.

-The $at (Assembler Temporary) register is used for temporary values within pseudo commands. It is not preserved across function calls.

- The $v Registers are used for returning values from functions. They are not preserved across function calls.

- The $a registers are used for passing arguments to functions. They are not preserved across function calls.

- The $t (temporary) registers are caller-saved registers used to hold temporary values. Beginners can freely use these registers in their programs.

- The $s (saved) registers are callee-saved registers used to store longer-lasting values. They are preserved across function calls. Beginners can also use these registers in their programs.

-The $k registers are reserved for use by the operating system kernel.

- The global pointer ($gp) points to the middle of a 64K block of memory in the static data segment.

- The stack pointer ($sp) points to the last location on the stack.

- The frame pointer ($fp) points to the base of the current procedure's stack frame.

- The return address register ($ra) contains the return address written by jal (the location in the program that a function needs to return to).

All pointer registers are preserved across function calls.




 
• SPIM simulates two coprocessors. Coprocessor 0 handles traps, exceptions, and the virtual memory system; SPIM simulates most of the first two and entirely omits details of the memory system. Coprocessor 1 is the floating-point unit, and SPIM simulates most aspects of this unit.
• Furthermore, coprocessor 0 contains the exception control registers that handle exceptions. SPIM does not implement all of coprocessor 0's registers, since many are of little use in a simulator or belong to the memory system, which is not implemented. However, it does provide the trap registers listed in Table 1.4.


Register Name   Register Number   Usage
BadVAddr        8                 Memory address at which an address exception occurred
Status          12                Interrupt mask and enable bits
Cause           13                Exception type and pending interrupt bits
EPC             14                Address of the instruction that caused the exception

Table 1.4 Trap registers


These registers are part of coprocessor 0's register set and are accessed with the mfc0 and mtc0 instructions.

Figure 1.2 Status register (fields implemented by SPIM):
    bits 15-8   interrupt mask
    bit 4       user mode (UM)
    bit 1       exception level (EL)
    bit 0       interrupt enable (IE)

Figure 1.3 Cause register:
    bit 31      branch delay (BD)
    bits 15-8   pending interrupts
    bits 6-2    exception code

 
Figure 1.2 shows the bits in the Status register that are implemented by SPIM. The interrupt mask holds a bit for each of the eight interrupt levels. If a bit is one, interrupts at that level are allowed; if it is zero, interrupts at that level are disabled. The user mode (UM) bit is 0 if the program is running in kernel mode and 1 if it is running in user mode. The exception level (EL) bit is normally 0, but is set to 1 when an exception occurs. If the interrupt enable (IE) bit is 1, interrupts are allowed; if it is 0, they are disabled.
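A small C sketch that decodes these Status fields, assuming the bit positions shown in Figure 1.2 (the variable names and the example value are invented):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t status = 0x0000FF13;               /* example value, made up */

    uint32_t int_mask = (status >> 8) & 0xFF;   /* bits 15-8: interrupt mask */
    uint32_t um       = (status >> 4) & 0x1;    /* bit 4: user mode          */
    uint32_t el       = (status >> 1) & 0x1;    /* bit 1: exception level    */
    uint32_t ie       = status & 0x1;           /* bit 0: interrupt enable   */

    printf("mask=0x%02x UM=%u EL=%u IE=%u\n",
           (unsigned)int_mask, (unsigned)um, (unsigned)el, (unsigned)ie);
    return 0;
}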


Figure 1.3 shows the bits in the Cause register. The eight pending interrupt bits correspond to the eight levels. A bit becomes 1 when an interrupt at its level has occurred but has not been serviced. The exception code bits contain a code from Table 1.5 describing the cause of an exception.
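A matching sketch for the Cause register, assuming the bit positions shown in Figure 1.3 (exception code in bits 6-2, as in SPIM's documentation; names and example value invented):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t cause = 0x00000030;             /* example: exception code 12 (OVF) */

    uint32_t bd      = (cause >> 31) & 0x1;  /* bit 31: branch delay          */
    uint32_t pending = (cause >> 8) & 0xFF;  /* bits 15-8: pending interrupts */
    uint32_t exccode = (cause >> 2) & 0x1F;  /* bits 6-2: exception code      */

    printf("BD=%u pending=0x%02x code=%u\n",
           (unsigned)bd, (unsigned)pending, (unsigned)exccode);
    return 0;
}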



Number   Name      Cause of exception
0        Int       Interrupt (hardware)
4        AdEL      Address error exception (load or instruction fetch)
5        AdES      Address error exception (store)
6        IBE       Bus error on instruction fetch
7        DBUS      Bus error on data load or store
8        SYSCALL   Syscall exception
9        BKPT      Breakpoint exception
10       RI        Reserved instruction exception
12       OVF       Arithmetic overflow exception
13       Tr        Trap

Table 1.5 Exception codes