Houjun Liu

SU-CS143 MAY272025

Memory Hierachy

  • registers: 1 cycle (256 bytes or so)
  • cache: 4 cycles (100MB or so)
  • memory: 100-500 cycles (128GB)
  • disk: millions of cycles (1-10TB)
  • power usage limits: with a fixed power budget, we can’t really get more transistors in the same space (i.e. stuff can’t be faster)
  • cache miss cost is quite high, so using multiple lines of cache

register allocation

see Register Allocation

performance engineering

example

Consider:

for (int j=1; j<10; j++)
    for (int i=1; i<10000; i++)
        a[i] *= b[i]

you can see that, by the time you get to b[100000], you probably have lost b[1] from your cache.

Instead, just reordering the loop:

for (int i=1; i<10000; i++)
    for (int j=1; j<10; j++)
        a[i] *= b[i]

notice! at this point you are just using the same b[i] 10 times, and it will stay in cache.