SU-CS111 Final Sheet

FS

main challenges

naming: how do users name files
reliability: surviving OS crashes and hardware failures
protection: isolation between users, controlled sharing
disk space management: minimize seeks, sharing space (“preventing fragmentation”)

seeks

a seek is the wait for the arm to move to the right track and for the platter to rotate under it before we can read.

internal v. external fragmentation

internal: a file takes up no less than a whole block, so the unused part of a partly filled block is wasted.
external: no single free region is big enough even though enough space is available in aggregate

main designs

contiguous allocation

IBM used this? puts files and meta-data together + implement an explicit free list allocator. benefit: simple; drawback: 1) external fragmentation 2) hard to grow files

linked files

in every block, store the location of the next block; don’t store files continuously—instead, store a pointer to where the next block of the file is. benefit: solves fragmentation and file growth; drawback: 1) huge seek time 2) random access from the middle is hard (i.e. O(n))

Windows FAT

linked files, but the links are kept in a table that is cached in memory while in use. benefits: same as linked files, and a bit faster; drawback: data still fragmented and now you have a whole table to deal with! but it's at least faster

File Payload Data

Kind of what we do—instead of storing file data in order OR using links, store the file BLOCK information contiguously.

multi-level index: store all block numbers for a given file down a tree (EXT2/3, Unix V6, NTFS)

Unix V6 + MLI

Sector Size	Block Size	Inode Size	Inodes Per Block	Address Type
512	512	32	16	Short, 2 bytes

block

const size_t INODE_PER_BLOCK = SECTOR_SIZE / sizeof(struct inode);
struct inode inodes[INODE_PER_BLOCK];

// sector 0 is the boot block and sector 1 is the superblock,
// so sector 2 holds the first 16 inodes
readsector(2, inodes);

printf("addr: %d\n", inodes[0].i_addr[0]);

ino

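// simplified: the real 32-byte V6 inode also stores link count, owner, and timestamps
struct inode {
    uint16_t i_mode;      // flags: file vs. directory, ILARG (large mode), permissions
    uint16_t i_addr[8];   // block numbers (meaning depends on small vs. large mode)
    uint16_t file_size;   // size in bytes (simplified)
};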

inodes have two modes

if ((inode.i_mode & ILARG) != 0)  // inode is in "large mode"

in small mode, the inode stores in i_addr the block numbers to the data
in large mode, the first seven numbers in i_addr are block numbers of singly indirect blocks, each holding 512/2 = 256 block numbers (2-byte shorts, not chars); the eighth number points to a doubly indirect block, whose 256 entries each point to a singly indirect block of block numbers

The inode for each file only has space for \(8\) block numbers. 1 block = 1 sector = 512 bytes on Unix V6, and inodes are 32 bytes, so 16 inodes pack into each block.

in large mode, this system can store \((7+256) \cdot (256 \cdot 512) = 34471936\) bytes (about 34 MB). That is slightly more than the whole file system can hold (16-bit block numbers address at most \(65536 \cdot 512 = 33554432\) bytes), so the inode format is no longer the limit on file size.

sizing
- small: \(512\) bytes per block, and \(8\) block storable, so \(8 \cdot 512 = 4096\) bytes
- large: each of the first seven entries of i_addr points to a \(512\)-byte block containing \(\frac{512}{2} = 256\) block numbers, so together they address \(256 \times 7 = 1792\) data blocks. The eighth entry is doubly indirect and addresses \(256 \cdot 256 = 65536\) data blocks. In total, that is \(1792+65536 = 67328\) blocks, or \(67328 \cdot 512 = 34471936\) bytes.
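The sizing arithmetic above can be double-checked at compile time. This is only a sketch: the constant and function names (`kBlockSize`, `small_max_bytes`, and so on) are made up for illustration and do not come from the Unix V6 source.

```cpp
#include <cstdint>

// Unix V6 parameters taken from the table above.
constexpr std::uint32_t kBlockSize = 512;     // bytes per block (= sector)
constexpr std::uint32_t kAddrSize = 2;        // bytes per block number
constexpr std::uint32_t kAddrsPerInode = 8;   // entries in i_addr
constexpr std::uint32_t kAddrsPerBlock = kBlockSize / kAddrSize;  // 256

// Small mode: all 8 entries point straight at data blocks.
constexpr std::uint32_t small_max_bytes() {
    return kAddrsPerInode * kBlockSize;
}

// Large mode: 7 singly indirect entries plus 1 doubly indirect entry.
constexpr std::uint32_t large_max_blocks() {
    return 7 * kAddrsPerBlock + kAddrsPerBlock * kAddrsPerBlock;
}

constexpr std::uint32_t large_max_bytes() {
    return large_max_blocks() * kBlockSize;
}

// These fail to compile if the arithmetic in the notes were wrong.
static_assert(small_max_bytes() == 4096, "8 * 512");
static_assert(large_max_blocks() == 67328, "1792 + 65536");
static_assert(large_max_bytes() == 34471936, "67328 * 512");
```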

dir

struct dirent {
    uint16_t d_inumber; // inode number of this file
    char d_name[14]; // the name; *NOT NECESSARILY NULL TERMINATED*
};

THE NAME MAY NOT BE NULL TERMINATED: a 14-character name fills d_name completely, to cram in max characters. You have to use strncmp with n = 14, never strcmp.

strcmp/strncmp: strncmp stops comparing after \(n\) characters (strcmp only stops at a null terminator); <0 if str1 comes before str2 alphabetically; >0 if str1 comes after str2; 0 if equal

Start at the root directory, /. Look through the root directory's entries for the one named classes, then, in that directory, find the next component of the path, and so on until you reach the file. Traverse recursively. A directory can have metadata, just like a file.

A directory is basically just a file whose payload is a list of dirent.

The inode tells you whether something is a file or a directory. Directories can be small or large, as usual. The root directory always has inode number 1; 0 is reserved to mean NULL.
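One step of the lookup (finding a name inside a single directory's payload) might look like the sketch below. The struct is renamed `dirent_v6` and the function `lookup_in_directory` is hypothetical; treating inode number 0 as an unused slot is an assumption that follows from 0 being reserved.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same layout as the dirent struct above (16 bytes per entry).
struct dirent_v6 {
    std::uint16_t d_inumber;  // inode number; 0 is treated here as "unused slot"
    char d_name[14];          // NOT necessarily NUL-terminated
};

// Returns the inode number for `name` among one directory's entries, or 0
// (the reserved "null" inode number) when no entry matches.
std::uint16_t lookup_in_directory(const dirent_v6 *entries, std::size_t count,
                                  const char *name) {
    // A name longer than 14 characters can never be stored in d_name.
    if (std::strlen(name) > sizeof(entries->d_name)) return 0;
    for (std::size_t i = 0; i < count; i++) {
        if (entries[i].d_inumber == 0) continue;  // skip unused slots
        // strncmp stops after 14 bytes, so a full-length name without a
        // terminator is never read past the end of d_name.
        if (std::strncmp(entries[i].d_name, name,
                         sizeof(entries[i].d_name)) == 0) {
            return entries[i].d_inumber;
        }
    }
    return 0;
}
```

Resolving a full path would repeat this once per path component, starting from inode 1.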

file

Recall that read doesn’t necessarily read the whole thing. So, we read it in parts.

void copyContents(int sourceFD, int destinationFD) {
    char buffer[INCREMENT];

    while (true) {
        ssize_t bytesRead = read(sourceFD, buffer, sizeof(buffer));
        if (bytesRead <= 0) break;  // 0 = end of file, -1 = error
        ssize_t bytesWritten = 0;
        // write may also write fewer bytes than asked, so loop until all are out
        while (bytesWritten < bytesRead) {
            ssize_t count = write(destinationFD, buffer + bytesWritten,
                                  bytesRead - bytesWritten);
            if (count < 0) return;  // write error
            bytesWritten += count;
        }
    }
}

int open(const char *pathname, int flags);

Flags are combined with bitwise OR: you have to include exactly one of O_RDONLY (read only), O_WRONLY (write only), or O_RDWR (both read and write). open returns a file descriptor, or \(-1\) if opening fails.

Other flags:

O_TRUNC (truncate the file to length 0 if it exists)
O_CREAT (create the file if it doesn't exist), which requires an extra mode_t mode argument to set the permissions
O_EXCL (with O_CREAT: fail if the file already exists)

open file table

the open-file table is system wide: each entry records the mode the file was opened in + the cursor into the open file + a refcount of the number of file descriptors pointing to it.

why is refcount ever higher than 1? because of forks: the child's descriptors point to the same entries as the parent's.

Block Cache

We will use part of the main memory to retain recently-accessed disk blocks. This is NOT at the granularity of individual files.

Least Recently Used (LRU) Cache

When you insert a new element into the cache, kick out the element on the cache that hasn’t been touched in the longest time.
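A minimal sketch of that eviction policy, tracking block numbers only. The class name `LruBlockCache` and its methods are hypothetical, and a real block cache would store the block contents alongside each number.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Tracks which block numbers are cached, in recency order.
class LruBlockCache {
public:
    explicit LruBlockCache(std::size_t capacity) : capacity_(capacity) {}

    // Records an access to `blockno`. Returns true on a hit, false on a miss
    // (in which case the block is inserted, evicting the LRU block if full).
    bool touch(int blockno) {
        auto it = where_.find(blockno);
        if (it != where_.end()) {
            // Hit: move the block to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        if (capacity_ == 0) return false;
        if (order_.size() == capacity_) {
            // Evict the block at the back: untouched for the longest time.
            where_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(blockno);
        where_[blockno] = order_.begin();
        return false;
    }

    bool contains(int blockno) const { return where_.count(blockno) != 0; }

private:
    std::size_t capacity_;
    std::list<int> order_;  // front = most recent, back = least recent
    std::unordered_map<int, std::list<int>::iterator> where_;
};
```

The list keeps the recency order and the map makes the hit check constant time, so neither a hit nor an eviction has to scan the cache.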

Block Cache Modification

we can either write asap, or delay.

write asap: safer: less risk of data loss, written as soon as possible; slow: program must wait to proceed until disk I/O completes

write delay: dangerous: may lose data after a crash; efficient: memory writes are faster

Crash Recovery

main challenges

data loss: crashes can happen, and not all data could be saved to disk
inconsistency: crashes can happen in the middle of operations

Ideally, filesystem operations should be atomic. Every operation should happen or not happen at all—but not halfway.

fsck

Check whether there was a clean shutdown: a disk flag is set on clean shutdown, so if the flag isn’t set there wasn’t a clean shutdown.
If it wasn’t a clean shutdown, identify inconsistencies
Scan metadata (inodes, indirect blocks, free list, directory blocks) and handle any of the following situations:
1. block is in an inode and in the free list; solution: pull the block off of the free list
2. block is a part of two inodes; options: give it to the newest inode, give it to a random one, copy it so each gets its own, or remove it from both (bad idea)
3. inode claims a dirent refers to it, but there is no such dirent; solution: put it in lost+found

limitations

takes a long time, and the system can’t restart until it’s done
doesn’t prevent loss of actual file info
filesystem may still be unusable (core files moved to lost+found)
a block could migrate from one file to another during recovery, leaking info

ordered writes

Always initialize the TARGET before initializing the REFERENCE
- Initialize an inode before initializing the directory entry that points to it
Never reuse a resource before NULLIFYING all existing REFERENCES
- Remove the inode's reference to a block before putting that block on the free list
Never clear the LAST REFERENCE to a live resource before setting a NEW REFERENCE (“it's better to have 2 copies instead of none”)
- Make the new directory entry before getting rid of the old one

limitations

performance: we need to do operations synchronously
- if we really want to do caching async, we can track dependencies
- circular dependencies are possible
leak: it could leak resources (the reference is nullified, but a crash hits before the resource is added back to the free list)
- We can run fsck in the background

journaling

journaling keeps a paper trail of disk operations in the event of a crash. We have an append-only log on disk that stores disk operations.

before performing an operation, record its info in the log
and write that to disk

The log will always record what’s happening ahead. The actual block updates can eventually be carried out in any order.

what do we log?

we only log metadata changes (inodes, moving stuff around, etc.)
payload operations are not saved

structure

We typically have an LSN (log sequence number), an operation, and metadata.

LogPatch: changes something
LogBlockFree: mark something as free
LogBlockAlloc: mark something as allocated, optionally zeroing data if its a data block (DO NOT zero if its a dirent or ino)

[offset 335050]
LSN 18384030
operation = "LogBlockAlloc"
blockno = 1027
zero_on_replay = 0

[offset 23232]
LSN N
operation = "LogPatch"
blockno = 8
offset = 137
bytes = 0.04
inode = 52

limitations and fixes

multiple log entries: each atomic operation will be wrapped into a unit transaction, so that its entries are replayed all together or not at all
checkpoints: we can truncate the log occasionally at a checkpoint, when the older entries are no longer needed
where do we start replaying: log entries should be idempotent (doing something multiple times should have the same effect as doing it once), so replaying too much is harmless. Log entries cannot have external dependencies
log entries may take time: log writes can be delayed in memory too, as long as when we finally write stuff to disk, we write the log entries first. So no problems there.
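Idempotence is easiest to see with a patch-style entry. The sketch below assumes a hypothetical in-memory `LogPatch` record and a `Disk` made of byte vectors; it is not the course's log format. Since each entry stores absolute byte values rather than deltas, replaying the log twice gives the same disk as replaying it once.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a LogPatch record: "set these bytes at this
// offset of this block". It stores absolute values, not deltas.
struct LogPatch {
    std::size_t blockno;
    std::size_t offset;
    std::vector<std::uint8_t> bytes;
};

using Disk = std::vector<std::vector<std::uint8_t>>;  // disk[blockno][offset]

// Replays every entry in order. Because each patch overwrites bytes with
// fixed values, replaying the whole log again leaves the disk unchanged.
void replay(Disk &disk, const std::vector<LogPatch> &log) {
    for (const LogPatch &entry : log) {
        std::vector<std::uint8_t> &block = disk.at(entry.blockno);
        for (std::size_t i = 0; i < entry.bytes.size(); i++) {
            block.at(entry.offset + i) = entry.bytes[i];
        }
    }
}
```

An entry like "add 1 to the link count" would not be safe to replay, which is why entries describe the resulting state.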

tradeoffs

durability - the data needs to be safe (which is slow, and may require manual crash recovery (sans cache, etc.))
performance - it needs to be fast (which may mean less error checking)
consistency - the filesystem needs to be uniform (which means that we need to be slower and we may drop data in favor of previous checkpoints that worked)

MP

Multiprocessing: processes, PIDs, fork, execution order, copy on write, waitpid, zombie processes, execvp, pipes and pipe / pipe2, I/O redirection

forking

pid_t child_pid = fork();

fork returns the child PID in the parent; returns 0 in the child; returns -1 (in the parent) if no child could be created.

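For execvp, the arguments list has to BEGIN WITH THE EXECUTABLE NAME and END WITH NULL.

// note: "~" is expanded by the shell, not by execvp, so "~/hewo" is passed to ls literally
char *args[] = { "/bin/ls", "-l", "~/hewo", NULL };
execvp(args[0], args);

execvp LEAVES THE FILE DESCRIPTOR TABLE intact: descriptors open before the call stay open in the new program (unless marked close-on-exec).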

every fork has to be waited on by waitpid:

pid_t waitpid(pid_t pid, int *status, int options);

pid: the child to wait on
status: pointer to where to store info about how the child ended
options (0 for now)

if the child has already died, this returns immediately. Otherwise, this blocks until it does.

the `status` int

is a bitmap with a bunch of stuff, which we can check with a series of macros

int status;
pid_t pid_act = waitpid(pid, &status, 0);

if (WIFEXITED(status)) {
    // child exited normally
    int statuscode = WEXITSTATUS(status);
} else {
    // abnormal exit (e.g. killed by a signal)
}

the returned PID is the PID that got waited on; if the input PID is -1, it will wait on any child process

fork mechanics

The act of copying stack and heap sounds really really expensive. So what happens?

The child gets its own virtual address space, but at first its virtual addresses map to the same physical addresses as the parent's. The copies are LAZY (copy on write): only when the child or the parent writes to an area in memory is that area copied, and the writer's virtual addresses are remapped to the new physical addresses. If no writes happen, the virtual addresses stay mapped to the same physical addresses.

on fork, the file descriptors get cloned, but the underlying open file table entries are shared, not copied (their refcounts go up), so closing a descriptor in one process doesn’t close the file for the other.

pipes

int pipes[2];

// create the pipe
int ret = pipe(pipes);  // returns 0 on success, -1 on error
/* int ret = pipe2(pipes, O_CLOEXEC); */

// and so
int read_from_here = pipes[0];
int write_to_here = pipes[1];
// i.e. pipes[1] writes to => pipes[0] read

// fork!
pid_t pid_p = fork();

if (pid_p == 0) {
    // child subroutine
    // because child is READING, and not WRITING
    // we want to close the write end
    close(write_to_here);

    // we want to then make a buffer
    char buf[num_bytes];
    // if the child reads before the parent writes
    // it will block until some data is available
    // if all write ends are closed globally, read
    // returns 0 (end of file) instead of blocking.
    read(read_from_here, buf, sizeof(buf));
    close(read_from_here);

    return 0;
}

// parent subroutine
// because parent is WRITING and not READING
// we close the parent's read end immediately.
close(read_from_here);

// write some data (num_bytes must not exceed the message size, here 4 with the '\0')
write(write_to_here, "msg", num_bytes);

// close now we are done writing
close(write_to_here);

// clean up child
waitpid(pid_p, NULL, 0);

Recall that dup2 exists:

dup2(fds[0], STDIN_FILENO);
close(fds[0]);

dup2(oldfd, newfd) will close the second file descriptor, if already in use, before binding it to the same open file as the first file descriptor.

shell

while (true) {
    // argv must begin with the executable name and end with NULL
    char *argv[] = { (char *) "ls", (char *) "things", NULL };

    pid_t child_pid = fork();
    if (child_pid == 0) {
        // this is the child; execvp will check PATH for you
        execvp(argv[0], argv);
        // execvp only returns if it failed
        throw STSHException(string(argv[0]) + ": command not found or failed to exec.");
    }

    // parent: wait for the child so it doesn't linger as a zombie
    waitpid(child_pid, NULL, 0);

    // do cleanup
}

MT

// the constructor spawns the thread, which starts running function_to_run(arg1, arg2, ...)
thread myThread(function_to_run, arg1, arg2, ...);
// threads run AS SOON AS SPAWNED: once a thread is made, it can be scheduled
// at any time and in any order relative to other threads

We can wait for a thread:

myThread.join();

You can also start a bunch on a loop:

thread threads[3];
for (thread& cf : threads) {
    cf = thread(func, ...);
}

passing by reference

the thread constructor doesn’t know that a function wants its argument by reference; by default it copies every argument. To really pass by reference, wrap the argument in ref().

static void mythingref(int &pbr);
thread t(mythingref, ref(myint));

Remember: ref will SHARE MEMORY, and you have no control over when the thread runs. So once a reference is passed all bets are off in terms of what values things take on.

mutex

it would be nice if a critical section could only be executed by one thread at a time; a mutex can be shared across threads, but can only be “owned” by a single thread at once.

mutex tmp;

tmp.lock();
tmp.unlock();

importantly, if multiple threads are waiting on a mutex, there is no guarantee about which thread is going to get the mutex next

you need a mutex:

when there are multiple threads writing to a value
when there is a thread writing and one or more threads reading

if there are no writes, you don’t need a mutex

int locked = 0;
Queue blocked_queue;

void Lock::lock() {
    // disable interrupts: otherwise multiple threads
    // could come and lock the mutex (such as between
    // the locked check and locked = 1)
    IntrGuard grd;

    if (!locked) {
        // if the lock is not held, just take it
        locked = 1;
    } else {
        // if the lock is held, we need to keep our current thread
        // off the ready queue, and push it onto the blocked queue
        blocked_queue.push(CURRENT_THREAD);

        // blocking with interrupts still off (IntrGuard hasn't gone
        // out of scope yet) isn't an issue: whichever thread we switch
        // to re-enables them, either as it returns from its own
        // context_switch (timer handler or IntrGuard) or at the
        // beginning of a threadfunc helper for a brand new thread
        mark_block_and_call_schedule(CURRENT_THREAD);
    }
}

void Lock::unlock() {
    // disable interrupts: otherwise another thread could
    // touch locked or the blocked queue halfway through
    IntrGuard grd;

    if (blocked_queue.empty()) {
        // nobody is waiting: the lock becomes free
        locked = 0;
    } else {
        // hand the lock straight to the next waiter (locked stays 1)
        unblock_thread(blocked_queue.pop());
        // we do not switch to the unblocked thread, just add it to the
        // ready queue. we are entrusting the scheduler to start this thread
        // whenever it sees fit
    }
}

CV

condition_variable_any permitsCV;

// ...

thread t(func, ref(permitsCV));

Identify the ISOLATED event to notify; notify absolutely only when needed. To notify:

permitsCV.notify_all();

To listen:

permitsLock.lock();
while (permits == 0) {
    permitsCV.wait(permitsLock);
}

permits--;
permitsLock.unlock();

the condition variable's wait will…

start sleeping FIRST
unlock the lock FOR US AFTER the sleeping starts (the two happen atomically, so no notification gets missed)
after waiting ends, tries to reacquire the lock
blocks until we have the lock again
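The wait and notify pieces fit together roughly as in this sketch. The `Permits` class and its method names are hypothetical, not from the course code; it uses `unique_lock` so the mutex is released automatically at scope exit.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical "permits" counter guarded by a mutex and a condition variable.
class Permits {
public:
    explicit Permits(int initial) : permits_(initial) {}

    // Blocks until a permit is available, then takes it.
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Loop: wait() can wake spuriously, and another thread may have
        // taken the permit between the notify and our wake-up.
        while (permits_ == 0) {
            cv_.wait(lock);  // unlocks while asleep, relocks before returning
        }
        permits_--;
    }

    // Returns a permit and wakes any waiters so they can recheck the count.
    void release() {
        std::lock_guard<std::mutex> lock(mutex_);
        permits_++;
        cv_.notify_all();
    }

    int available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return permits_;
    }

private:
    std::mutex mutex_;
    std::condition_variable_any cv_;
    int permits_;
};
```

The `while` (rather than `if`) around `wait` is the important part: the condition has to be rechecked every time the thread wakes up.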

unique_lock

void my_scope(mutex &mut, condition_variable_any &cv) {
    unique_lock<mutex> lck(mut);
    // do stuff, you can even pass it to a condition variable!
    cv.wait(lck);
}

Thread States and Contexts

Recall that threads are the unit of execution. The process control block keeps track of the *stack pointer* of the thread %rsp, which means if a thread is put to sleep the state can be stored somewhere on the stack.

Three states:

running (could switch to ready/blocked)
ready: able to run, but not on CPU yet (could switch to running only)
blocked: waiting for something (could switch to ready/running)

trap

a trap is an event in the current user thread that needs OS attention, either requested explicitly or caused by the instruction the thread just ran; it swaps the user process off the CPU.

system calls
errors
page fault (memory errors)

interrupt

an interrupt takes place outside the current thread; it forces the OS’ attention even if the user thread isn’t asking for it

character typed at keyboard
completion of a disk operation
a hardware timer that fires an interrupt

what if a timer goes off during an interrupt

interrupts are disabled during interrupt handling; otherwise, the handler could keep getting interrupted by itself, causing an infinite loop.

preemption

We use interrupts to implement preemption, “preempting” threads in order to swap another thread onto the CPU. This enables scheduling to happen.

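void interrupt_handler() {
    /* interrupts are disabled automatically on entry to the timer handler */

    // a thread that was switched out here resumes from this point
    // the next time it is scheduled
    context_switch(...);

    /* interrupts are re-enabled automatically when the timer handler returns */
}

// brand new thread: it never went through the handler above, so nothing
// would re-enable interrupts for it
void threadfunc_wrapper() {
    // manually enable interrupts before first run
    intr_enable(true);
    // start thread's actual business
    threadfunc();
}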

Scheduling

main challenges

minimize time to a useful result—(assumption: a “useful result” = a thread blocking or completes)
using resources efficiently (keeping cores/disks busy)
fairness (multiple users / many jobs for one users)

We can measure 1) based on “average completion time”: for a given queue of threads, the average time elapsed from when scheduling of that queue starts to when each thread ends.

main designs

first-come first-serve

keep all threads in ready in a queue
run the first thread on the front until it finishes/it blocks for however long
repeat

Problem: a thread can run away with the entire system, accidentally, through infinite loops

round robin

keep all threads in a queue, cycled through round robin
each thread can run for a set amount of time called a time slice (10ms or so)
if a thread terminates before that time, great; if a thread does not, we swap it off and put it to the end of the round robin

Problem: what’s a good time slice?

too small: the overhead of context switching is higher than the overhead of running the program
too large: threads can monopolize cores, can’t handle user input, etc.

Linux uses 4ms. Generally, you want 5-10ms range.
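A small simulation makes the round robin mechanics concrete. This sketch assumes CPU-bound threads that all arrive at time 0 and never block, and it ignores context-switch cost; `round_robin` and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// burst[i] is thread i's total CPU need. Returns each thread's completion
// time under round robin with the given time slice.
std::vector<int> round_robin(const std::vector<int> &burst, int slice) {
    std::vector<int> remaining = burst;
    std::vector<int> finish(burst.size(), 0);
    if (slice <= 0) return finish;  // a non-positive slice makes no progress

    std::deque<std::size_t> ready;
    for (std::size_t i = 0; i < burst.size(); i++) {
        if (burst[i] > 0) ready.push_back(i);
    }

    int now = 0;
    while (!ready.empty()) {
        std::size_t t = ready.front();
        ready.pop_front();
        int run = remaining[t] < slice ? remaining[t] : slice;
        now += run;
        remaining[t] -= run;
        if (remaining[t] == 0) {
            finish[t] = now;     // done before or exactly at slice end
        } else {
            ready.push_back(t);  // slice used up: back of the queue
        }
    }
    return finish;
}
```

With a slice larger than every burst, the same function behaves like first-come first-serve, which makes it easy to compare the two policies on the same workload.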

gold: shortest remaining processing time

Run first the thread in the queue that will finish the most quickly, and run it fully to completion.

It gives preference to the threads that need the least CPU time (i.e. it runs the smallest one first); of course THIS IS not implementable without an oracle to guess the remaining time.

Our goal, then is to get as close as possible to the performance of SRPT.

Problem:

we don’t know which one will finish the most quickly
if we have many short threads and one long-running thread, the long-running thread may never get to run (starvation)

priority based scheduling

Key idea: behavior tends to be consistent in a thread. We build multiple priority queues to address this.

priority based scheduling is an approximation of SRPT, using the past performance of the thread to estimate the running time of the thread. Over time, threads will move between priority queues, and we run the topmost thread from the highest priority queue

implement based on time slice usage

a thread always enters in the highest priority queue

if the thread uses all of its time slice and didn’t exit, bump them down a priority queue
if a thread blocked before it used all of its time slice, bump them up a priority queue

implement based on aggregate time used: fixing neglect

a thread has a number for “how much time did you use on the CPU recently”? The priorities are sorted by that value, and the thread with the smallest recent time use will be run.

context switch

(in asm) push all callee saved registers except %rsp onto the old thread’s stack
store the stack pointer %rsp into the process control block entry corresponding to the old thread
read the new thread’s stack pointer from the process control block, and load that into %rsp
(in asm) pop all callee saved registers stored on the new stack back into the registers

To deal with new threads, we create a fake freeze frame on the stack for that new thread which looks like it is just about to call the thread function, and call context_switch normally.

Virtual Memory

main challenges

multitasking: multiple processes should be able to use memory
transparency: no process needs to know that memory is shared; each process should be able to run regardless of the number/locations of processes
isolation: processes shouldn’t be able to corrupt other processes’ memory
efficiency: shouldn’t be degraded by sharing

crappy designs with no DMT

single tasking: assume there’s one process 1) no isolation 2) no multitasking 3) bad fragmentation
load time relocation: move the entire program somewhere on load time 1) no isolation 2) can’t grow memory after load 3) external fragmentation after frees

main designs

base and bound

load time relocation + virtual memory

assign a location in physical memory, called the base; during translation, we just add the base to every virtual address
we can cap the virtual address space for each process by a bound; we raise a bus error/segfault if an address goes above the highest allowable

last possible address: is (bound - 1)+base

compare virtual address to bound, trap and raise if >= bound
then, return virtual address + base
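The two translation steps can be sketched as a single function. `BaseAndBound` and `translate` are hypothetical names, and returning false stands in for the trap that real hardware would raise.

```cpp
#include <cstdint>

struct BaseAndBound {
    std::uint64_t base;   // where the process starts in physical memory
    std::uint64_t bound;  // size of the process's virtual address space
};

// Writes the physical address to *physical_addr and returns true, or
// returns false where real hardware would trap (virtual address >= bound).
bool translate(const BaseAndBound &region, std::uint64_t virtual_addr,
               std::uint64_t *physical_addr) {
    if (virtual_addr >= region.bound) return false;
    *physical_addr = region.base + virtual_addr;
    return true;
}
```

The last valid virtual address is bound - 1, which lands on physical address (bound - 1) + base, matching the note above.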

tradeoffs: good - 1) inexpensive 2) doesn’t need more space 3) virtualized; bad - 1) can’t really move either (i.e. need to allocate) 2) fragmentation 3) no read only memory

multiple segments

break stack, heap, etc. into multiple segments; then do base and bound for each segment

tradeoffs: good - 1) you can now recycle segments 2) you can leave the middle unmapped 3) you can grow the heap (but not the stack, because it moves downwards); bad - 1) you need to decide segment size and location ahead of time

goal design

paging: fixed segment size, and just split each thing.

we map each page independently, and keep the offset. If part of a page is unused, that is internal fragmentation, but not too bad. The stack can now grow downwards: if it reaches into lower page numbers we can just map that page somewhere too.

For instance, page sizes are typically 4 KB

Page Size	Offset (hex digits)
4096 bytes (16^3)	3

then the rest of the address would just be the page number.

Intel’s implementation

Virtual Addresses


Unused (16 bits)	Virtual page number (36 bits)	Offset (12 bits)

Physical Addresses


Page number (40 bits)	Offset (12 bits)

translation

chop off page number and offset
translate the page number
concat the two together
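The three steps above, sketched for 4096-byte pages (12 offset bits). The `PageTable` here is a hypothetical flat map from virtual page number to physical page number; real hardware walks a multi-level table instead.

```cpp
#include <cstdint>
#include <unordered_map>

constexpr unsigned kOffsetBits = 12;  // 4096-byte pages
constexpr std::uint64_t kOffsetMask = (1ULL << kOffsetBits) - 1;

// Hypothetical page table: virtual page number -> physical page number.
using PageTable = std::unordered_map<std::uint64_t, std::uint64_t>;

// Writes the physical address to *physical_addr and returns true, or
// returns false when the page is not mapped (a page fault on real hardware).
bool translate_page(const PageTable &table, std::uint64_t virtual_addr,
                    std::uint64_t *physical_addr) {
    std::uint64_t vpn = virtual_addr >> kOffsetBits;    // chop off page number
    std::uint64_t offset = virtual_addr & kOffsetMask;  // ...and the offset
    auto it = table.find(vpn);                          // translate page number
    if (it == table.end()) return false;
    *physical_addr = (it->second << kOffsetBits) | offset;  // concat the two
    return true;
}
```

In hex, the offset is simply the last three digits of the address, and they pass through translation unchanged.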

implementation

Index	Physical Address	Writable	Present/Mapped?	Last Access	Kernel	Dirty
0	0x2023	1	0	0	0	0
1	0x0023	1	1	1	0	0

Swap

pick a page to kick out
write kicked page to disk
mark the old page entry as not present
give the physical address to the new virtual page

choosing what to swap

randomly! (works apparently kinda fine)
First-in-first out (fair, but bad: throw out the page that has been in memory longest; but what if it's heavily used)
least recently used - clock algorithm

clock algorithm

rotate through all pages until we find one that hasn’t been referenced since last time

we add a reference bit to each page table entry: it's set to \(1\) whenever the program reads or writes the page, otherwise it stays \(0\)
when a page kick is needed, the clock algorithm starts where it left off before and scans through physical pages
1. for each page it checks with reference bit 1, it sets the reference bit to 0
2. if it checks a page and its reference bit is 0, we kick it out (because it hasn't been referenced since the hand last passed it)

We save the position of the hand: we want to begin checking with the page that hasn’t been checked for the longest time. If every page's reference bit is 1, the algorithm doesn’t break: the hand clears every bit on its way around, comes back to the page it started on, finds its bit now 0, and kicks that one out.
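The hand and the reference bits can be sketched as below. `ClockReplacer` is a hypothetical name, and in a real system the reference bits live in the page table entries and are set by hardware rather than by a method call.

```cpp
#include <cstddef>
#include <vector>

// Minimal clock (second-chance) sketch over physical page frames.
class ClockReplacer {
public:
    explicit ClockReplacer(std::size_t frames)
        : referenced_(frames, false), hand_(0) {}

    // Called whenever a frame is read or written.
    void reference(std::size_t frame) { referenced_.at(frame) = true; }

    // Picks a frame to evict. Frames with the bit set get it cleared (a
    // second chance); the first frame found with the bit clear is the
    // victim. Requires at least one frame.
    std::size_t pick_victim() {
        while (true) {
            std::size_t frame = hand_;
            hand_ = (hand_ + 1) % referenced_.size();  // remember position
            if (referenced_[frame]) {
                referenced_[frame] = false;
            } else {
                return frame;
            }
        }
    }

private:
    std::vector<bool> referenced_;
    std::size_t hand_;
};
```

When every bit is set, the loop clears them all in one sweep and then evicts the frame the hand started on, so it always terminates.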

page replacement

we don’t use per process replacement because we need to allocate max pages per process
we use global replacement to maximise usage

demand fetching

most modern OSes start with no pages loaded—load pages only when referenced; this is tempered by the type of page that’s needed:

Page Type	Need Content on First Load	Save to Swap (“Swap?”)
code	yes	no (read from exe)
data	yes	yes
stack/heap	no	yes

We only write to disk if it's dirty.

Multicore + Flash

Scheduling Multi-Core CPUs

main approaches

one queue for everyone 1) need to figure out what is the priority of things on that queue (for preemption)
one queue per core: 1) where do we put a thread? 2) how do we move between cores?

One Ready Queue per Core

where do we put a given thread?
moving threads between cores is expensive

Big tension:

Work Stealing: if one core is free while there are things in other cores' ready queues, check those ready queues and try to do thread communism (take some of their threads).
Core Affinity: ideally, because moving threads between cores is expensive (need to rebuild cache), we keep each thread running on the same core.

Gang Scheduling

When you have a thread you are trying to schedule, try to see if there are other threads from the same process in the ready queue and schedule all of them on various cores.

Locking Multi-Core CPUs

disabling interrupts is not enough: it only affects one core, and the other cores keep running

hardware atomic operation exchange + busy waiting: exchange reads, returns, and swaps the value of some memory in a single atomic operation which never runs in parallel with another access; it returns the previous value of the memory before it was set:

class Lock {
    std::atomic<int> sync{0};
public:
    void lock();
    void unlock();
};

void Lock::lock() {
    // spin until the old value was 0, i.e. we are the one who set it to 1
    while (sync.exchange(1)) {}
    // we are now the only one holding the lock
}

void Lock::unlock() {
    sync = 0;
}

The exchange function returns the old value.

Flash Storage

writing

You have two operations.

erase: You can set ALL BITS of an “erase unit” to \(1\) (“erase unit” size is usually 256k)
write: You can modify one “page” at a time (which is smaller than an erase unit), but you can ONLY set individual bits in the page to 0 (“page” size is usually 512 bytes or 4k bytes)
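The asymmetry between the two operations can be modeled with a toy erase unit. `EraseUnit` is a hypothetical class with deliberately tiny sizes; the point is only that a write can clear bits (modeled as a bitwise AND) while an erase is the only way to get 1s back.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of one erase unit made of several pages.
class EraseUnit {
public:
    EraseUnit(std::size_t pages, std::size_t page_size)
        : page_size_(page_size), cells_(pages * page_size, 0xFF) {}

    // erase: sets every bit in the whole unit to 1.
    void erase() { cells_.assign(cells_.size(), 0xFF); }

    // write: programs one page. Bits can only go 1 -> 0, so the stored
    // result is the AND of the old contents and the new data.
    void write_page(std::size_t page, const std::vector<std::uint8_t> &data) {
        for (std::size_t i = 0; i < page_size_ && i < data.size(); i++) {
            cells_.at(page * page_size_ + i) &= data[i];
        }
    }

    std::uint8_t read(std::size_t page, std::size_t offset) const {
        return cells_.at(page * page_size_ + offset);
    }

private:
    std::size_t page_size_;
    std::vector<std::uint8_t> cells_;
};
```

Writing 0x0F and then 0xF0 to the same byte leaves 0x00, not 0xF0, which is why updating data in place needs an erase of the whole unit first.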

wear-out

wear leveling: make sure that every part of the drive wears out at roughly the same rate as the other parts. This means moving commonly written data (“hot” data) around

FTL limitations

no hardware access (can’t optimize around flash storage)
sacrifices performance
wastes capacity (to look like a hard drive)
many layers

Ethics

trusting software is the task of extending your own AGENCY to a piece of software: “agential gullibility”.

pathways to trust

trust by assumption: 1) trust absent any clues to warrant it due to timing 2) trust because there is imminent danger
trust by inference: trust based on information you had before (brands, affiliation, performance)
trust by substitution: having a backup plan

accountability

accountability is in a chain

hardware designer (intel)
OS developer (iOS, etc.)
app developer
users

stakeholder

direct stakeholders (people who are operating, technicians, etc.)
indirect stakeholders: patients

purchase = long-term support: what do you do to get it fixed/repaired.

scales of trust

scale of impact

a bug in an OS can be tremendously bad
“root access”: privileged access

scale of longevity

people may be on very very old OSes
it requires keeping older OSes secure against modern technologies
