Problem: because Object Calculus and Lambda Calculus really can’t be resolved in into one system, we have to decide either extending objects with lambda (Java, C++), or extend lambda with objects (OCaml, Haskell).
memory safety
This is a problem introduced during manual memory management: pointers or references needs to be checked whether they point to a thing of the correct type — key problem in C/C++ (double free, wild pointer, out of bounds accesses).
to ensure memory safety, we have three approaches:
- automatically via dynamic garbage collectors
- systematic but unenforced programming discipline (i.e. have system specific memory management best practices)
- have the static type system do it for you
garbage collection
key properties—-
- deallocation is done automatically: “objects that will never be used again are safe to deallocation”
- no pointer arithmetic allowed (grantees that one can’t make a pointer that GC doesn’t know about)—only references are returned (i.e. its a pointer without pointer arithmetic and has a known type)
- array indexing is always bounds checked
Pros: memory safe; Downside: array bounds checking is super expensive, GC is inefficient when the working set of memory is large, and GC has unpredictable delays.
some programming discipline
“ownership” tracking needs to be clear:
void my_func() {
int *ptr = (int*) malloc(sizeof (int));
*ptr = 42;
api_call(ptr);
}
void api_call(int *p);
after this, both my_func
and api_call
has access to ptr
. So, who’s responsible for deallocating this memory?
Solution: one pointer has to be designated to be the owner; meaning, it is the only pointer that could be used to deallocate the memory: you better have a unique owner.
Ownership rules for a given system is often documented as a part often comments: nothing in this system enforces particularly correct use. It’s always a possibility that people do it incorrectly.
aliasing
In the example above, p
and ptr
are both names for the int allocated up top; they are therefore aliases with each other.
During allocation, we need to know that either “no aliases exist” or “no aliases will be deferenced”
aliasing control
void copy(char *x, char *y);
so what happens during copy(x,x)
? One convention is to write:
void copy(restrict char *x, restrict char *y);
which says “we promise that x
and y
can’t be aliased to anything else in scope”.
- aliasing is bad
- state can be modified in one name and those changes are visible through a different name
- aliasing is really common in real programs—OOP code is particular bad at this and may generate aliasing
restrict mutation
Maybe aliasing isn’t actually the problem…. Really, we want to disallow mutation. (Problems arise when aliases and mutation co-occur: because we can change pointer values under us)
Much of the computation problems have quite efficient algorithms without mutation; yet some things like array update is O(1)
for all cases except for functional-type languages, where O(log N)
is the best known functional update (certain matrix algorithms may not be possible).
let x = 5;
let mut x = 5;
x = 3; // only possible when x is mut
let x = 5;
let x = ref 5; (* now mutable! *)
x := 3
i.e.: if you want something to change, you have to do something special—make these points obvious in syntax, make the program looks ugly when mutable.
ownership types
That is, baking the idea of ownership into the typ esystem.
- there’s always a single owner referencing an object (owning: responsible for the resources of the object)
- if an object has no owner (when the owner goes out of scope), it will be deallocated
x=y
will remove ownership fromy
and transfers it tox
(we can’t even usey
after such a copy)
Typical ownership rules are very restrictive because the program must be linear. So, we use the following techniques:
- using immutabel data structures when possible
- deep copies are fine
- borrowing creates a reference that can be used
- doesn’t transfer ownership
- implies a borrowed reference can’t deallocate an object
- owner can’t deallocate objects until all borrowed refrences are returned
- borrowed references have a different syntax and type
lifetime
Rust reasons about ownership through lifetimes: the lifetime of a variable is the span between its definition and its last use—a lifetime is a subset of the scope (i.e. if you stopped using a variable earlier than it going of scope).
Unique owner: lifetime of owners of an object cannot overlap.
This is called a linear type discipline: you will never have an aliasing problem (beaches only one name is available at any time for an object).
If there were not borrowing in rust, it would be purely linear; that also makes writing non-trivial programs impossible because you can’t write things like iterators (because it needs a pointer to the root and a pointer to the current value).
borrow
“borrowing” is aliases in rust with tighter semantics. You can’t call free on a borrow.
- borrow cannot outlive its owner: the lifetime of a borrow is contained within the lifetime of its owner
- a borrow can’t deallocate its object—that’s why its a borrow
- there can only be one mutable borrow to an object in scope, and any number of readers
explicit lifetimes
Suppose we want to return the longer of two strings:
fn longest(x: &str, y: &str) -> &str;
x
and y
may not have the same lifetime, and we don’t know that the lifetime of the result could be.
This requires some discussion of if statement:
\begin{equation} \frac{A \vdash e_1: Bool, A \vdash e_2: T, A \vdash e_3: T}{A \vdash \text{$e_1$ then $e_2$ else $e_3$}:T} \end{equation}
that is, both branches of an if
statement must return the same type, otherwise it has n way to type check the output type.
An ownership type, then, requires that the lifetime of these two branches to be the same. We will therefore take in lifetime information:
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str;