## Preference Elicitation

For the Weight Method, for instance, we need to figure a \(w\) such that:

\begin{equation} f = w^{\top}\mqty[f_1 \\ \dots\\f_{N}] \end{equation}

where weight \(w \in \triangle_{N}\).

To do this, we essentially infer the weighting scheme by asking “do you like system \(a\) or system \(b\)”.

- first, we collect a series of design variables \((a_1, a_2, a_3 …)\) and \((b_1, b_2, b_3…)\) and we ask “which one do you like better”
- say our user WLOG chose \(b\) over \(a\)
- so we want to design a \(w\) such that \(w^{\top} a < w^{\top} b\)
- meaning, we solve for a \(w\) such that…

\begin{align} \min_{w}&\ \sum_{i=1}^{n} (a_{i}-b_{i})w^{\top} \\ \text{such that}&\ \bold{1}^{\top} w = 1 \\ &\ w \geq 0 \end{align}

**unlike the rest of everything, we are MAXIMIZING here** idk why

## Sampling Plans

Many methods requires knowing a series of samples of the objective value to calculate local model or population methods, so…

### Full Factorial

Grid it up.

- easy to implement
- good results
- bad: sample count grows exponentially with dimension

### Random Sampling

Use a pseudorandom generator to pick points in your space.

- allows for any number of evaluations you specify
- statistically,
**the points clump**when you do this! - also need lots of samples to get good coverage

### Uniform Projection

We take each point, and uniformly project it onto each dimension. To implement this, we grid up each dimension and shuffle the ordering of each dimension individually. Then, we read off the coordinates to create the points:

```
# in d3...
seq = range(axis_min, axis_max)
d1 = random.shuffle(seq)
d2 = random.shuffle(seq)
d3 = random.shuffle(seq)
sampling_points = zip(d1, d2, d3)
```

### Stratified Sampling

- perform Uniform Projection
- within each grid, make smaller grids and perform within them Uniform Projection again

## Space-Filling Metrics

### Pairwise Distances

This requires each set to have the **same number of points**

- figure the euclidian distance between every pair of points
- for each set of pairs, figure the closest together points, and call that the “pairwise distance” of the set

Limitation: if there are just two points that are close together, this metric scores it worse. So maybe Morris-Mitchell.

### Morris-Mitchell

We have a hype-parameter \(q\), which checks all of the possible norms to use between points. Consider \(d_{i}\) to be the ith-pairwise distance between the points with the Lp Norm for your choice of \(p\). Then, for:

\begin{equation} \Phi_{q}(X) = \qty(\sum_{i}^{}d_{i}^{-q})^{\frac{1}{q}} \end{equation}

and we try to solve for our set of points \(X\) such that:

\begin{equation} \min_{X} \max_{q \in \{1,2,5,10,20,50,100\}} \Phi_{q}(X) \end{equation}

“minimize the distance at the worst \(q\) possible norm”

### Space-Filling Subset

A Space-Filling Subset is a subset \(S\) of a point set \(X\) which **minimizes the maximum distance** between a point in \(S\) and its closest point in \(X\) (i.e. making \(S\) a good representative of \(X\)).

\begin{equation} d_{\max}(X,S) = \max_{x \in X} \min_{s \in S} |s -x|_{p} \end{equation}

we can choose any \(p\) norm you’d like.

#### greedy local search

Choosing one best point to add to \(S\) which maximize \(d_{\max}\), and then choose another point, and another one, …

#### exchange algorithm

randomly initialize \(S\), and swap points within \(S\) and only in \(X\)

## Surrogate Models

Once we finished sampling, we need to create a model parameterized by \(\theta\) which minimizes the error. In particular, we want to choose \(\theta\) such that:

\begin{equation} \min_{\theta} |y - \hat{y}|_{p} \end{equation}

for some model \(\hat{y}(x)\), with matching actual result \(y\).

### Linear Model

\begin{equation} \hat{f} = w_0 + \bold{w}^{\top}\bold{x} \end{equation}

whereby, we now want:

\begin{equation} \min_{\theta} |y - X \theta|_{2}^{2} \end{equation}

this is CLOSE FORM! by applying the pseudo-inverse:

\begin{equation} \theta = X^{\dagger} y \end{equation}

### Basis Functions

We can make this slighlty non-linear by computing some non-zero \(B(x)\) where a set of these basis functions all taking \(x\) as input (for instance, terms of a polynomial) \(\bold{B}(\bold{x})\) is then used for optimization:

\begin{equation} \min_{\theta} |y - B \theta|_{2}^{2} \end{equation}

#### Radial Basis Functions

\begin{equation} \psi(x,c) = \psi(|x - c|) \end{equation}

a radial basis function is a basis function that acts on the *distance* to a local point. You can choose any kernel \(\psi\) you’d like.

### Regularization

Especially for noisy things, you ideally want some kind of regularization: see Regularization

#### L2 Regularization

\begin{equation} \min_{\theta} || y - B(x)\theta ||_{2}^{2} + \lambda || \theta ||^{2}_{2} \end{equation}

this is kind of a like a multi-objective Weight Method.