greedy algorithm

Make choices one at a time, never look back.

when does one greedy?

When the problem exhibits a particularly nice optimal substructure: “at every step, when we make a choice, we do not rule out an optimal solution; when we get to the end, we have a solution, and thus it must be optimal.” That is, if after making all of our choices we still haven’t ruled out an optimal solution, then the solution we built must itself be optimal.

  1. “suppose you are on track to make optimal solution \(T^{*}\)”
  2. “suppose \(T^{*}\) disagrees with your next greedy choice”
  3. “then manipulate \(T^{*}\) until it aligns with your greedy choice”

“When you solve your big problem, you only have to look at one previous subproblem.”

examples

activity selection

Suppose you can only do one activity at a time, and you want to maximize the number of activities that you do within a fixed period of time. How do you choose?

You have:

  • activity \(a_1 … a_{n}\)
  • start times \(s_1 … s_{n}\)
  • finish times \(f_1 … f_{n}\)

We desire to maximize the number of activities to do in a \(k\) hour period.

bad attempts

  • shortest activity first?

    BAD: a single short activity may straddle two longer, non-conflicting activities, so choosing it rules both of them out.

  • fewest conflicts first?

    BAD: an activity with more conflicts may still straddle the others in a way that makes it part of the optimal solution.

earliest ending time first

This works. You are basically just making sure you can “get everything done” as quickly as possible, which frees up the choice of the rest of the steps. This is \(O\qty(n\log n)\) since you just sort by finishing time.
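A minimal sketch of earliest-ending-time-first (the function name and list representation are my own, not from the notes):

```python
def select_activities(starts, finishes):
    """Greedy activity selection: sort by finish time, then take each
    activity that starts no earlier than the last chosen one ends."""
    # Pair up and sort by finish time: the O(n log n) step.
    activities = sorted(zip(starts, finishes), key=lambda a: a[1])
    chosen = []
    last_finish = float("-inf")
    for s, f in activities:
        if s >= last_finish:  # no conflict with the last chosen activity
            chosen.append((s, f))
            last_finish = f
    return chosen

# Activity (2, 5) conflicts with both others; greedy skips it.
print(select_activities([1, 2, 4], [3, 5, 6]))  # -> [(1, 3), (4, 6)]
```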

proof

Suppose we chose some \(a_{i}\) and there is still an optimal solution \(T^{*}\) that extends our choices. Then our algorithm’s next choice \(a_{k}\) is either still in \(T^{*}\), or swapping it in yields another optimal solution \(T^{*\prime}\).

If \(a_{k}\) is not in \(T^{*}\), let \(a_{j}\) be the activity in \(T^{*}\) (among those not yet chosen) with the smallest ending time. Recall that our algorithm chose \(a_{k}\) to have the smallest ending time; thus \(a_{j}\) ends at the same time as or later than \(a_{k}\). Therefore swapping \(a_{j}\) out for \(a_{k}\) creates no conflict with anything chosen in \(T^{*}\) after \(a_{j}\); in particular the same number of activities can still be chosen, giving an optimal solution \(T^{*\prime}\) containing \(a_{k}\).

The algorithm above returns something optimal.

Base case: before we have added any activities, any optimal solution (if one exists) extends the empty solution.

Inductive step: lemma above.

Conclusion:

  • after adding the last activity, there is an optimal solution that extends the current solution
  • at that point, the only solution extending the current solution is the current solution itself
  • and hence the current solution is indeed optimal

scheduling

  • \(n\) tasks
  • task \(i\) takes \(t_{i}\) hours
  • for every hour that passes until task \(i\) is done, pay \(c_{i}\)

We must do all the tasks, and we want to minimize the total cost. Choose first the job with the biggest (cost of delay)/(time to do) ratio \(c_{i}/t_{i}\): for two adjacent jobs, the cost of ordering \(AB\) is less than that of \(BA\) exactly when \(A\) has the larger \(c/t\) ratio. For a particular optimal solution, if its next element doesn’t have the largest remaining ratio, we can keep swapping the largest-ratio item forward, and the total cost will only decrease (or stay the same).
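A sketch of the ratio rule, with a small cost function to check it (the `(t, c)` tuple representation and names are my own). The example pits a long cheap job against a short expensive one:

```python
def schedule(tasks):
    """tasks: list of (t_i, c_i) pairs.
    Greedy: order by decreasing cost-of-delay/time ratio c_i / t_i."""
    return sorted(tasks, key=lambda tc: tc[1] / tc[0], reverse=True)

def total_cost(order):
    """Each task pays c_i for every hour until it finishes."""
    elapsed, cost = 0, 0
    for t, c in order:
        elapsed += t
        cost += c * elapsed
    return cost

tasks = [(3, 1), (1, 4)]            # (time, cost-of-delay)
print(total_cost(schedule(tasks)))  # greedy order (1,4),(3,1): cost 8
print(total_cost(tasks))            # order (3,1),(1,4): cost 19
```

This matches the exchange argument: swapping the higher-ratio job \((1, 4)\) ahead of \((3, 1)\) drops the total cost from 19 to 8.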

Huffman coding

Huffman Coding is Greedy
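The greedy step is merging the two least frequent subtrees at each round. A sketch using a heap (the notes don’t give an implementation; representation and names here are my own):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """freqs: dict mapping symbol -> frequency.
    Greedily merge the two least frequent subtrees until one remains."""
    tiebreak = count()  # unique ids so heapq never tries to compare trees
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    # Walk the tree: 0 for left edges, 1 for right edges.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

codes = huffman_code({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(codes["a"])  # most frequent symbol gets a 1-bit code
```

The result is a prefix-free code: no codeword is a prefix of another, so decoding is unambiguous.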

non-greedy case

Consider again unbounded knapsack. This is not something we can solve with a greedy algorithm: the natural greedy would just grab the item with the highest value/weight ratio repeatedly, but that can be suboptimal.
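A small counterexample (items and names are mine): greedy grabs the best-ratio item and leaves unusable capacity, while dynamic programming finds the better combination.

```python
def greedy_unbounded(capacity, items):
    """items: list of (weight, value).
    Greedy: repeatedly take as many as fit of the best value/weight item."""
    items = sorted(items, key=lambda wv: wv[1] / wv[0], reverse=True)
    value = 0
    for w, v in items:
        take = capacity // w
        value += take * v
        capacity -= take * w
    return value

def dp_unbounded(capacity, items):
    """Correct answer via dynamic programming over capacities."""
    best = [0] * (capacity + 1)
    for cap in range(1, capacity + 1):
        for w, v in items:
            if w <= cap:
                best[cap] = max(best[cap], best[cap - w] + v)
    return best[capacity]

items = [(7, 10), (5, 6)]          # (weight, value); ratios 10/7 > 6/5
print(greedy_unbounded(10, items))  # 10: one (7,10), then nothing fits
print(dp_unbounded(10, items))      # 12: two copies of (5,6)
```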