Beginner's Guide to Greedy

This blog post is a submission for the Codeforces Month of Blog Posts Pt. III challenge. Thank you cadmiumky for the initiative!

This is how I wish I had been introduced to Greedy. Learning how to prove is the truly valuable skill, one that I believe good Greedy problems test rigorously.

Introduction

After struggling with proofs for quite a while, I realized a few things. Greedy proofs depend heavily on the rules of the problem. Loosely speaking, each optimization problem gives you observations, and from those observations you realize that making certain choices while ignoring all others is always optimal. That realization is what we call Greedy.

There are a lot of optimization problems, but only a small subset allows you to break the problem into smaller independent subproblems whose optimal solutions combine into the best global answer. This property, known as Optimal Substructure, is found in both DP and Greedy algorithms. We won't go into those details here; instead, we will generally assume our problems have the Optimal Substructure property.

Short Note

While DP and Greedy share the optimal substructure property, DP is essentially "smart brute force", where you define the subproblems and check all transitions to find the best one. In Greedy, however, you have to rigorously prove that ignoring all other choices and taking just one specific path works every single time. This takes the difficulty one notch up from DP.

The idea behind DP is the same everywhere: gather enough observations to define your subproblems, transitions, and base cases. This same idea is explored in a lot of diverse places (there is DP on Bitmasks, Trees, Graphs, Digits, Ranges, Weird DP Sorcery, and on and on), but there are not many discussions on Greedy. To fill this gap, I would like to discuss a lot of problems and proofs to capture the essence of how to prove Greedy solutions.

Idea

To prove a greedy solution, the arguments are almost always Proof by Contradiction or Induction. The argument goes like this: let's assume a greedy solution $$$G$$$ and a magical optimal solution $$$O$$$ that doesn't follow the greedy choices. Then, given our observations, we prove that either $$$G$$$ performs no worse than $$$O$$$, or that given the constraints, the final solution produced by both strategies will look exactly the same. On the flip side, among many greedy choices, to disprove one, the easiest way is to find a counter-example.

Below, I have listed links to various problems and discussed the greedy ideas they use, along with some common greedy examples. Sometimes Greedy seems obvious and intuitive, but I encourage you to try forming a rigorous argument while proving the problems mentioned below.

Problems

I. Basic problems (without links)

Try to come up with formal arguments to get the hang of it. Basic doesn't always mean easy.

P1. I have $$$n$$$ items and a knapsack of capacity $$$W$$$. The $$$i$$$-th item has value $$$v_i$$$ and weight $$$w_i$$$. I am allowed to take fractions of items. Find the strategy that maximizes the total value.

Hint

Solution
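One possible sketch for P1, not necessarily the intended write-up: sort items by value per unit weight and fill the knapsack from the densest item down, splitting only the last item taken. The function name `fractional_knapsack` is made up for illustration, and the code assumes integer values and positive integer weights so that exact fractions can be used.

```python
from fractions import Fraction


def fractional_knapsack(values, weights, capacity):
    """Return the best total value when fractions of items may be taken.

    Assumes integer values and positive integer weights (illustrative only).
    """
    # Visit items in decreasing order of value per unit weight.
    order = sorted(
        range(len(values)),
        key=lambda i: Fraction(values[i], weights[i]),
        reverse=True,
    )
    remaining = Fraction(capacity)
    total = Fraction(0)
    for i in order:
        if remaining == 0:
            break
        take = min(remaining, Fraction(weights[i]))  # weight taken from item i
        total += take * Fraction(values[i], weights[i])
        remaining -= take
    return total


print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))  # 240
```

The exchange argument to aim for: if a solution leaves part of a denser item unused while holding some of a less dense one, swapping an equal weight between the two never decreases the total value.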

P2. I am running along an infinitely long number line toward a target $$$T$$$ located somewhere on the line. I can run for at most $$$M$$$ units without refreshments. There are $$$n$$$ refreshment stalls, where the $$$i$$$-th stall is located at $$$x_i$$$. All stalls are placed such that it is always possible to reach the target. Find the strategy that minimizes the number of refreshment breaks.

Hint

Solution
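A minimal sketch of the "run as far as you can before stopping" greedy for P2. The statement does not fix a starting point, so the code assumes the run starts at position `start = 0`, the target lies to the right of it, and a break fully resets the $$$M$$$-unit budget. These are assumptions for illustration, as is the name `min_refreshment_breaks`.

```python
def min_refreshment_breaks(stalls, max_run, target, start=0):
    """Count breaks when always stopping at the farthest reachable stall.

    Assumes start < target and that a break resets the running budget.
    """
    # Only stalls strictly between the start and the target can help.
    stalls = sorted(x for x in stalls if start < x < target)
    breaks = 0
    pos = start  # position of the last break (or the start)
    i = 0
    while target - pos > max_run:
        farthest = None
        # Advance to the farthest stall reachable from pos.
        while i < len(stalls) and stalls[i] - pos <= max_run:
            farthest = stalls[i]
            i += 1
        if farthest is None:
            raise ValueError("target is unreachable from the current position")
        pos = farthest
        breaks += 1
    return breaks


print(min_refreshment_breaks([2, 4, 6, 9], 5, 12))  # 2
```

In the example, one break is not enough: a single stop would have to sit at a position that is at most 5 and at least 7 at the same time.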

P3. Given two integer arrays $$$a$$$ and $$$b$$$ of size $$$n$$$ filled with positive integers, reorder the elements within the arrays to maximize $$$\prod_{i=1}^{n} a_{i}^{b_i}$$$. Too easy :P

Hint

Solution
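A small sketch for P3, assuming the natural greedy of pairing the largest base with the largest exponent (sort both arrays the same way). Since all $$$a_i \ge 1$$$, taking logarithms turns the product into $$$\sum b_i \log a_i$$$, which is where a swap argument applies. The helper names are hypothetical, and the brute force is only there to check the claim on a tiny input.

```python
from itertools import permutations
from math import prod


def best_pairing(a, b):
    """Pair the k-th smallest base with the k-th smallest exponent."""
    return sorted(a), sorted(b)


def power_product(a, b):
    """Compute the product of a[i] ** b[i] over all positions."""
    return prod(x ** y for x, y in zip(a, b))


def brute_force_best(a, b):
    """Try every matching of exponents to bases (tiny inputs only)."""
    return max(power_product(a, perm) for perm in permutations(b))


a, b = [2, 3, 1], [1, 3, 2]
print(power_product(*best_pairing(a, b)))  # 108
print(brute_force_best(a, b))              # 108
```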

P4. Given an array of $$$n$$$ activities, where the $$$i$$$-th activity takes $$$p_i$$$ time to complete. I want to schedule activities such that they don't overlap.

Let $$$c_i$$$ be the completion time of the $$$i$$$-th activity in the schedule. If the $$$i$$$-th activity starts at time $$$x$$$, then its completion time is $$$c_i = x + p_i$$$. The goal is to minimize $$$\sum_{i=1}^{n} c_i$$$ (which is equivalent to minimizing the average completion time).

Hint

Solution
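A sketch of the shortest-duration-first idea for this part of P4, assuming the schedule starts at time 0 and runs the activities back to back with no idle time. The function names are illustrative; the brute force over all orders is only a sanity check for small $$$n$$$.

```python
from itertools import permutations


def total_completion_time(order):
    """Sum of completion times when activities run back to back from time 0."""
    clock = 0
    total = 0
    for p in order:
        clock += p      # this activity completes at the current clock
        total += clock
    return total


def shortest_first(durations):
    """Greedy order: run the shortest remaining activity next."""
    return sorted(durations)


def brute_force_min(durations):
    """Check every order (tiny inputs only)."""
    return min(total_completion_time(perm) for perm in permutations(durations))


durations = [3, 1, 2]
print(total_completion_time(shortest_first(durations)))  # 10
print(brute_force_min(durations))                        # 10
```

The swap to think about: if a longer activity runs immediately before a shorter one, exchanging the two lowers the completion time of the first of the pair and leaves every other completion time unchanged.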

Each activity has a start time $$$s_i$$$ and a finish time $$$f_i$$$. Find the maximum number of non-overlapping activities you can perform.

Hint

Solution

Let $$$O= \begin{Bmatrix} o_1,o_2, \cdots , o_m \end{Bmatrix}$$$ be an optimal set of non-overlapping activities, sorted by their finishing times.

Claim: Let $$$g_1$$$ be the activity with the absolute earliest finish time in the entire dataset. There exists an optimal solution $$$O'$$$ that contains $$$g_1$$$.

Proof: By Exchange Argument. It's actually easy to see. Look at the first activity in our optimal set, $$$o_1$$$. If $$$o_1 = g_1$$$, the claim is true. If $$$o_1 \neq g_1$$$, we know by definition that $$$finish(g_1) \le finish(o_1)$$$. Since $$$o_1$$$ finishes before $$$o_2$$$ starts, and $$$g_1$$$ finishes no later than $$$o_1$$$, $$$g_1$$$ is also compatible with the rest of the schedule ($$$o_2, \dots, o_m$$$). We can replace $$$o_1$$$ with $$$g_1$$$ to create a new set $$$O' = \begin{Bmatrix} g_1, o_2, \dots, o_m \end{Bmatrix}$$$.

This set is valid and has the same size as $$$O$$$. Thus, $$$O'$$$ is also optimal.

The rest of the proof is exactly the same as in P2.

Let the Greedy solution be $$$G = \begin{Bmatrix} g_1, g_2, \dots, g_k \end{Bmatrix}$$$, where each element is picked using the "earliest finishing time" strategy. Since $$$O$$$ is the maximum possible size, we know $$$k \leq m$$$. We want to prove $$$k = m$$$.

Claim: $$$finish(g_i) \leq finish(o_i)$$$ for all $$$1 \le i \le k$$$.
Proof: By induction on $$$i$$$.

Base Case: $$$finish(g_1) \leq finish(o_1)$$$ is true by the definition of the Greedy choice (it picks the global minimum finish time).
Hypothesis: Assume the claim holds for step $$$i$$$: $$$finish(g_i) \leq finish(o_i)$$$.
Inductive Step: We want to prove $$$finish(g_{i+1}) \leq finish(o_{i+1})$$$. Since $$$O$$$ is a valid schedule, $$$start(o_{i+1}) \ge finish(o_i) \ge finish(g_i)$$$. This implies that $$$o_{i+1}$$$ is one of the candidates among the activities compatible with $$$g_1, \dots, g_i$$$. Since Greedy always picks the candidate with the earliest finishing time, $$$g_{i+1}$$$ finishes no later than $$$o_{i+1}$$$.

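Now suppose, for contradiction, that $$$k < m$$$. By the claim, $$$finish(g_k) \leq finish(o_k) \leq start(o_{k+1})$$$, so $$$o_{k+1}$$$ is still compatible with everything Greedy has picked, and Greedy would have selected at least one more activity instead of stopping at $$$g_k$$$. This contradicts the definition of $$$k$$$, so Greedy finds at least as many activities as Optimal ($$$k \ge m$$$). Combined with $$$k \leq m$$$, which holds because $$$O$$$ is optimal, we must have $$$k = m$$$.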
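To make the argument concrete, here is a hedged sketch that runs the "earliest finishing time" strategy and compares it with a brute force on small inputs. It also shows how a counter-example disproves a different greedy rule ("earliest start time"). Activities are assumed to be `(start, finish)` pairs with `start < finish`, and two activities are compatible when one starts no earlier than the other finishes, matching $$$start(o_{i+1}) \ge finish(o_i)$$$ above. All function names are hypothetical.

```python
from itertools import combinations


def greedy_schedule(activities, key):
    """Scan activities in the order given by `key`, keeping each compatible one."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=key):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen


def earliest_finish(activities):
    return greedy_schedule(activities, key=lambda act: act[1])


def earliest_start(activities):
    return greedy_schedule(activities, key=lambda act: act[0])


def brute_force_max(activities):
    """Largest non-overlapping subset size, by trying all subsets (tiny inputs only)."""
    for size in range(len(activities), 0, -1):
        for subset in combinations(activities, size):
            ordered = sorted(subset, key=lambda act: act[1])
            if all(ordered[j][1] <= ordered[j + 1][0] for j in range(size - 1)):
                return size
    return 0


counter_example = [(0, 10), (1, 2), (3, 4)]
print(len(earliest_finish(counter_example)))  # 2
print(len(earliest_start(counter_example)))   # 1, so "earliest start" is not optimal
print(brute_force_max(counter_example))       # 2
```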

Note: The graph is denoted by $$$H$$$, as the letter $$$G$$$ represents the Greedy solution.

By now, we should have some idea of how to find properties that connect the Greedy solution $$$G$$$ and the Optimal solution $$$O$$$, eventually proving they are equally good. The same logic applies to problems involving graphs or trees. We often assume the existence of a specific optimal tree $$$T_{O}$$$, a subgraph $$$H_{O} \subset H$$$, or even a partially constructed subtree $$$T_{O}' \subset T_{O} \subset H$$$, basically anything concrete that we can "play with" to formulate rigorous arguments about their behavior.

P5. Given an undirected graph $$$H=(V,E)$$$ with non-negative weights.

Find its Minimum Spanning Tree (MST).

Hint

Solution
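One well-known greedy for this part is Kruskal's algorithm: scan the edges in increasing weight and keep an edge whenever it joins two different components. The sketch below is one way to write it, not necessarily the approach the post has in mind. It assumes vertices are numbered `0..n-1` and edges are `(weight, u, v)` tuples; `kruskal_mst` is an illustrative name.

```python
def kruskal_mst(n, edges):
    """Return (total_weight, chosen_edges) for graph H, or None if H is disconnected.

    Assumes vertices 0..n-1 and edges given as (weight, u, v) tuples.
    """
    parent = list(range(n))

    def find(x):
        # Find the component representative, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    chosen = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge connects two different components
            parent[ru] = rv
            total += w
            chosen.append((w, u, v))
    if len(chosen) != n - 1:
        return None
    return total, chosen


edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
print(kruskal_mst(4, edges))  # (6, [(1, 0, 1), (2, 1, 2), (3, 2, 3)])
```

A proof in the style of this post would take an optimal tree $$$T_{O}$$$ and the first edge where Greedy and $$$T_{O}$$$ disagree, then exchange edges along the cycle that edge creates.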

Find a strategy to find the Second Best Minimum Spanning Tree.

Hint

Solution
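A simple, deliberately slow sketch for this part. It assumes "second best" means the cheapest spanning tree that differs from one fixed MST in at least one edge (its weight may equal the MST weight when ties exist). Any such tree misses some MST edge, so banning each MST edge in turn and rebuilding covers every candidate. This costs roughly $$$O(|V| \cdot |E| \log |E|)$$$; faster methods exist. The names and the edge format `(weight, u, v)` are assumptions.

```python
def spanning_tree(n, edges, banned=None):
    """Kruskal on H with one edge index optionally banned.

    Returns (total_weight, used_edge_indices), or None if no spanning tree exists.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    used = []
    for idx in sorted(range(len(edges)), key=lambda i: edges[i][0]):
        if idx == banned:
            continue
        w, u, v = edges[idx]
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
            used.append(idx)
    return (total, used) if len(used) == n - 1 else None


def second_best_mst(n, edges):
    """Weight of the best spanning tree that avoids at least one MST edge."""
    best = spanning_tree(n, edges)
    if best is None:
        return None
    candidates = []
    for idx in best[1]:
        alternative = spanning_tree(n, edges, banned=idx)
        if alternative is not None:
            candidates.append(alternative[0])
    return min(candidates) if candidates else None


edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
print(second_best_mst(4, edges))  # 7
```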

Bonus

II. Basic Problems (with Links)

Random Thought

Outro

I hope this blog helps both beginners and intermediate problem solvers.

These are just a few of the interesting Greedy problems I've shared. If you know of other problems with elegant or tricky Greedy arguments, please share them! Also, if you spot any mistakes or if any part of the explanation wasn't clear, don't hesitate to point it out.

Happy Solving! ^_^

Update 1: Clarified the wording of P2.

Comments (9)

commandox

Brilliant way to think about greedy — Thank You So Much

Glad you liked it

mowo

Hi I loved this blog, it was very useful for me! I also don't feel comfortable until I have a concrete enough proof, but I have no formal math training, so your basic examples were very helpful, I spent the whole evening proving through all of them lol.

I will certainly be working through your list of recommended greedy problems.

Some suggestions, I think the wording of the refreshment break run problem can be clarified. It says infinitely long line, so initially I thought we were trying to run forever. Then I thought the best answer was just to run for a distance of 0 and take no refreshments lol. Maybe specify that you have a fixed starting point, and must run to a fixed ending point, and refreshment stops are put in between in a way that guarantees it is possible to get from start to end.

Also here is a greedy/observation problem i think is cool: https://mirror.codeforces.com/contest/2122/problem/C

why i think is cool (spoilers)

Glad you liked it. Updated the wording in P2.

i-love-ayase-momo

https://oj.uz/problem/view/JOI18_candies this problem is mind blowing, but its pretty hard

DarkDevilVaqif

Could I also suggest this greedy problem, so many exchange arguments in the editorial: https://mirror.codeforces.com/contest/2164/problem/C
