Introduction
I personally find greedy problems hard to derive. After struggling with proofs for quite a while, I realized a few things. Greedy proofs are very dependent on the rules of the problem. Loosely speaking, each optimization problem gives you observations; from those observations, you realize that if you take certain choices while avoiding all others, the result will always be optimal. That realization is termed Greedy.
There are a lot of optimization problems, but only a small subset allows you to break the problem into subproblems where solving them optimally leads to the best global answer. This property, known as Optimal Substructure, is found in both DP and Greedy algorithms. Sadly, not every problem has this, though clean counterexamples are hard to construct. Here is a contrived one: say we have an array that is shuffled between subsequent queries, some element occurs $$$\lceil \frac{n}{3} \rceil $$$ times, and $$$query(x)$$$ returns the number of occurrences of $$$x$$$ in the array. To find that element, maybe the only good way is to sample randomly; probabilistically, we will get the answer within a few dozen queries.
While DP and Greedy share the optimal substructure property, DP is essentially "smart brute force", where you define the subproblems and check all transitions to find the best one. In Greedy, however, you have to rigorously prove that ignoring all other choices and taking just one specific path works every single time. This takes the difficulty one notch up from DP.
The idea behind DP is the same everywhere: gather enough observations to define your subproblems, transitions, and base cases. This same idea is explored in a lot of diverse places (there is DP on bitmasks, trees, digits, ranges, and so on), but there are not many discussions on Greedy. To fill this gap, I would like to discuss a lot of problems and proofs to capture the essence of how to prove Greedy solutions.
Idea
To prove a greedy solution, the arguments are almost always Proof by Contradiction or Induction. The argument goes like this: let's assume a greedy solution $$$G$$$ and a magical optimal solution $$$O$$$ that doesn't follow the greedy choices. Then, given our observations, we prove that either $$$G$$$ performs no worse than $$$O$$$, or that given the constraints, the final solution produced by both strategies will look exactly the same. On the flip side, among many greedy choices, to disprove one, the easiest way is to find a counter-example.
Below, I have listed links to various problems and discussed the greedy ideas they use, along with some common greedy examples. Sometimes Greedy seems obvious and intuitive, but I encourage you to try forming a rigorous argument while proving the problems mentioned below.
Problems
Basic problems
Try to come up with formal arguments to get the hang of it. Basic doesn't always mean easy.
P1. I have $$$n$$$ items and a knapsack of capacity $$$W$$$. The $$$i$$$-th item has value $$$v_i$$$ and weight $$$w_i$$$. I am allowed to take fractions of items. Find the strategy that maximizes the total value.
HintImagine a scenario where all items have unit weights $$$w = \begin{Bmatrix} 1,1,1,\cdots,1 \end{Bmatrix} $$$ but an arbitrary value array $$$v$$$.
Similarly, what if we have all unit values $$$v = \begin{Bmatrix} 1,1,\cdots,1 \end{Bmatrix} $$$ and arbitrary weights.
How should we fill the knapsack in these cases? Can we convert the original problem to look like this?
SolutionSince we are allowed to take fractional amounts, we can normalize the problem. Following the hints, let's divide every value $$$v_i$$$ by its corresponding weight $$$w_i$$$ to get the value density (value per unit weight). Now the problem transforms into selecting "units" of weight, where each unit from item $$$i$$$ contributes $$$v_{i \text{ new}}=v_i/w_i$$$ to the total score, and we can take up to $$$w_i$$$ units from item $$$i$$$.
Let $$$G$$$ be the Greedy strategy: pick the item with the highest density first, taking as much as possible, then the next highest, and so on; say the chosen items are $$$v_n, v_{n-1}, \cdots , v_m $$$. In this strategy, at most one item (the last one added) will have a partial contribution (weight $$$ \leq w_m$$$).
Let $$$O$$$ be the optimal set of chosen items = $$$ \begin{Bmatrix} (v_i,u_i) \mid u_i \leq w_i \end{Bmatrix}$$$, where $$$u_i$$$ is the contribution taken from item $$$i$$$. We need to prove that $$$O$$$ is the same as $$$G$$$.
Claim: Any value $$$v_i \notin O$$$ must be less than every value in $$$O$$$.
Proof: By Contradiction ( Exchange argument ).
Suppose there were a value $$$v_j \in O$$$ with contribution $$$u_j$$$ such that $$$v_j \lt v_i $$$. Then replacing part of $$$v_j$$$'s contribution with contribution from $$$v_i$$$ would increase the final total value: say we remove a small amount of weight $$$\epsilon$$$ from item $$$j$$$ and replace it with amount $$$\epsilon$$$ of item $$$i$$$.
Extra space used is $$$-\epsilon + \epsilon = 0$$$ (the knapsack is still valid) and the total value change is $$$ -\epsilon v_j + \epsilon v_i \gt 0$$$: the total value increases.
This contradicts the assumption that $$$O$$$ was optimal. Therefore, the Optimal solution must prioritize items with higher density, filling them completely before touching lower-density items. This is exactly the Greedy strategy $$$G$$$.
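The density-greedy above can be sketched in a few lines. This is a minimal illustration under the problem's setup; the function name and signature are mine:

```python
def fractional_knapsack(values, weights, W):
    """Greedy for P1: sort items by value density v/w, then take as much
    as possible of the densest item, then the next, and so on. At most
    the last taken item is fractional."""
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for v, w in items:
        if W <= 0:
            break
        take = min(w, W)          # whole item if it fits, else a fraction
        total += v * (take / w)   # proportional share of the item's value
        W -= take
    return total
```

For example, with values $$$[60, 100, 120]$$$, weights $$$[10, 20, 30]$$$ and $$$W = 50$$$, the greedy takes the first two items whole and $$$20/30$$$ of the third, for a total of $$$240$$$.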
P2. I am running along an infinitely long number line. I can run for at most $$$M$$$ units without refreshments. There are $$$n$$$ refreshment stalls, where the $$$i$$$-th stall is located at $$$x_i$$$. Find the strategy that minimizes the number of refreshment breaks.
HintLet the Greedy strategy be $$$G = g_1, g_2, \cdots, g_k$$$, where we simply choose the furthest reachable stall: if I am at $$$g_i$$$, then the next stop $$$g_{i+1}$$$ is the largest value such that $$$g_{i+1} \leq g_i + M$$$. Let the Optimal strategy be $$$O = o_1, o_2, \cdots, o_m$$$.
What inequality between $$$g_1$$$ and $$$o_1$$$ can we guarantee? Who is further ahead?
Can we generalize this for any step $$$i$$$?
SolutionLet $$$k$$$ be the number of stops for Greedy, and $$$m$$$ be the number of stops for the Optimal solution. Since Optimal minimizes stops, we know $$$m \leq k$$$. By definition, $$$ o_1 \leq g_1 $$$.
Claim: $$$o_i \leq g_i $$$ for all $$$i \leq m$$$.
Proof: By Induction.
Base case: From the start, Greedy picks the furthest reachable stall. Optimal picks some reachable stall. Therefore, $$$g_1 \geq o_1$$$ is trivially true.
Inductive Hypothesis: $$$o_i \leq g_i $$$ holds up to the $$$i$$$-th step.
From our hypothesis, $$$g_i \geq o_i$$$, which implies $$$g_i + M \geq o_i + M \geq o_{i+1}$$$. This means $$$o_{i+1}$$$ is a reachable stall from the current Greedy position $$$g_i$$$. By definition, Greedy picks the furthest reachable stall. Thus, $$$g_{i+1}$$$ must be at least as far as $$$o_{i+1}$$$.
By the time Optimal finishes (at step $$$m$$$), it has reached the destination ($$$o_m \geq \text{Target}$$$). Since $$$g_m \geq o_m$$$, Greedy has also reached the destination by step $$$m$$$. This implies Greedy needs no more steps than Optimal, so $$$k \leq m$$$. Since $$$m$$$ is the minimum possible, we must have $$$k = m$$$.
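The problem statement leaves the start and destination implicit, so the sketch below assumes we start at position $$$0$$$ and must reach an explicit target; the stall positions and function name are illustrative:

```python
def min_stops(stalls, M, target):
    """Greedy for P2: from the current position, jump to the furthest
    stall within M units; stop iterating once the target is in reach.
    Returns the number of refreshment breaks, or -1 if unreachable."""
    stalls = sorted(stalls)
    pos, stops, i, n = 0, 0, 0, len(stalls)
    while pos + M < target:
        best = -1
        while i < n and stalls[i] <= pos + M:
            best = stalls[i]   # furthest reachable stall so far
            i += 1
        if best == -1:         # no stall in reach: stuck
            return -1
        pos = best
        stops += 1
    return stops
```

For stalls at $$$[3, 6, 9]$$$ with $$$M = 4$$$ and target $$$12$$$, the greedy stops at $$$3$$$, $$$6$$$, and $$$9$$$, for three breaks.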
P3. Given 2 integer arrays $$$a$$$ and $$$b$$$ of size $$$n$$$ filled with positive integers. Reorder arrays to maximize $$$\prod_{i=1}^{n} a_{i}^{b_i}$$$. Too easy :P
HintWhat if the problem were $$$ \sum_{i=1}^{n} a_i \cdot b_i $$$?
What if the problem were $$$ \sum_{i=1}^{n} \lg(a_i) \cdot b_i $$$?
What if the problem were $$$ \sum_{i=1}^{n} f(a_i) \cdot g(b_i) $$$, where $$$f$$$ is a monotonically increasing function and $$$g$$$ is a monotonically decreasing function?
Would the argument remain the same?
SolutionLet's sort array $$$a$$$ and reorder the respective elements in $$$b$$$. Let strategy $$$G$$$ be: sort $$$b$$$ such that high values of $$$a$$$ get high values of $$$b$$$. Let the optimal sequence be $$$O$$$.
Claim: Sequence $$$O$$$ must not contain any inversion.
Proof: By Contradiction ( Exchange argument ).
Suppose there were an inversion, say at $$$i \lt j$$$ with $$$b_i \gt b_j$$$. Currently the final product looks like $$$ \cdots a_{i}^{b_i} \cdots a_{j}^{b_j} \cdots $$$, and after swapping $$$b_i$$$ and $$$b_j$$$ the product will be $$$ \cdots a_{i}^{b_j} \cdots a_{j}^{b_i} \cdots $$$. Dividing the product Before by After gives $$$ a_{i}^{b_i - b_j} / a_{j}^{b_i - b_j} = \left( a_i / a_j \right)^{b_i - b_j}$$$. Since $$$a_i \leq a_j$$$ (as $$$a$$$ is sorted) and the exponent $$$b_i - b_j$$$ is a positive integer, this fraction is at most $$$1$$$: swapping never decreases the value, and strictly increases it when $$$a_i \lt a_j$$$. Hence correcting every inversion will increase the value, so $$$O$$$ must not contain any inversion.
Having no inversions is a sufficient condition to make the sequence $$$O$$$ look exactly like $$$G$$$. Hence $$$G$$$ is as optimal as $$$O$$$.
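Since Python integers are arbitrary precision, we can check the sorted pairing exactly against a brute force over all orderings of $$$b$$$ (fine for tiny arrays). Function names are mine:

```python
import math
from itertools import permutations

def best_product(a, b):
    """Greedy for P3: pair the largest a with the largest b."""
    return math.prod(x ** y for x, y in zip(sorted(a), sorted(b)))

def brute_product(a, b):
    """Exhaustive check: try every pairing of b against sorted a."""
    return max(math.prod(x ** y for x, y in zip(sorted(a), perm))
               for perm in permutations(b))
```

For $$$a = [2,3,5]$$$ and $$$b = [1,4,2]$$$, the greedy pairing $$$(2^1)(3^2)(5^4) = 11250$$$ matches the brute-force maximum.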
P4. Given an array of $$$n$$$ activities, where the $$$i$$$-th activity takes $$$p_i$$$ time to complete. I want to schedule the activities such that they don't overlap.
- Let $$$c_i$$$ be the completion time of the $$$i$$$-th activity. Say the $$$i$$$-th activity starts at time $$$x$$$; then its completion time is $$$c_i=x+p_i$$$. Minimize $$$ \sum_{i=1}^{n} c_i$$$, or divide by $$$n$$$ and call it the minimum average completion time.
HintConvert this problem to P3.
SolutionIn every permutation of activities, the first activity's time will be summed $$$n$$$ times, the second activity's time will be summed $$$n-1$$$ times, and so on. So it is just like P3, with $$$a$$$ as the time array $$$p$$$ and $$$b = [n, n-1, \cdots, 1]$$$ as the multiplicities, except here we minimize: the shortest activity should be paired with the largest multiplicity, i.e. scheduled first.
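This reduction is just shortest-activity-first. A small sketch with a brute-force cross-check over all schedules (names are mine):

```python
from itertools import permutations

def min_total_completion(p):
    """Greedy for P4: schedule the shortest activity first; each
    completion time is the running sum of durations so far."""
    total, t = 0, 0
    for d in sorted(p):
        t += d        # completion time of this activity
        total += t
    return total

def brute_total(p):
    """Exhaustive check over all schedules."""
    def cost(order):
        t, s = 0, 0
        for d in order:
            t += d
            s += t
        return s
    return min(cost(perm) for perm in permutations(p))
```

For $$$p = [3,1,2]$$$ the greedy order $$$1,2,3$$$ gives completion times $$$1,3,6$$$, summing to $$$10$$$.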
- Given another array $$$s$$$ that tells the starting time of the activities. Find the maximum number of activities you can take.
HintLet the bold intervals be the set $$$O$$$, the set of optimal activities, and the dotted ones be the remaining activities.
Which $$$b_i$$$ and $$$a_j$$$ can I safely swap?
1. Can we put $$$b_2,b_4$$$ and $$$b_5$$$ in place of $$$a_2,a_3$$$ and $$$a_4$$$? Maybe the least overlapping one, like $$$b_2$$$? Or the shortest interval, because of $$$b_5$$$? Can I always do it?
2. Can we replace $$$b_1$$$ with $$$a_1$$$, or $$$b_6$$$ with $$$a_7$$$? Can I always do it?
A greedy strategy should be something that is reliable every time.

Hint - To test the shortest-duration hypothesis, let's draw samples with only 2 types of lengths; if it doesn't work for 2 different lengths, then it surely won't work for $$$ \gt 2$$$ types of lengths. Say this

and we can easily see that shortest duration is not a good strategy.
- To test the least-overlapping hypothesis, let's draw a lot of samples with the same length, maybe this

here we can also see that least overlapping doesn't work.
SolutionLet's have an optimal set $$$O= \begin{Bmatrix} o_1,o_2, \cdots , o_m \end{Bmatrix}$$$, listed in the order of finishing times. $$$O$$$ contains non-overlapping ranges.
Claim: If there exists an activity $$$a_i \notin O$$$ with $$$a_{i}.\text{finish} \leq o_{1}.\text{finish}$$$, then there exists some optimal set $$$O'$$$ which contains $$$a_i$$$.
Proof: By Exchange Argument. It is easy to see that if we replace $$$o_1$$$ with $$$a_i$$$, then since $$$a_{i}.\text{finish} \leq o_{1}.\text{finish} \leq o_{2}.\text{start}$$$, there will be no overlap, and hence the set remains valid. Its size did not decrease, so we get a new optimal valid set with at least one different element.
The rest of the proof is exactly the same as P2.
Let the greedy solution be $$$G= \begin{Bmatrix} g_1,g_2, \cdots , g_k \end{Bmatrix} $$$, where each element is picked with the earliest-finishing-time strategy. Since the optimal solution's size is $$$m$$$, we must have $$$k \leq m$$$.
Claim: $$$g_i.\text{finish} \leq o_i.\text{finish}$$$ , for all $$$i \leq k$$$.
Proof: By Induction,
Base case: $$$g_1.\text{finish} \leq o_1.\text{finish}$$$, by definition. The hypothesis is that this stays true up to the $$$i$$$-th step. Assume it is not true for the $$$(i+1)$$$-th step, i.e. $$$o_{i+1}.\text{finish} \lt g_{i+1}.\text{finish}$$$. But $$$o_{i+1}$$$ is compatible with $$$g_i$$$ (it starts no earlier than $$$o_i.\text{finish} \geq g_i.\text{finish}$$$), so by the definition of $$$G$$$, $$$o_{i+1}$$$ would have been picked. That is not the case, hence our assumption is false.
Using the above, we can see that $$$o_m$$$ is the final step, and $$$g_m$$$ finishes alongside $$$o_m$$$ or before, which means Greedy leaves at least as much room for upcoming activities; so $$$m \leq k$$$, which means $$$G$$$ is as optimal as any other optimal strategy.
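The earliest-finishing-time strategy is short to implement. The sketch below takes starts $$$s$$$ and durations $$$p$$$ as in the problem statement; the function name is mine:

```python
def max_activities(s, p):
    """Greedy for P4: repeatedly take the activity with the earliest
    finishing time among those that don't overlap the last one taken."""
    acts = sorted((s[i] + p[i], s[i]) for i in range(len(s)))  # by finish time
    count, cur_finish = 0, float("-inf")
    for finish, start in acts:
        if start >= cur_finish:  # compatible with the last chosen activity
            count += 1
            cur_finish = finish
    return count
```

On the classic instance with starts $$$[1,3,0,5,8,5]$$$ and durations $$$[1,1,6,2,1,4]$$$, the greedy selects four activities.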
Note: The graph is denoted by $$$H$$$, as the letter $$$G$$$ represents the greedy solution.
By now, we must have gotten some idea of how we try to find properties that connect the greedy solution $$$G$$$ and the optimal solution $$$O$$$ to make them look the same. The same applies to problems involving graphs or trees, where we assume a certain optimal tree $$$T_{O}$$$, or some subgraph $$$H_{O} \subset H$$$, or some partially constructed subtree $$$T_{O}' \subset T_{O} \subset H$$$: basically anything concrete that we can play with, and then form arguments about its behavior.
P5. Given an undirected graph $$$H=(V,E)$$$ with non-negative weights.
- Find its Minimum spanning tree.
HintLet $$$T$$$ be the minimum spanning tree. Let $$$S$$$ be some arbitrary set of nodes, and let $$$A$$$ be the set of edges, denoted in blue, that belong to $$$T$$$. $$$S$$$ is made such that the edges from $$$ S $$$ to $$$V \setminus S$$$ are not part of $$$A$$$, hence not denoted in blue.
Will any of the $$$u_i$$$ edges be part of $$$T$$$?
If yes, then which $$$u_i$$$ edge should I pick? We drew the boundary around $$$A$$$ because we don't want the bucket of candidate edges to contain edges that are already in $$$T$$$.

HintWill any of the $$$u_i$$$ edges be part of $$$T$$$? Yes, otherwise the final tree would be disconnected.
If yes, then which $$$u_i$$$ edge should I pick? Though there can be many crossing edges present in the final MST, can we guarantee that the minimum one will always be present in some MST? Let's say the minimum edge is $$$u_3$$$ with endpoints $$$(x,y)$$$.
Claim: Yes, we can guarantee it.
Proof: By Construction ( Exchange argument ).
Let's say we have some MST $$$T'$$$ that doesn't contain the edge $$$(x,y)$$$; instead, say it contains the edge $$$(u,v)$$$. If we connect $$$(x,y)$$$, it will form a graph with a cycle, and the cycle contains both $$$(u,v)$$$ and $$$(x,y)$$$, because these 2 edges are among the edges that connect $$$S$$$ with $$$V \setminus S$$$. Removing an edge from this cycle will result in a tree, and if we remove $$$(u,v)$$$, then the cost of this new tree won't increase, since $$$(x,y)$$$ is the minimum crossing edge; hence the resulting tree is a valid MST.

I'm not saying $$$T'$$$ is not an MST; I'm saying there exists a valid MST with $$$(x,y)$$$ being one of its edges.
SolutionNow the solution is simple. We have $$$A$$$, a set of edges that are present in some MST, and we know a way to add new edges to this set. We should keep adding edges until we form a single tree. If you have noticed carefully, what we just did was Proof by Induction. Just like in the previous problems, except instead of an array we now have a graph $$$H$$$.
Base Case: $$$A = \phi$$$; we take no edges as a zero-weight tree, which is the lowest weight possible with non-negative edges. Hence it is trivially true.
Hypothesis: We have assumed $$$A$$$ is a subset of the edges present in some MST. And by the above hint we have proved that we can grow it to $$$A \cup (x,y)$$$ using the minimum crossing edge. Hence the cost of the tree made by Greedy won't be worse than an MST made by any other algorithm.
Practically, you can grow it by making $$$S$$$ the set of vertices touched by $$$A$$$ every time; since at first $$$A = \phi$$$, we can make any partition, add a single edge into it, and continue like this.
Or, since we know we are picking minimum edges all the time, we can sort them and try to add them into $$$A$$$ one by one. In this case, at every iteration, say I have an edge $$$(x,y)$$$: if $$$x$$$ and $$$y$$$ are already connected through some path of edges in $$$A$$$, then I cannot form a partition $$$S$$$. Otherwise, if the MST is not completely built, there are still some disconnected parts not joined by edges from $$$A$$$, so a partition is possible; you just need to keep looking. In the end, through induction, we are guaranteed to have a minimum spanning tree.
Do these 2 algorithms sound familiar?
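The second of the two algorithms (sort the edges, add one whenever its endpoints are still disconnected) is only a few lines with a disjoint set union. A minimal sketch, with an edge format $$$(w, u, v)$$$ of my choosing:

```python
def mst_cost(n, edges):
    """Kruskal's algorithm: scan edges in increasing weight order and take
    an edge iff its endpoints are in different components (DSU check).
    edges is a list of (w, u, v) tuples; vertices are 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    cost, taken = 0, 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # a partition S separating u and v exists
            parent[ru] = rv
            cost += w
            taken += 1
    return cost if taken == n - 1 else None  # None if H is disconnected
```

On the 4-vertex graph with edges $$$(1,0,1), (2,1,2), (3,0,2), (4,2,3)$$$ the MST takes weights $$$1+2+4=7$$$.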
- Find a strategy to find the 2nd best Minimum spanning tree.
HintIn how many edges does the second-best MST differ from the best MST? $$$1$$$, or $$$2$$$?
Solution It should contain exactly 1 edge different from the best MST, because if there were 2 differing edges, then we could always form a cut to replace one of them, making the cost better than before but still worse than the best.
Just brute force over all the edges not in $$$T$$$: in each try, add the edge, which will form a cycle; then remove the maximum edge possible from $$$T$$$ on that cycle to form the minimum-cost tree in this situation. The answer will be the minimum of such costs.
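A sketch of this brute force (helper names are mine; a per-query DFS on the tree is fine for small graphs, while large ones would use binary lifting / LCA for the path maximum):

```python
def second_best_mst(n, edges):
    """Build the MST with Kruskal, then for every non-tree edge (w, u, v)
    swap it in and drop the heaviest edge on the tree path u..v; the best
    such swap gives the second-best spanning tree cost.
    edges is a list of (w, u, v); vertices are 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    adj = [[] for _ in range(n)]  # MST adjacency: (neighbor, weight)
    mst, in_tree = 0, set()
    for w, u, v in sorted(edges):
        if find(u) != find(v):
            parent[find(u)] = find(v)
            adj[u].append((v, w))
            adj[v].append((u, w))
            mst += w
            in_tree.add((w, u, v))

    def max_on_path(src, dst):
        # DFS on the tree, tracking the heaviest edge weight seen so far
        stack = [(src, -1, 0)]
        while stack:
            node, par, best = stack.pop()
            if node == dst:
                return best
            for nxt, w in adj[node]:
                if nxt != par:
                    stack.append((nxt, node, max(best, w)))

    best = None
    for w, u, v in edges:
        if (w, u, v) not in in_tree:
            cand = mst - max_on_path(u, v) + w
            if best is None or cand < best:
                best = cand
    return best
```

On the same 4-vertex example as before, the only non-tree edge $$$(3,0,2)$$$ replaces the weight-$$$2$$$ edge on the path $$$0..2$$$, giving a second-best cost of $$$8$$$.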
BonusThese are the exact same steps used to prove Dijkstra as well. Can you form the argument?
Base Case, Hypothesis, and Inductive Step. A similar conclusion is reached in the Rethink the Dijkstra algorithm -- Let's go deeper blog, where it is realized that the greedy argument can be used in creative ways.
Ideas used in above
In my opinion, after struggling and failing to formulate a satisfying argument, it feels very weird to hear people in editorials and comments say, "Oh, that's intuitive... it's obvious... it's trivial." I'm left wondering, "How come?" Usually, their reply is just C++ code translated into English x_x
- Kanade's Perfect Multiples,
- Game on Array,
- Stay or Mirror,
- Ticket Hoarding,
- Maximum Running time of N computers,
- Group Increases,
- Xor factorization,
- Divine Tree,
- Tracks in the Snow
- Find the k-sum of an array
- Erasing Vertices 2
- Path and Subsequence