Quadratic binary energy function minimisations by Flow

№	Пользователь	Рейтинг
1	Benq	3792
2	VivaciousAubergine	3647
3	Kevin114514	3603
4	jiangly	3583
5	turmax	3559
6	tourist	3541
7	strapple	3515
8	ksun48	3461
9	dXqwq	3436
10	Otomachi_Una	3413

№	Пользователь	Вклад
1	Qingyu	157
2	adamant	153
3	Um_nik	147
4	Proof_by_QED	146
5	Dominater069	145
6	errorgorn	142
7	cry	139
8	YuukiS	135
9	TheScrasse	134
10	chromate00	133

Motivations

I tried to read https://mirror.codeforces.com/blog/entry/148438 and failed. It contains a lot of extra explanations and intuition that I either already knew or could reconstruct immediately. That is not inherently bad, but for me personally it made the key ideas harder to see. (I think most contestants would not struggle with visualising binary logical relations...)

As I asked around (and with help from LLMs), it became clear that the underlying structure is the Kolmogorov–Zabih characterization of which quadratic binary energies are solvable via a single max-flow.

This blog is essentially a reweighting of that material: shorter, more “algebraic”, and aimed at people who are already comfortable with max-flow / min-cut and with abstracting away implementation details. This blog is also just essentially an extremely long comment that is too long to fit in comment section.

Declarations

While writing this, I did not:

Study the original Kolmogorov–Zabih proof in detail beyond a surface understanding: https://www.robots.ox.ac.uk//michaelmas2002/23520065.pdf.
Solve a large number of max-flow problems to add my own problem-specific heuristics or experiences. For that kind of content, see errogorn’s blog.

This post is basically https://mirror.codeforces.com/blog/entry/148438, but with a different emphasis and a more compact presentation, which I think is more convenient for higher-rated programmers or anyone already familiar with the general method. In particular, none of the content here is original.

I have only decided what content to include. The styling and wordings are completely upto chatgpt 5.1 pro. Please don't hate me if the whole article sounded too mechanical and too LLM. I can only assure you that I put thought into the content and organization, and I won't comment on the rest.

Also, I am sorry for the fancy name that is ultimately something pretty well known.

Notation

I mix mathematical and programming notation.

(a == b) is 1 if a equals b, and 0 otherwise. In math, this is the Kronecker delta $$$\delta_{ab}$$$.
Logical symbols:
!x is written as $$$\neg x$$$.
a && b is written as $$$a \land b$$$.
a || b is written as $$$a \lor b$$$.

All variables $$$x_i$$$ are binary: $$$x_i \in {0,1}$$$.

Main result

We start from the general quadratic binary energy minimization problem.

Problem 1

You are given $$$n$$$ binary variables $$$x_i$$$ and a cost function

$$$ E(x) = \sum_{i=0}^{n-1}\sum_{t=0}^{1} (x_i == t) A_{t,i}

+ \sum_{i=0}^{n-1}\sum_{j=i+1}^{n-1}\sum_{s=0}^{1}\sum_{t=0}^{1} (x_i == s)(x_j == t) B_{s,t,i,j}, $$$

where $$$A_{t,i}$$$ and $$$B_{s,t,i,j}$$$ are arbitrary real numbers.

Problem: minimize $$$E(x)$$$ over all $$$x \in {0,1}^n$$$.

This is essentially the most general pairwise energy you can place on binary variables where each term depends on at most two variables. If you think of the $$$x_i$$$ as algebraic variables with the constraint $$$x_i^2 = x_i$$$, this is exactly the most general quadratic polynomial in $$$x_0,\dots,x_{n-1}$$$.

Interpretation of parameters:

Whenever $$$x_i = 0$$$, we add a cost $$$A_{0,i}$$$.
Whenever $$$x_i = 1$$$, we add a cost $$$A_{1,i}$$$.
Whenever $$$x_i = s$$$ and $$$x_j = t$$$, we add a cost $$$B_{s,t,i,j}$$$.

We restrict to $$$i \lt j$$$ because the pairwise term $$$B_{s,t,i,j}$$$ for $$$(i,j)$$$ and the term $$$B_{t,s,j,i}$$$ for $$$(j,i)$$$ have the same effect.

Not every such problem is tractable. It is well known that minimizing the number of unsatisfied clauses in 2-SAT (i.e., finding a minimum-cost assignment) is NP-complete, and Problem 1 is strictly more general.

However, there is a substantial subclass of these problems that can be solved via a single max-flow.

Lemma 2

The following special case of Problem 1 can be solved directly by a single max-flow.

Minimize

$$$ E(x) = \sum_{i=0}^{n-1}\sum_{t=0}^{1} (x_i == t) A_{t,i}

+ \sum_{i=0}^{n-1}\sum_{j=i+1}^{n-1}\sum_{s=0}^{1} (x_i == s)(x_j == 1-s) B_{s,1-s,i,j} $$$

where the unary parameters $$$A_{t,i}$$$ are arbitrary real numbers, but all pairwise coefficients satisfy

$$$ B_{s,1-s,i,j} \ge 0. $$$

Proof

Construct a graph with vertices

$$$ {s, t} \cup {0,1,\dots,n-1}, $$$

where $$$s$$$ is the source and $$$t$$$ is the sink.

First, normalize the unary terms so that they are non-negative. For each $$$i$$$, define

$$$ c_i = -\min\big(0, A_{0,i}, A_{1,i}\big), $$$

and replace $$$A_{0,i}, A_{1,i}$$$ by $$$A_{0,i} + c_i, A_{1,i} + c_i$$$. This adds the constant $$$\sum_i c_i$$$ to the energy, so the minimizer does not change. After this transformation, we may assume

$$$ A_{t,i} \ge 0 \quad \text{for all } t,i. $$$

Now build the following edges:

For each $$$i$$$, add an edge $$$s \to i$$$ of capacity $$$A_{1,i}$$$.
For each $$$i$$$, add an edge $$$i \to t$$$ of capacity $$$A_{0,i}$$$.
For each unordered pair $$${i,j}$$$, add a directed edge:
if $$$i \lt j$$$, add edge $$$i \to j$$$ with capacity $$$B_{0,1,i,j}$$$;
if $$$i \gt j$$$, add edge $$$i \to j$$$ with capacity $$$B_{1,0,j,i}$$$.

One can check that for any cut $$$(S,T)$$$ with $$$s \in S$$$ and $$$t \in T$$$, the indicator

$$$ x_i = (i \in S) $$$

gives an assignment whose energy equals the cut capacity plus the constant $$$\sum_i c_i$$$. Conversely, any assignment corresponds to some cut of this form.

Thus, by max-flow / min-cut, the minimum cut value yields the minimum possible $$$E(x)$$$ (up to the harmless constant shift). The minimizing assignment is reconstructed from the min-cut: vertices on the source side correspond to $$$x_i = 1$$$, those on the sink side to $$$x_i = 0$$$.

$$$\square$$$

Comparing this with the full Problem 1, we see that:

We do not explicitly have $$$B_{0,0,i,j}$$$ or $$$B_{1,1,i,j}$$$ terms.
The mixed terms $$$B_{0,1,i,j}$$$ and $$$B_{1,0,i,j}$$$ must be non-negative.

This does not mean we truly cannot handle $$$B_{0,0,i,j}$$$ or negative $$$B_{0,1,i,j}$$$. It only means that to use Lemma 2 directly, we must rewrite the energy into an equivalent form. The precise condition for this is the Kolmogorov–Zabih criterion.

Theorem (Kolmogorov–Zabih condition)

An instance of Problem 1 can be reduced to the form of Lemma 2 (using only constant shifts and moving terms between unary and pairwise parts) if and only if for every pair $$$(i,j)$$$,

$$$ B_{0,0,i,j} + B_{1,1,i,j} \le B_{0,1,i,j} + B_{1,0,i,j}. $$$

Proof (sketch)

Intuitively, we are allowed to “trade” some part of a pairwise term for unaries without changing the energy function. For example, for any constant $$$c$$$ (of any sign), we may:

subtract $$$c$$$ from both $$$B_{0,0,i,j}$$$ and $$$B_{0,1,i,j}$$$, and add $$$c$$$ to $$$A_{0,i}$$$; or
subtract $$$c$$$ from both $$$B_{1,1,i,j}$$$ and $$$B_{0,1,i,j}$$$, and add $$$c$$$ to $$$A_{1,j}$$$; etc.

These transformations preserve $$$E(x)$$$ for all $$$x$$$. By combining such moves, we can shift arbitrary amounts of mass between one term on the left-hand side and one term on the right-hand side of the inequality.

Therefore, it is enough to understand the “core” case where

$$$ B_{0,0,i,j} = B_{1,1,i,j} = 0. $$$

In that case, the condition simplifies to

$$$ 0 \le B_{0,1,i,j} + B_{1,0,i,j}, $$$

and the structure of Lemma 2 allows only non-negative weights on the mixed terms. That is exactly what the inequality enforces once we have exhausted all possible shifts from pairwise to unary terms. (Note: we can also change between $$$B_{0,1,i,j}$$$ and $$$B_{1,0,i,j}$$$ using the above tranformations.

A fully formal proof involves systematically tracking these transformations and verifying that the inequality is both necessary and sufficient. Conceptually, the condition is a version of submodularity of the pairwise potential.

$$$\square$$$

We also have one further trick, which is what is implicitly meant by “directly” in the theorem.

Fact (variable flipping)

For each variable $$$x_i$$$, we may decide to swap the roles of $$$0$$$ and $$$1$$$, i.e., replace $$$x_i$$$ by $$$1 - x_i$$$.

In practice, some problems naturally split the vertex set into two types, and flipping some of the variables can make the submodularity condition obviously true.

Understanding the model

The Kolmogorov–Zabih condition is compact, but it is not always obvious whether a given problem satisfies it. It is therefore useful to have a constructive view: what kinds of penalties and rewards can we safely represent using a single max-flow?

We can view the energy as being built by starting from

$$$ E(x) = 0 $$$

and repeatedly adding terms that preserve the Kolmogorov–Zabih condition. For example, we can add:

A cost or reward $$$c \in \mathbb{R}$$$ to either $$$x_i$$$ or $$$\neg x_i$$$.

In terms of indicators, “add $$$c$$$ if $$$x_i = 1$$$” or “add $$$c$$$ if $x_i = 0$”.

A positive cost $$$c \gt 0$$$ whenever $$$\neg(x_i \implies x_j)$$$ holds, i.e. when $$$x_i \land \neg x_j$$$.

This corresponds to a directed edge $$$j \to i$$$ of capacity $$$c$$$ in the construction of Lemma 2.

A positive cost $$$c \gt 0$$$ for $$$\neg x_i \land x_j$$$.

This is just the previous case with $$$i$$$ and $$$j$$$ swapped.

A negative cost (i.e., a reward) $$$c \lt 0$$$ when $$$x_i \land x_j$$$.

Note that

$$$ -(x_i \land x_j) = (\neg x_j) + (\neg x_i \land x_j) - 1, $$$

if we interpret each logical expression as its indicator (0/1). Thus, a term involving $$$x_i \land x_j$$$ can be rewritten using only unary terms and a single mixed term with the correct sign, plus a constant shift in $$$E(x)$$$. This reduces to one of the allowed constructions above.

A negative cost $$$c \lt 0$$$ when $$$\neg x_i \land \neg x_j$$$.

This is analogous to the previous case; apply the same kind of rewrite.

A positive cost $$$c \gt 0$$$ when $$$x_i \ne x_j$$$.

Using indicator algebra,

$$$ (x_i \ne x_j) = (x_i \land \neg x_j) + (\neg x_i \land x_j), $$$

so this is just a sum of two allowed mixed terms with positive weights.

A (positive or negative) $$$c$$$ that gives a cost of $$$c$$$ when $$$x_i=x_j=1$$$ and $$$-c$$$ when $$$x_i=x_j=0$$$.

This is a trick because we don't need the quadratic term. Since,

$$$ (x_i \land x_j) - (\neg x_i \land \neg x_j) = x_i - \neg x_j $$$

In the above, I expressed things using $$$\land$$$ instead of $$$\lor$$$ because conjunctions are usually more intuitive to reason about in this setting. You can freely translate this to OR-based conditions if you prefer.

Summarizing:

We can freely add arbitrary unary terms (costs or benefits) to conditions like $$$x_i$$$ or $$$\neg x_i$$$.
For binary terms:
If both literals have the same sign (e.g. $$$x_i \land x_j$$$ or $$$\neg x_i \land \neg x_j$$$), then the coefficient must be a benefit (negative cost).
If the literals have opposite signs (e.g. $$$x_i \land \neg x_j$$$ or $$$\neg x_i \land x_j$$$), then the coefficient must be a cost (positive).

Any other Boolean relation on a pair of variables can be decomposed into a combination of these basic forms and constants, or you can test it directly using the Kolmogorov–Zabih inequality plus optional variable flips (e.g. the last example). Practically, what we have given are just various elements of the allowed half plane, inside the 4-dimensional spaces of $$$B_{0,0,i,j}, B_{1,0,i,j} , B_{0,1,i,j}, B_{1,1,i,j}$$$.

Remember: the sign restrictions are applied after you have chosen which variables (if any) are flipped, i.e. after you decide, for each $$$i$$$, whether $$$0$$$ and $$$1$$$ are swapped.

Implementations

Coming soon, after I have solved at least one more problem on this topic.

References

Cost-Benefit Flow https://mirror.codeforces.com/blog/entry/148438 by JJCUBER

Solving Problems with Min Cut Max Flow Duality https://mirror.codeforces.com/blog/entry/136761 by errorgorn

V. Kolmogorov and R. Zabin, "What energy functions can be minimized via graph cuts?," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147-159, Feb. 2004, doi: 10.1109/TPAMI.2004.1262177. keywords: {Minimization methods;Stereo image processing;Image restoration;Layout;Image reconstruction;Computer vision;Application software;Stereo vision;Markov random fields;Dynamic programming},

Комментарии (4)

Написать комментарий?

arvindf232

5 месяцев назад, скрыть # |

Auto comment: topic has been updated by arvindf232 (previous revision, new revision, compare).

→ Ответить

-real-account

🤐

JJCUBER

+24

Thank you for asking around and finding the Kolmogorov–Zabih paper! That is in no small part something that I tried and failed to find myself before I decided to rederive everything and create the blog post.

Admittedly, I most definitely went overboard in trying to over-explain certain things, not to mention I tried to tailor it towards too many audiences (which ended up making the experience worse for everyone). I also have a bit of an odd personal background (I think in a more pure-maths/proofy way), so the way that I wanted to best convey the article would not have been helpful to most people on codeforces. (After all, a very formal proof-centric blog post would have been really dry and even harder to read than how it ended up.)

In the event that I write another blog post, I'll keep all this in mind and hopefully it will turn out more useful.

5 месяцев назад, скрыть # ^ |

← Rev. 3 →

+15

Thank you for that. As with the comment on your blog, I also don't have a good view of "how verbose/how concise" something should be, but I think that generally (1) predict the level of readers that would be most interested in it, (2) make it good for yourself, are both very reasonable choices to me. For this particular topic I predict >= LGM.

ALso, personanly, readers can skip the proofs easily (as long as the statement is clear / you explain any essential thinking additionally), so adding proofs in general isn't that boring -- just ensure people who skip and don't both have good experiences.

Блог пользователя arvindf232