Find Index with minimum cost
There are n indices from 1 to n.
Some indices are marked and given as input in an array (size m).
For the unmarked indices, find the index having the minimum cost.
Cost of an index is defined as -
the sum of reciprocal of absolute difference between this index and all other marked index.
$$$COST(i) = \sum_{m_{idx} \in \text{marked indices}} \frac{1}{|i - m_{idx}|}$$$
Constraint: 1 <= n, m <= 1e5








Try to find the "farthest" index from the marked indices. Can you see why this works?
Mind the gaps between marked indices.
The reciprocals get smaller as the distances (absolute values) get larger, so we need to basically find an index "farthest" from all marked indices.
To do this, sort the marked indices and find the largest gap. Say the gap's left and right endpoints are $$$l$$$ and $$$r$$$, and therefore the distance $$$d$$$ becomes $$$r-l$$$.
There are two cases.
Case 1 — $$$d$$$ is even: You can pick the midpoint of $$$l$$$ and $$$r$$$ without any rounding, i.e. pick $$$\frac{l+r}{2}$$$.
Case 2 — $$$d$$$ is odd: For this, the index will be biased towards the side with less indices, since if we're farther from the side with more indices, our cost will be less. If both sides have an equal count, however, we can choose either index.
This means that if the left has less indices the answer is $$$\frac{l+r-1}{2}$$$, if the right side has less the answer is $$$\frac{l+r+1}{2}$$$, if both sides are equal the answer is either of the two indices.
Consider an example where marked indices are —
[2, 100, 101, 102, 103, 105]
According to your approach, largest gap is between (2 and 100). So, d = (100 — 2) = 98
half = 49
Hence, we should be selecting l + (r — l)/2 as the index,
which is 2 + (100 — 2) / 2 = 51 (in this case).
Ckearly, 51 is equidistant from 2 and 100.
But is it optimal?
Think about it.
Don't you think selected index should be leaning towards 2 (given the distribution of marked indices in this example).
I wanted to share a neat “one-pass” trick for efficiently evaluating
for all $$$i=1,\dots,n$$$ in only $$$O(n+m)$$$ time, rather than the naive $$$O(n \cdot m)$$$. Here’s the idea:
Preprocessing:
Build an array $$$\text{freq}[1 \dots n]$$$ where $$$\text{freq}[x]$$$ counts the number of marks at position $$$x$$$.
Counting Distances:
For each $$$i$$$, let:
Then, the cost at $$$i$$$ is:
At the leftmost position $$$i=1$$$:
Therefore,
This can be computed in $$$O(n+m)$$$ time if you loop only over the marked positions.
The trick is to update the cost in constant time when moving from $$$i$$$ to $$$i+1$$$. As you increment $$$i$$$:
Define two sums:
— $$$\text{sumLeft} = \sum_{d} \text{countLeft}[d] \cdot \frac{1}{d}$$$,
— $$$\text{sumRight} = \sum_{d} \text{countRight}[d] \cdot \frac{1}{d}$$$.
So, $$$\text{cost}(i) = \text{sumLeft} + \text{sumRight}$$$.
The updates are:
Notice that:
so you only need to adjust each part by a simple term. By using clever shifting (or two pointers), the counts can be updated without re-scanning the entire array.
Finding the Minimum:
As you sweep $$$i$$$ from $$$1$$$ to $$$n$$$, skip positions where $$$\text{freq}[i] \gt 0$$$ (since these are marked) and track the unmarked $$$i$$$ with the minimum $$$\text{cost}(i)$$$.
Time Complexity:
Each increment is $$$O(1)$$$ (apart from minor array shifts via an offset), so the overall time is $$$O(n + m)$$$, which is optimal for $$$n, m \le 10^5$$$.
Key takeaway:
The main trick is recognizing that the difference
can be computed in constant time if you know how many $$$x$$$ are exactly $$$d$$$ units away from $$$i$$$. This avoids recomputing a summation over $$$m$$$ elements for each $$$i$$$.
Hope you find this method as elegant and useful as I did!
How do you update the reciprocals? You have to increase every denominator in
sumLeft, decrease every denominator insumRight, and (more manageably) sometimes swap reciprocals fromsumLefttosumRight.For the left side (sumLeft):
Every marked point to the left of position i is now one unit further away when you move from i to i + 1. If a marked point was at distance d (with reciprocal 1/*d*), its new distance becomes d + 1 (with reciprocal 1/(*d + 1*)). The change in its contribution is:
Instead of recomputing 1/(*d + 1*) for every point, you can subtract $$$ \frac{1}{d(d+1)} $$$ for each of the
countLeft[d]points at distance d.For the right side (sumRight):
Similarly, each marked point to the right gets one unit closer. A point at distance d will now be at distance d − 1, changing its reciprocal from 1/*d* to 1/(*d − 1*) (for d > 1). The difference is:
So, for each
countRight[d]marked points at distance d, you add $$$ \frac{1}{d(d-1)} $$$ tosumRight.Swapping at the boundary:
The “swap” happens for points exactly one unit away. When moving from i to i + 1, any marked point that was exactly one unit to the right (with distance 1) becomes exactly at position i (thus no longer contributing to
sumRightbut now tosumLeftas distance 0, which is effectively not counted). You handle this by adjusting your counts: decrementcountRight[1]and incrementcountLeft[1].Efficient update:
By maintaining the counts of marked points at each distance (using your
freqarray), you can updatesumLeftandsumRightin constant time for each shift. Instead of looping over all marked points, you only update based on the counts for the affected distances. This gives you the overall O(n + m) performance.So, rather than “increasing every denominator” or “decreasing every denominator” explicitly, you capture the net change for each distance in one go, using the formulas above. This “one-pass” trick leverages the fact that the differences between consecutive reciprocals are simple fractions that depend only on the distance, not on the specific marked point.
Is your 'd' always fixed?
Didn't found you iterating over d.
No, d isn't fixed—it represents the variable distance between the current index and each marked index. In our approach, we don't iterate over every possible d each time. Instead, we maintain aggregated counts (or sums) that let us update the overall contribution in constant time when we move from index (i) to (i+1). This way, the change in every reciprocal is handled implicitly via a difference update, rather than by iterating over each d.
Kindly provide code for the same. Would be of great help.
It should be a bannable offense to use ChatGPT to write a solution with such confidence while being absolute nonsense.
Anyways, your desired cost for each position can be calculated with FFT. You need two separate convolutions, one for handling marked index to the left and one for the right. Hopefully there is no precision issues with Doubles.
Sorry sir, I think there's been a misunderstanding. I did not use any LLM to write the solution.
arvindf232, can you provide your approach in a bit more detailed manner.
Also, I wonder if there existed an easier approach to solve this.
I made an inaccurate statement: one convolution is enough. Let S be the set of marked indices. Define array A to be $$$A[i]=1$$$ if $$$i$$$ is marked, $$$A[i]=0$$$ if not, A is indexed from $$$0$$$ through $$$n-1$$$. Define array B to be $$$B[i] = 0$$$ if $$$i=0$$$, or $$$\frac{1}{|i|}$$$. Indexed from $$$i=-n$$$ through $$$n$$$. ($$$B[0]$$$ can be anything, its value do not affect any answers)
Let $$$C$$$ be their convolution, which is defined to be $$$C[j] = \sum_{i}A[i]B[j-i]$$$, where the sum is taken over whenever both $$$A$$$ and $$$B$$$ are defined. This $$$C$$$ can be calculated with FFT in $$$O(n \log n)$$$.
Observe that $$$C[i]$$$ now stores the cost of any unmarked index $$$i$$$. ($$$C[i]$$$ for marked $$$i$$$ do not hold any meaning and can be ignored.)
Can you suggest some approach without FFT
Context: This problem was asked in final Interview of an MNC.
Please check my code. I've added comments to help you understand it better. Let me know if you find any mistakes