Are there any good approximation algorithms for this problem?

#	User	Rating
1	Benq	3792
2	VivaciousAubergine	3647
3	Kevin114514	3611
4	jiangly	3583
5	strapple	3515
6	tourist	3470
7	dXqwq	3436
8	Radewoosh	3415
9	Otomachi_Una	3413
10	Um_nik	3376

#	User	Contrib.
1	Qingyu	163
2	adamant	150
3	Um_nik	146
4	Dominater069	144
5	errorgorn	141
6	cry	139
7	Proof_by_QED	136
8	YuukiS	135
9	chromate00	134
9	TheScrasse	134

You’re given a set of jobs and integers $$$x$$$ and $$$y$$$. Each job is represented by a time interval $$$[l, r]$$$ (integral endpoints, $$$0 \leq l \lt r \leq y$$$). The goal is to find an assignment of jobs to machines (assume that we have an infinite number of identical machines) which minimizes the number of machines used, while also satisfying the following:

Each job is assigned to some machine.
No two jobs assigned to the same machine intersect (but their endpoints can touch, so $$$[a, b]$$$ and $$$[b, c]$$$ can be assigned to the same machine).
Every used machine has some contiguous “break interval” of length $$$x$$$, which doesn’t intersect with any job assigned to that machine (once again, endpoints can touch). In other words, there is some contiguous time period of length $$$x$$$ where no job is being executed on it.

I’m pretty sure there isn’t an exact solution in P, so I’m looking for decent approximation algorithms.

I could only think of a simple approximation algorithm with a bad approximation factor that's heavily dependent on the input data (although I'm pretty sure I could make a cursed div-2 D out of it).

The idea

Let $$$s$$$ be the maximum length of a job interval.

The algorithm has an approximation factor of about $$$(1 + \frac{x + 2 \cdot s}{y})$$$ (so it’s only helpful when we have $$$\max(x, s) \lll y$$$), and it works in the following manner:

Let’s first find an optimal assignment for $$$x=0$$$ (this is the standard machine assignment problem and can be solved in sub-quadratic time using a simple greedy algorithm).

You then run the following procedure:

m = number of machines required by optimal assignment for x = 0
v[m] = vector of assigned jobs for x = 0 (v[i] stores all intervals assigned to machine i)

i = 1   #index of the old machine we’re currently processing
p = 0   #pointer that we will use as the start of the break on the next machine
g = m   #number of machines used till now

while i <= m:
    if p == 0:
        g = g + 1
        we rent a new machine indexed g, initially assigned no jobs

    if some there exists some job [l, r] in v[i] such that l <= p < r, set p = r

    let J be the set of jobs in v[i] which intersect with [p, p + x]
    let Q be max(p + x, maximum rightbound in J (0 if J is empty))
    
    if y - Q < x:
        we assign a break interval [p, p + x] on machine g 
        p = 0
    else
        on machine i, reserve interval [p, p + x] for a break
        move all intervals in J from machine i to machine g
        p = Q
        i = i + 1

finally, if we havent assigned a break interval on the last machine, its alright as its guaranteed that [y - x, y] will be free on this machine

the required number of machines is g

That would be kind of awkward to explain in plain english (and there might be a few off-by-oneish bugs in the pseudo code, ignore them), so I hope the idea is self-evident. We're creating new machines so that we can freely assign breaks to the old ones in a cyclical manner, and are then able to transfer the displaced jobs to the new ones (while of course ensuring that the new machines also get break intervals).

What is the approximation factor? Notice that $$$p$$$ is only set back to $$$0$$$ after at least $$$\lfloor \frac{y}{x + 2 \cdot s} \rfloor - 1$$$ increments to $$$i$$$. In other words, we create a new machine for every $$$\lfloor \frac{y}{x + 2 \cdot s} \rfloor - 1$$$ old ones. So we use at most $$$\approx m \cdot (1 + \frac{1}{\lfloor \frac{y}{x + 2 \cdot s} \rfloor - 1}) \approx m \cdot (1 + \frac{x + 2 \cdot s}{y})$$$ machines (forgive my egregious use of $$$\approx$$$).

Since $$$m$$$ (required number of machines with $$$x = 0$$$) cannot be larger than the number of machines in the optimal assignment with whatever value of $$$x$$$ we're given, the approximation factor is at least as good as $$$(1 + \frac{x + 2 \cdot s}{y})$$$.

We can improve the constant factor(s) in the $$$(O(x) + O(s))/O(y)$$$ term through some cheap tryhard methods, but the general $$$(x + s)/y$$$ nature will still remain.

There are also some pretty easy exact solutions which are exponential in the number of machines (which, unlike those exponential in the number of jobs, can occasionally be practical to run), but they're not interesting.

Are there any assumptions on the input data that would make this problem solvable in an interesting manner?

I would also be grateful if someone could direct me towards existing work on related problems if they know of it.

bool check(ll machines, ll x, ll y, vector<pair<ll, ll>> &sortedRanges) { multiset<pair<ll, ll>> idle; // <gotBreak, lastEndedJob> multiset<pair<ll, ll>> working; // <jobEndingPos, gotBreak> for (int i = 0; i < machines; i++) idle.insert({0, 0}); ll job = 0; for (int pos = 0; pos <= y; pos++) { while (!working.empty() && working.begin()->first == pos) { // job which ends at current pos auto it = working.begin(); idle.insert({it->second, pos}); working.erase(it); } while (!idle.empty() && idle.begin()->first == 0 && idle.begin()->second + x <= pos) { // those who are idle for more than x time, update that they got break auto it = idle.begin(); ll endPos = it->second; idle.erase(it); idle.insert({1, endPos}); } while (job < sortedRanges.size() && sortedRanges[job].first == pos) { // starting jobs which starts at current pos if (idle.empty()) return false; auto it = --idle.end();// taking last is optimal ig working.insert({sortedRanges[job].second, it->first}); idle.erase(it); job++; } } if (idle.begin()->first == 0) return false; return true; }

Comments (22)

Write comment?

Nyemot

6 months ago, hide # |

+11

I don't understand the third bullet point, can you explain it?

→ Reply

DNR

6 months ago, hide # ^ |

Consider the set of jobs/intervals assigned to some machine. Then for every machine: after we assign the given intervals, it should be possible to assign an additional interval (not present in the original set) of length $$$x$$$ somewhere within the timeline $$$[0, y]$$$, such that it doesn't intersect with any of the old intervals on this machine (boundaries can touch).

is it correct to say that for 2 jobs/intervals to be assigned into the same machine(which was empty) they have to:

not intersects
the different between the end of the first job and the start of the second job is >= x

this is what I understand from 3rd point.

← Rev. 3 →

No that's not what it means.

Consider the following setup:

$$$y = 20, s = 2, x = 4$$$
on some machine, we assign the segments $$$[1, 3], [7, 9], [10, 12], [17, 18], [18, 20]$$$

The assignment on this machine is valid because we can place a contiguous length-4 interval in some gap after this assignment (in this case, we can place one at $$$[3, 7], [12, 16], [13, 17]$$$). We do not need to be able to place a break between every pair of adjacent intervals, just once somewhere.

So in this case, it doesn't matter that we cannot place a break between $$$[7, 9]$$$ and $$$[10, 12]$$$.

I see, thanks.
I thought the problem was easy, turns out I just read 3rd point wrong.

Unfortunately I most likely won't solve your problem. Though, I will learn about the standard machine assignment problem soon.

MannyHeffley

How did you prove that there isn't an exact solution in P?

Jrke

← Rev. 2 →

If i understood your question correctly, then i have an idea. So, let us try to do binary function on the minimum no of machines. Cause if job can be done in m machines then for sure can be done with m+1, m+2 ... machines.

Now, how my check function will look? i will do something like sweep line with sorting the ranges with there l parameter also i will maintain two sets, lets name them, working and idle.

working set will store end pos, machine id of the machine which are currently working.

idle set will store three info -> GotBreak, last ended job pos, machine id.

ok so GotBreak is a (0,1) which tells whether the machine got that x length break or not, last job ended position is just the endpoint of job which the machine did and lastly the machine id.

by updating GotBreak of every idle machine(i.e. when we reach a point where its last job done is atleast x position behind) and these sets. Now the last part left is how to assign the job to a machine. i say that the last element of idle set is best why?

so first if you have a machine which already have break nothing issue. now if no machine have break so assigning the job to machine which has finished its job latest is optimal ig?

and if set is empty then for sure return false, and at end if everyone got break then return true. I am not able to find whats incorrect in this approach. compression and all i can get this in nlog^2n ig maybe.

I am not able to find whats incorrect in this approach.

I, on the other hand, am not able to find what’s correct in this approach.

umm, can you explain what?

-8

can you explain your idea through code or at least pseudo-code? in its current (extremely ambiguous) form, it just feels incorrect at worst and doubtful at best

i will do binary search on machines. code might be buggy but idea is more or less same.

This is obviously incorrect. Consider $$$x = 1, y = 3$$$, with intervals $$$[0, 1], [1, 2], [1, 3]$$$.

We can use 2 machines in the following way:

Assign intervals $$$[0, 1]$$$ and $$$[1, 2]$$$ to the first machine. It has a break on $$$[2, 3]$$$.
Assign intervals $$$[1, 3]$$$ to the second machine. It has a break on $$$[0, 1]$$$.

Your algorithm does not find this configuration, and requires 3 machines instead.

ohh right right, got my mistake thankss. will think over it

can you provide few test cases so that i can test if i get some logic?

SYury

5 months ago, hide # |

+20

I believe there is a trivial 2-approximate algorithm (btw, I'm not sure that this problem is NP-hard, do you have any proof?).

Solve the problem with $$$x = 0$$$ exactly, then consider each machine in the optimal solution. I claim that each task $$$[l, r]$$$ is such that either $$$r \le y - x$$$ or $$$l \ge x$$$, otherwise $$$[l, r]$$$ can't even be placed alone on one machine, and the answer is NO. Now consider the optimal answer for $$$x = 0$$$ and split each machine into two: the first will have tasks such that $$$r \le y - x$$$ (break interval $$$[y - x, y]$$$), and the second will have tasks such that $$$l \ge x$$$ (break interval $$$[0, x]$$$). This way we have $$$ans \le 2opt_{x = 0} \le 2opt$$$.

5 months ago, hide # ^ |

-54

obvious and worse than the one in the blog

Roberio

+60

I don't understand what's the point of creating a blog to ask people to solve an open problem for you and respond comments with "obvious", "extremely ambiguous".

Dude... CP community is sometimes the worst.

4 months ago, hide # ^ |

-18

I didn’t ask anyone to solve an open problem, I asked for help with approximation algorithms.
If the commenter I replied to had actually read the algorithm I described, he wouldn’t have wasted time describing the 2-approximation.
If you had actually read the comment to which I charitably replied with “extremely ambiguous”, you wouldn’t have wasted time writing your comment.

expertaq

It is NP-hard because Job Scheduling Problem is NP-hard and infact this problem is a more general form of Job Scheduling problem, so if one can solve this is polynomial time then the former won't be a NP-hard.

The problem can be solved using integer linear programming and applying some traditional ILP solvers.

It is NP-hard because Job Scheduling Problem is NP-hard

this problem is a more general form of Job Scheduling problem

Not really, job scheduling usually assumes that job start times are not fixed

Xellos

Isn't it not guaranteed that a solution even exists? If some job's time interval doesn't allow a break interval even if it's the only one on its machine, for example.

yes

DNR's blog