Count number of distinct elements in submatrix? - Codeforces

→ Pay attention

Before contest
Codeforces Round 995 (Div. 3)
05:43:08
Register now »

→ Streams

Codeforces Round 995 Solution Discussion

By aryanc403

Before stream 08:03:06

View all →

→ Top rated

#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

Countries | Cities | Organizations

→ Top contributors

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

View all →

→ Find user

→ Recent actions

Detailed →

f2lk6wf90d's blog

Count number of distinct elements in submatrix?

By f2lk6wf90d, history, 7 years ago, In English

In English

Hello everyone, I thought of this problem yesterday: Given a matrix of length N and width M, answer Q queries of this form:
- x₁ y₁ x₂ y₂: Count the number of distinct elements in the submatrix with corners (x₁, y₁) and (x₂, y₂).
There is a version of this problem on Codechef, however the bounds for A_i, j are very small. Is there a way to solve this for e.g.

Unable to parse markup [type=CF_TEX]

? I don't have any particular time complexity in mind.

UPD: It turns out that this problem is named "colored orthogonal range counting". It has been solved in "Further Results on Generalized Intersection Searching Problems: Counting, Reporting, and Dynamization" and "Efficient Colored Orthogonal Range Counting".

+4

f2lk6wf90d
7 years ago
14

Comments

Comments (14)

Write comment?

»

7 years ago, # |

Vote: I like it

-22

Vote: I do not like it

Thanks for sharing this interesting problem.

A modified version of the Summed-area table may be used to find the answer to each query rapidly.

The value of each element (x,y) in the modified version should be the set of distinct elements in the sub-matrix with corners (1,1) and (x,y). The addition and subtraction operations in the expression for the sum of sub-regions in the table should be transformed to computing the number of distinct elements between the corners given in the query using the set union and set difference operations, respectively. Bitwise operators may be used to perform these operators if bitset is used to represent the set of distinct elements.

Best wishes

→ Reply

»

»

7 years ago, # ^ |

← Rev. 2 →

Vote: I like it

+18

Vote: I do not like it

Your solution is wrong on several levels. First, even if we could use bitset to do what you suggested, it'd still be very slow. Second, the addition operation is doable, indeed, using bitsets, but the subtraction one is not. We could use a 2d-rmq with that bitset of yours so that we don't have subtraction anymore, but the precomputation complexity increases and the query complexity is still too heavy

→ Reply

»

»

»

7 years ago, # ^ |

← Rev. 2 →

Vote: I like it

-26

Vote: I do not like it

Thanks for your comments.

RE: using bitset is still very slow.

I have extensively used c++ bitset STL to implement set operations before, and the approach proved to be much faster than the corresponding c++ set STL operations. The main reason for the improved performance is that no set insertion operations are needed when performing set operations such as union and intersection. Alternatively, these operations are performed in the bitset implementation using bitwise operators.

RE: The subtraction operation is not doable using bitsets

This is incorrect. This subtraction operation A — B in the original Summed-area table corresponds to the set difference between A and B. Using Boolean algebra, it is simple to show that the result can be computed using bitwise AND operation between the bitset representation of A and the bitset representation of the complement of B, which can be computed using a one's complement bitwise operation as well.

The complexity of the pre-computing the bitset matrix increases with the matrix size and the bitset size. However, the complexity of computing the answer of each query is independent of the matrix size. It only depends on the bitset size.

Best wishes

→ Reply

»

»

»

»

7 years ago, # ^ |

← Rev. 3 →

Vote: I like it

+23

Vote: I do not like it

I don't know much about Boolean algebra. What I do know is that when trying to do the equivalent of A / B there is no way to achieve that using bitsets because A and B are actually multisets (not just sets) and when storing them as bitsets you lose information. If a certain element x is part of A and is also part of B, it could be either part of A / B or not (depending on whether it appeared strictly more times in A than in B). Take, for example, the vector matrices (1, 1) and (1, 2) and query on (1, 2), (1, 2) (that is the matrix formed from the last element). A = {1} in both cases and B is once {1} and once {1, 2}. A/B should contain {1} in the former and should not contain {1} in the latter. I wrote this message, even considering this approach is not going to lead us to a fast solution, just so that you and others that might think your solution works understand why I claim it doesn't (so with an informative purpose). If you're still not convinced I'm right just try to implement it and stress-test it

→ Reply

»

»

»

»

»

7 years ago, # ^ |

← Rev. 2 →

Vote: I like it

-23

Vote: I do not like it

The aforementioned problem considers only distinct elements in the sub-matrix. Therefore, the multiset case is beyond the scope of this problem and the proposed bitset implementation. No information is lost when storing distinct elements of a set in its corresponding bitset. A bitset simply stores the set membership function of any given set, and returns whether an element (k) belongs to a set X as the Boolean value x[k].

For example, suppose a 5-element universal set U = {0,1,2,3,4}. The corresponding bitset representation of any subset of U is an instance of type bitset<5>. The bitset<5> x representation of the X = {1,2} corresponds to the bit-string "00110", where the right-most bit is the least-significant bit. In this example, x[ 0 ] = x[ 3 ] = x[ 4 ] = 0, and x[ 1 ] = x[ 2 ] = 1. Suppose also that Y = {1,4} whose bitset representation is "10010". Then, X — Y = {1,2} — {1,4} = {2}. Using bitwise operators, NOT Y = "01101", and X AND NOT Y = "00110" AND "01101" = "00100", which is equivalent to {2}.

More information about the bitwise operators of the C++ bitset STL are available here. bitset operators

Best wishes

→ Reply

»

»

»

»

»

»

7 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

Try coding the solution for the Codechef problem using your approach.

→ Reply

»

»

»

»

»

»

»

7 years ago, # ^ |

← Rev. 2 →

Vote: I like it

-15

Vote: I do not like it

Thanks for the suggestion.

I've added it to my to-do list.

Best wishes

→ Reply

»

»

»

»

»

7 years ago, # ^ |

Vote: I like it

-23

Vote: I do not like it

Thanks for clarifying why extending the Summed-area table approach does not work for this problem.

It remains true that bitset can be used to represent the set membership function efficiently, and that the set difference operation is doable using bitset operators.

Best wishes.

→ Reply

»

»

»

»

»

7 years ago, # ^ |

← Rev. 2 →

Vote: I like it

-26

Vote: I do not like it

Interestingly enough, the CodeChef tutorial of the problem does use an extended version of the Summed-table approach to compute the number of distinct elements in a sub-matrix to answer each query.

I proposed the same approach without reading the tutorial.

Best wishes

→ Reply

»

»

»

»

»

»

7 years ago, # ^ |

Vote: I like it

+5

Vote: I do not like it

Well, he didn't ask about solution of that problem...

→ Reply

»

»

»

»

»

»

»

7 years ago, # ^ |

← Rev. 2 →

Vote: I like it

-25

Vote: I do not like it

That's right.

The only extension in the approach presented here, which turned out to be the same as the one presented in the tutorial, is using bitset to represent distinct elements in sub-matrix.

Best wishes.

→ Reply

»

»

»

»

»

»

»

»

7 years ago, # ^ |

← Rev. 2 →

Vote: I like it

+10

Vote: I do not like it

Haven't you still figured that the bitset solution is wrong? Just make the difference between set and multiset and look again at the comment where I tried to explain you why it doesn't work. It's about frequencies of elements in a multiset (that you wrongly store as a bitset, which is a set, not multiset)

→ Reply

»

»

»

»

»

»

»

»

»

7 years ago, # ^ |

Vote: I like it

0

Vote: I do not like it

The last comments are about the Summed-table approach used in the problem tutorial, not about whether it is correct or wrong to use bitsets for counting distinct elements in the sub-matrix.

Thank you.

→ Reply

»

7 years ago, # |

Vote: I like it

0

Vote: I do not like it

Auto comment: topic has been updated by f2lk6wf90d (previous revision, new revision, compare).

→ Reply