Блог пользователя Saksham_Sahgal

Автор Saksham_Sahgal, история, 9 месяцев назад, По-английски

Bloom Filters


Ever wondered how

  • a Website with a massive user base could determine whether a username is taken or not in milliseconds?
  • How does Chrome checks if the url you are visiting isn't malicious without comparing it against a bulky list of URLs?
  • How Quora filters out posts you have already seen ?

So the problem here is ultimately you have a set of billions of strings, and a server needs to check whether a string passed to it by the client exists in the set or not.

Straightforward but Not so Good Ways : Considering we have N strings and the average length of each string is L.

Linear search

  • O(N*L) Search Time Complexity, Iterate over all the strings and compare and see if the string is present.
  • O(N*L) space complexity.

Binary search

  • backed by a Red-Black Tree / balanced BST
  • O(L × log n) Search Time Complexity, Each comparison in the tree takes O(L), and the tree height is O(log n)
  • O(N*L) space complexity.

trie

  • O(S) Search Time Complexity, where $$$S$$$ is the length of the search string
  • O(n×L), space complexity. In the worst case (all strings are unique with no shared prefixes)

The problem Here is when we are operating at an enterprise scale; even an $$$O(S)$$$ operation/request is costly to the server doing the comparison because it has to do billions of such requests/second.


Here is where a Bloom Filter can be helpful:

Concept:

  • Bloom Filter is a probabilistic data structure that uses bits to determine whether an item might be in a set or definitely is not, using far less space than a full list or hash set.
  • Rather than storing the data items directly, Bloom filters represent them using a bit array. Since multiple items can map to the same bits through hashing, hash collisions are not only possible, they're expected.

Working:

  • The client has a set of hash functions $$$h_1 + h_2 + h_3 ... h_k$$$
  • We have a bit array of m bits, all set to zero initially. $$$[0,0,0,0,0,0,0]$$$ where $$$m « S$$$
  • Each hash function generates an index $$$∀i∈[0,m-1]$$$ in O(1) time.
  • The hash function is deterministic, so it generates the same index for the same string.
  • So instead of passing the entire string to the server, the client instead passes the $$$k$$$ indexes generated from the $$$k$$$ hash functions calculated on the client side itself in $$$O(k)$$$ time.
  • This way we are also reducing the network latency by reducing the payload size.

Search Operation:

  • The hash operation is offloaded to the client side, so there is even less load on the server.
  • The set bits indexes passed to the server will act as the filter.
  • It iterates over all the indices got from the client, and if any one of the bits is unset, then the string is definitely not present in the list.
  • This searching takes $$$O(k)$$$.

What if all the bits are present?

  • Well, in that case, the string might/might not be present in the list, because the same set of bit indexes could be generated by any/⁣Cumulatively many string as well.
  • But this way we are eliminating a large set of strings that are definitely not present.
  • In this case, we actually have to do a search, probably with a trie.

How can we reduce false positive probability ?

  • By increasing the number of Hash Functions $$$(↑k)$$$
  • By increasing the size of bit array $$$(↑m)$$$
  • By using a hash function that generates uniformly distributed/un-skewed indexes.

Insert Operation

  • The set bits indexes passed to the server will all be set.
  • The insert operation takes $$$O(k)$$$ time.
  • Thought it's expected that some of the hash bit indexes passed are already set by some previous string.

Delete Opetation

  • The drawback of the Bloom filter is we cannot delete elements from the list by just clearing the hash bits.

Why unsetting bits Won't work ?

  • If we clear the bits set by a string, we might end up clearing a subset of bits of some other string.
  • Eg — Let's say a string "Hi i'm Saksham" generated hash bit indexes as [1,2,3] and another string "Hello World!" generated hash bit indexes as [3,7,12].
  • Now if we want to delete the string "Hello World!" by clearing the bits [3,7,12], we will end up clearing the bit 3 (overlap).
  • Now if we ever search for the string "Hi i'm Saksham" we won't find it because of the unset but 3.
  • So this way we lose the property of definitely not present.

Полный текст и комментарии »

  • Проголосовать: нравится
  • 0
  • Проголосовать: не нравится

Автор Saksham_Sahgal, 11 месяцев назад, По-английски

Stars and Bars – An Intuitive Explanation

Clasic Problem

The classic Stars and Bars problem asks:
Given n identical items (stars) and r boxes, in how many ways can you distribute the items among the boxes?
There are no restrictions — any box can have zero or more items.

Think of it as N stars (5 in this example). Now, if you use 2 sticks to act as barriers between stars, it will divide the stars into 3 regions, depending on the placement of the barriers. We can assume the stars in each region are stars in the boxes. so if we have to distribute into 3 boxes we need 2 sticks to divide into 3 regions; consequently, to divide into r regions we would need r-1 sticks.

Now to find the number of ways we can just permutate all of these by (n+r-1)! and then remove the repetitive permutations by dividing by = $$$(n!)(r-1)!$$$.

So the Number of ways becomes = $$$\frac{(n+r-1)!}{n!(r-1)!}$$$.

Now, we can also represent this with the equation = $$$x_1 + x_2 + x_3 ... x_r = n$$$ where $$$x_i$$$ represents the number of stars $$$ith$$$ box got and in this case $$$ 0 \lt = x_i$$$, and the solution for this is $$$\frac{(n+r-1)!}{(n!)(r-1)!}$$$.

stars and bars with lower bounds :

Keeping the above formula in mind, now what if, there is a constraint saying we cannot have any box with less than $$$l$$$ $$$(l \lt = n)$$$ stars. which means now constraint on $$$x_i$$$ is $$$ l \lt = x_i $$$, so to solve this now we can use substitution, we can say $$$x_i = y_i + l$$$ where $$$ 0 \lt = y_i$$$.

so if we substitute this in the equation $$$x_1 + x_2 + x_3 .. x_r = n$$$ we get $$$y_1 + y_2 + y_3 .. y_r = n - r*l$$$

so the solution is $$$\frac{(n - r(l-1)-1)!}{(n - r*l)!(r-1)!}$$$.

stars and bars with upper bounds :

Now what if there is a constraint saying we cannot have any box with more than $$$l$$$ $$$(l \lt = n)$$$ stars. which means now the constraint on $$$x_i$$$ is $$$ x_i \lt = l $$$, so to solve this now becomes little tricky. for simplicity we can assume we have r = 3. so $$$x_1 + x_2 + x_3 = n$$$

We can calculate the solution for this with $$$\frac{(n+r-1)!}{n!(r-1)!}$$$ and substract =

  • (case when $$$x_1 \gt l$$$)
  • (case when $$$x_2 \gt l$$$)
  • (case when $$$x_3 \gt l$$$)

and add to compensate for double substraction :

  • (case when $$$x_1$$$ and $$$x_2$$$ > l)
  • (case when $$$x_1$$$ and $$$x_3$$$ > l)
  • (case when $$$x_2$$$ and $$$x_3$$$ > l)

Subtract to compensate for addition :

  • (case when $$$x_1$$$ and $$$x_2$$$ and $$$x_3$$$ > l)

How to calculate Cases =

eg - when $$$x_1 \gt l$$$ : $$$y_1 = x_1 - (l+1)$$$ since $$$x_1 \gt l$$$ (ie) $$$x_1 \gt = l+1$$$ so $$$y_1 \gt = 0$$$

$$$x_1 + x_2 + x_3 = n$$$

$$$y_1 + x_2 + x_3 = n - (l + 1)$$$

So $$$n$$$ will become $$$n - (l+1)$$$

So the Number of ways becomes = $$$\frac{(n - (l + 1)+r-1)!}{(n - (l + 1))!(r-1)!}$$$. = $$$\frac{(n - l + 1)!}{(n - (l + 1))!(2)!}$$$

similarly you can calculate for case 2 and 3.

for case 4 5 and 6 :

$$$n$$$ will become $$$n - 2*(l+1)$$$ so: $$$\frac{(n - 2*l)!}{(n - 2l -2))!(2)!}$$$

For case 7 : $$$n$$$ will become $$$n - 3*(l+1)$$$ so: $$$\frac{(n-3l-1)!}{(n-3l-3)!(2)!}$$$.

Hence —

  • Case 1,2,3 substract : $$$\frac{(n - l + 1)!}{(n - (l + 1))!(2)!}$$$
  • Case 4,5,6 add : $$$\frac{(n - 2*l)!}{(n - 2l -2))!(2)!}$$$
  • Case 7substract :$$$\frac{(n-3l-1)!}{(n-3l-3)!(2)!}$$$.

Here is a problem that uses this Approach — Link

Полный текст и комментарии »

  • Проголосовать: нравится
  • +14
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

It has become a trend for all contests , that A , B , Cs will almost always be constructive , and even after solving a lot of these questions , I mostly seem to get stuck or sometimes take a lot more time than usual to solve these questions.. I just feel like , i'm stuck and can't improve any further, can anyone help?

Полный текст и комментарии »

  • Проголосовать: нравится
  • +2
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

I recently started learning python , so I was looking for some C++ STL equivalents in Python. is there any way i can achieve a similar functionality of —

Set and Maps in python with —

insert (log n)
delete (log n)
find (log n)

as fas a I Understand Python Standard Sets are not sorted and are implemented using Hash-Tables (similar to unordered-Sets in C++) has —

insert (O(1) average case , Worst case O(n) [when large no of values have same hash value])
delete (O(1) average case , Worst case O(n) [when large no of values have same hash value])
find (O(1) average case , Worst case O(n) [when large no of values have same hash value])

I know there is a third party package called sortedcontainers which has some C++ equivalents —

std::set    sortedcontainers.SortedSet
        std::multiset   sortedcontainers.SortedList
        std::map    sortedcontainers.SortedDict

but I want to get these funcionalities without installing any third party packages (because those would be local to my environment), and I want to submit to codeforces.

can anybody help?

Полный текст и комментарии »

  • Проголосовать: нравится
  • 0
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

What is Nim Game and how it is played?

  • It is a game in which two players play optimally turn by turn .
  • In each turn a player can choose only one pile and remove any number of stones (at least one) from that pile.

  • The person who is unable to make a move looses , hence the player who makes the last move wins.

  • determine which player wins the game.


How to decide who will win?

  • To decide who will win the initial configuration of the piles matter.
  • If the xor of all elements of the array is non-zero , then person starting the game will win.

  • If the xor of all elements of the array is zero , then other person will win.

  • Proof


Why it works ?

  • if the xor if all elements is zero then if we make any moves , the xor of all elements will definitely become non zero .
Why?
  • If the xor of all elements is non zero , then we can either make the xor non-zero again , or make the xor zero.
  • There definitely exists atleast one way to make xor zero again.

Why?

How to reach the Conclusion -

Conclusion -

Hence if xor is zero B Wins , else A wins

Полный текст и комментарии »

  • Проголосовать: нравится
  • +13
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

Is there any website/tool that displays heatmaps and other related statistics of a profile of Codechef?

Полный текст и комментарии »

  • Проголосовать: нравится
  • +1
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

Can anyone tell me how the output of Test case 3 of Problem is —

9 9 9 15 15 6 -1 -1 6 -1 -1 -1 -1 6 15

and not —

9 9 9 -1 8 6 -1 -1 6 -1 -1 -1 8 6 8

the as far as i understand — i have to perform N moves , where -

  1. Draw the topmost card from the deck. Let X be the integer written on it.
  2. Then i will find a stack of upfacing cards with the smallest integer greater than or equal to X written on and and i will put this X card into that stack .
  3. if there is not such stack then i will put this card into a new stack.
  4. if any stack has k cards i will eat that stack.

for input —

15 3 3 14 15 9 2 6 5 13 1 7 10 11 8 12 4

According to me the steps should look like —

  • 3 (new stack)
  • 3 14 (two new stacks)
  • 3 14 15 (three new stacks)
  • 3 ( 14 9 ) 15 (9 goes to stack 2)
  • (2 3) ( 14 9 ) 15 (2 goes to stack 1)
  • (2 3) ( 6 14 9 ) 15 (6 goes to stack 2)
  • now stack 2 to is eaten because it has size = 3 (therefore 6 , 14 , 9 is eaten at step 6)
  • (2 3) (5 15) (5 goes to stack 3)
  • (2 3) (13 5 15) (13 goes to stack 3)
  • now stack 3 to is eaten because it has size = 3 (therefore 13 , 5 , 15 is eaten at step 8)
  • .
  • .
  • . -and so on..

can anyone tell me where i am getting it wrong? thanks in advance

Полный текст и комментарии »

  • Проголосовать: нравится
  • +2
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

Even after a lot of Practice why do I perform well in virtual contest but fail to perform well in real contests? . How can I overcome the fear of rating fall , and learn to perform well under pressure ? Does anyone have any suggestions that would help?

I had been giving some virtual contests , and i see i mostly perform well in them , and get a good rank , but in actual contest i almost always end up performing bad , even though i know i could have easily solved those questions if i was not under pressure .

Any suggestions for overcomming this would be really helpful :>

Полный текст и комментарии »

  • Проголосовать: нравится
  • +4
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

I get this when i submit code —

but when i run my exact same testcase in test against custom input window i get —

which is the correct output.

here is my submission — Submission

what am i doing wrong?

Полный текст и комментарии »

  • Проголосовать: нравится
  • -18
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

can anyone tell me how to make a custom hash for an —

unordered_set< pair < unordered_set<vector< int > , hashFunction > , vector< int > > >

where hashFunction is a custom hashFunction for unordered_set<vector> .

here it is —

struct hashFunction //hash function for unordered set

{

    size_t operator()(const vector<int> &myVector) const

    {
        std::hash<int> hasher;
        size_t answer = 0;

        for (int i : myVector)
        {
            answer ^= hasher(i) + 0x9e3779b9 +
                      (answer << 6) + (answer >> 2);
        }
        return answer;
    }
};

Полный текст и комментарии »

  • Проголосовать: нравится
  • -9
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

Given a weighted undirected graph , a source vertex(s) and x distinct vertices as Delivery Location (d1 , d2 , d3 .. dx) assuming that each rider travels at 1m/s what is the min time required to complete all the deliveries if you have M drivers originating from the source simultainously , and also the path followed by them.

( it is guarented that reaching all delivery locations from the source vertex is possible )

input format -

n m s M                            //n vertices , m edges , s is the source vertex , M no of drivers

v11 v12 w1                         //vertices connected with weight w

v21 v22 w2

..

vm1 vm2 wm

x                            // no of delivery locations 1<= x < n

d1 , d2 , d3 , d4 .... dx               //delivery vertices (d[i] != s for all 1 <= i <= x)

output format - (M+1 lines first line should contain the min time required and all others M lines should include paths followed by drivers from 1 to M)

t               //min time to complete all deliveries

1->3->5               // path followed by first delivery guy

1->2->1->5               // path followed by second delivery guy

1->               path followed by third delivery guy (it may be possible that this delivery person didn't moved so his path finished at source only)

(all paths should start from source only)

Полный текст и комментарии »

  • Проголосовать: нравится
  • +21
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

given a NxM integer matrix and i1,j1,i2,j2

such that i1 < i2 and j1 < j2

tell in O(1) that whether all elements in the rectangular submatrix formed by (i1 , j1) , (i1 , j2) , (i2 , j1) , (i2 , j2)

contains all same elements or not .

example —

input —

6

0 1 1 1 0 1

4 4 4 4 1 0

1 2 2 4 2 4

1 1 2 2 2 4

4 4 4 4 2 4

4 4 4 4 4 0

4 0 5 3 // zero based indexing , i1 , j1 , i2 , j2

output — yes

Полный текст и комментарии »

  • Проголосовать: нравится
  • 0
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

Given a Binary String and a number x , tell whether there exists a continuous substring with ( no of zeros — no of ones ) = x output its index l to r (0 based indexing) eg —

string = 1 1 0 0 1 0 0 1 0 0 1 0 1 1

x = 3

output — (you can output any possible l to r)

YES , from index 2 to index 6

OR

YES , from index 2 to index 8

Полный текст и комментарии »

  • Проголосовать: нравится
  • +5
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

Can anyone help me figure out what is wrong in my Code , the poblem is pretty straight forward and i Implemented it using PBDS , but i am getting WA in 5 added Testcases . Here is my submission Link — Submission

Полный текст и комментарии »

  • Проголосовать: нравится
  • 0
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 4 года назад, По-английски

i recently encountered a problem — 1208B - Uniqueness

I have use a simple Binary search approach for that problem

I have 2 submissions , both are using Binary search and both have same worst case time complexity of O(N^2 log^2(N)) but one gives TLE and the other gets AC

AC submission — 146479295

TLE Submission — 146333083

can anyone tell me why this is happening?

Полный текст и комментарии »

  • Проголосовать: нравится
  • -19
  • Проголосовать: не нравится

Автор Saksham_Sahgal, история, 5 лет назад, По-английски

Link to my submission

My approach was to map characters 'a' with the first character in the string then ,'b' with the second character in the string .. and so on , then while taking input of the N strings I converted those string(X) to a mapped string(Y) (each character conveted to its mapped value) and stored that mapped string(Y) in a multiset key with vaue as the original string(X), i did this as multiset will sort the keys lexographically and then i just printed the values of the map,

the first two given testcase gives right ans , but i dont understand why all other testcases give WA.

ps. i also solved it with custom comparator function which got AC(As expected) , but i don't understand why this approach gives me WA

Полный текст и комментарии »

  • Проголосовать: нравится
  • 0
  • Проголосовать: не нравится